Hi,
Thanks for reaching out to Microsoft Q&A.
The issue you are facing stems from the fact that Parquet nested types, specifically the LIST logical type, are not supported directly by SQL Server PolyBase / Synapse external tables. You are trying to read a Parquet LIST into a VARCHAR or NVARCHAR column, but PolyBase expects a primitive (flat) column and cannot handle nested or repeated structures such as LIST, MAP, or STRUCT.
Note that:
- You cannot define a Parquet LIST column directly as VARCHAR, NVARCHAR, or ARRAY in an external table definition; PolyBase does not support nested structures in Parquet.
- You cannot define custom deserializers or interpreters in external tables.
Option 1: Flatten the list to a primitive (string) before writing Parquet
This is the recommended approach.
Modify your Synapse notebook code to flatten any list columns (e.g., configuredBy) to a delimited string (e.g., comma-separated) before writing to Parquet. Then write the resulting DataFrame using WritePandasDataFrame(...).
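A minimal sketch of the flattening step, assuming your notebook builds a pandas DataFrame and that configuredBy holds Python lists (or NumPy arrays, which is how pandas often stores list data). The helper names flatten_list_value and flatten_list_columns are hypothetical, and the comma delimiter is only an example; if the values themselves can contain commas, choose another separator or serialize each list with json.dumps instead.

```python
from typing import Any, Iterable


def flatten_list_value(value: Any, sep: str = ",") -> Any:
    """Join a list-like value into one delimited string; leave other values as-is."""
    # NumPy arrays expose .tolist(); strings and bytes are excluded so they
    # are never split into individual characters.
    if hasattr(value, "tolist") and not isinstance(value, (str, bytes)):
        converted = value.tolist()
        if isinstance(converted, list):
            value = converted
    if isinstance(value, (list, tuple)):
        # None items become empty strings so their position is preserved.
        return sep.join("" if item is None else str(item) for item in value)
    # Scalars and missing values (None/NaN) pass through unchanged.
    return value


def flatten_list_columns(df, columns: Iterable[str], sep: str = ","):
    """Return a copy of a pandas DataFrame with the given list columns flattened."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].map(lambda v: flatten_list_value(v, sep))
    return out
```

For example, call flatten_list_columns(df, ["configuredBy"]) and pass the returned DataFrame to your existing WritePandasDataFrame(...) call, keeping whatever arguments you already use (that helper's signature is not shown here). The resulting Parquet column is a plain string, so the external table can declare configuredBy as VARCHAR or NVARCHAR.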
Option 2: Use COPY INTO or Synapse Pipelines instead of External Tables
If you cannot flatten the Parquet file and must work with complex types, PolyBase and external tables are not the right tools. Use COPY INTO to ingest the data, or use a Synapse pipeline or Spark notebook to read the Parquet file and write it into a flat SQL table.
As a best practice, consider the following:
- If you are using Fabric, prefer Lakehouse tables and notebooks, where nested types are supported natively.
- If you are on Synapse, use Spark for complex structures and reserve SQL Server external tables for flat files.
Please 'Upvote' (thumbs-up) and 'Accept' as answer if the reply was helpful. This will benefit other community members who face the same issue.