How to Define a Parquet LIST as a SQL Field in an External Table

Harry Leboeuf 26 Reputation points
2025-08-06T19:58:10.4933333+00:00

Hello,

First, in a Synapse notebook, I read the Power BI refreshes and write them to a Parquet file on Data Lake Gen2:

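import json

import pandas as pd
import requests

# admin_refreshables_expand and header are assumed to be defined earlier in the notebook.
refreshes = requests.get(admin_refreshables_expand, headers=header)

print(refreshes)
try:
    # Parse the response body into a dict.
    refreshes = json.loads(refreshes.content)
except Exception as e:
    print(e)

try:
    # One row per refresh record; list-valued fields such as configuredBy remain lists.
    result = pd.concat([pd.json_normalize(x) for x in refreshes['value']])
    print('Found', len(result['id'].dropna().unique().tolist()), 'refreshes.')
except Exception:
    print('No refreshes found.')

# WritePandasDataFrame is a custom helper; it writes the DataFrame to the data lake as Parquet.
WritePandasDataFrame(accountName, azu_clientId, azu_secretId, containerName, dirPath + "pbi_refreshes", "refresh_" + fileDate.strftime("%Y%m%d"), result)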


When I look at that file with a Parquet file viewer in Visual Studio Code, I first see some string fields:

User's image

But that list ["string"] column is causing me serious trouble. The documentation at https://learn.microsoft.com/en-us/sql/relational-databases/polybase/polybase-type-mapping?view=sql-server-ver17

states that it should be defined as VARCHAR(8000); other pages say VARCHAR(MAX). But whatever I create results in an error.

This is the external table

CREATE EXTERNAL TABLE [stg_60_160].[SynMon_PBI_Refresh_Parquet]
(	[id] [nvarchar](256) NULL,
	[name] [nvarchar](256) NULL,
	[kind] [nvarchar](256) NULL,
	[configuredBy] [varchar](MAX) NULL,
 ....	)
		WITH 
		(
			LOCATION = 'ISCC/60_166/pbi_refreshes',
			DATA_SOURCE = EnterpriseBiDataMesh,
			FILE_FORMAT = enterprise_bi_storage_out_file_format_parquet_gzip 		);


The error when running SELECT * against this table is:

Msg 106000, Level 16, State 1, Line 1
HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: ClassCastException: optional group configuredBy (LIST) {
  repeated group list {
    optional binary element (UTF8);
  }
} is not primitive

Does anybody have an idea what to do about this?

Azure Synapse Analytics

Accepted answer
  1. Vinodh247 36,031 Reputation points MVP Volunteer Moderator
    2025-08-07T00:59:14.5233333+00:00

    Hi,

    Thanks for reaching out to Microsoft Q&A.

    The issue you are facing stems from the fact that Parquet nested types, specifically the LIST logical type, are not directly supported by SQL Server PolyBase or Synapse external tables. You are trying to read a Parquet LIST into a VARCHAR or NVARCHAR column, but PolyBase expects a primitive (flat) column and cannot handle nested or repeated structures such as LIST, MAP, or STRUCT. That is what the ClassCastException in your error says: configuredBy is stored as an optional group (LIST), which "is not primitive".

    Note that...

    1. You cannot directly define a Parquet LIST as VARCHAR, NVARCHAR, or ARRAY in an external table definition, because PolyBase does not support nested structures in Parquet.
    2. You cannot define custom deserializers or interpreters in external tables.
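
Because every column has to be primitive, it may help to check which fields of the API response hold lists or nested objects before the file is written. The following is a minimal sketch in plain Python, assuming records shaped like the entries of refreshes['value']; the function name and the sample data are hypothetical and not part of the original notebook.

```python
def find_nested_fields(records):
    """Return the sorted names of fields that hold a list or dict in any record."""
    nested = set()
    for record in records:
        for name, value in record.items():
            if isinstance(value, (list, dict)):
                nested.add(name)
    return sorted(nested)


# Hypothetical sample shaped like the entries of refreshes['value'].
sample_records = [
    {"id": "a1", "name": "Sales", "kind": "Dataset", "configuredBy": ["user@example.com"]},
    {"id": "b2", "name": "Stock", "kind": "Dataset", "configuredBy": []},
]

print(find_nested_fields(sample_records))  # ['configuredBy']
```

Note that pd.json_normalize expands nested objects into dotted column names but leaves lists in place, which is consistent with configuredBy ending up as a Parquet LIST.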

    Option 1: Flatten the list to a primitive (string) before writing Parquet

    This is the recommended approach.

    Modify your Synapse notebook code to flatten any list columns (e.g., configuredBy) to a delimited string (for example, comma-separated) before writing to Parquet. Then write your result DataFrame using WritePandasDataFrame(...).
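
The flattening step could look roughly like the sketch below. It is written in plain Python so it runs on its own; the helper names, the ";" separator, and the sample record are assumptions for illustration and do not come from the original notebook.

```python
def list_to_delimited(value, sep=";"):
    """Join list values into one string; leave every other value unchanged."""
    if isinstance(value, (list, tuple)):
        return sep.join(str(item) for item in value)
    return value


def flatten_records(records, sep=";"):
    """Return copies of the records with all list-valued fields flattened."""
    return [
        {name: list_to_delimited(value, sep) for name, value in record.items()}
        for record in records
    ]


# Hypothetical sample; real data would come from refreshes['value'].
sample = [{"id": "a1", "configuredBy": ["user1@example.com", "user2@example.com"]}]
print(flatten_records(sample))
# [{'id': 'a1', 'configuredBy': 'user1@example.com;user2@example.com'}]
```

On the DataFrame from the question, the same helper could be applied per column, for example result['configuredBy'] = result['configuredBy'].map(list_to_delimited), before calling WritePandasDataFrame. Pick a separator that cannot occur in the values; if that is not possible, storing the list as a JSON string with json.dumps is an alternative.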

    Option 2: Use COPY INTO or Synapse Pipelines instead of External Tables

    If you cannot flatten the Parquet file and must deal with complex types, PolyBase and external tables are not the right tools. Use COPY INTO to ingest the data, or use a Synapse pipeline or Spark notebook to read the Parquet file and write it into a flat SQL table.

    As a best practice, please consider:

    1. If you are using Fabric, prefer Lakehouse tables and notebooks, where nested types are supported natively.
    2. If you are on Synapse, use Spark for complex structures and use SQL external tables only for flat files.

    Please 'Upvote' (thumbs-up) and 'Accept' as answer if the reply was helpful. This will benefit other community members who face the same issue.

    1 person found this answer helpful.