How to Define a Parquet-LIST as a SQL Field in an External Table

Question

Harry Leboeuf 26

Hello,

First, in a Synapse notebook, I read the Power BI refreshes and write them to a Parquet file on a Data Lake Gen2:

refreshes = requests.get(admin_refreshables_expand, headers=header)

print(refreshes)
try:
    refreshes = json.loads(refreshes.content)
except Exception as e:
    print(e)

try:
    result = pd.concat([pd.json_normalize(x) for x in refreshes['value']])
    print('Found', len(result['id'].dropna().unique().tolist()), 'refreshes.')
except Exception:
    print('No refreshes found.')

WritePandasDataFrame(accountName, azu_clientId, azu_secretId, containerName, dirPath + "pbi_refreshes", "refresh_" + fileDate.strftime("%Y%m%d"), result)

When I open that file with a Parquet file viewer in Visual Studio Code, I first see some string fields:

User's image

But that list ["string"] field is causing me serious trouble. The documentation at https://learn.microsoft.com/en-us/sql/relational-databases/polybase/polybase-type-mapping?view=sql-server-ver17

states that we should define it as a VARCHAR(8000); other pages say VARCHAR(MAX). But whatever I create results in an error.

This is the external table

CREATE EXTERNAL TABLE [stg_60_160].[SynMon_PBI_Refresh_Parquet]
(	[id] [nvarchar](256) NULL,
	[name] [nvarchar](256) NULL,
	[kind] [nvarchar](256) NULL,
	[configuredBy] [varchar](MAX) NULL,
 ....	)
		WITH 
		(
			LOCATION = 'ISCC/60_166/pbi_refreshes',
			DATA_SOURCE = EnterpriseBiDataMesh,
			FILE_FORMAT = enterprise_bi_storage_out_file_format_parquet_gzip 		);

The error when doing a select * from this table is

Msg 106000, Level 16, State 1, Line 1
HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: ClassCastException: optional group configuredBy (LIST) {
  repeated group list {
    optional binary element (UTF8);
  }
} is not primitive

Does anybody have an idea about this?

Kalyani Kondavaradala 405 Reputation points Microsoft External Staff Moderator

2025-08-08T06:29:52.0366667+00:00

Hi @Harry Leboeuf,

Greetings for the day!

Just checking in to see if the below answer provided by @Vinodh247 helped.

If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful". If you have any further queries, please let us know.

Thanks,

Kalyani

Accepted answer


Answer 1

Hi,

Thanks for reaching out to Microsoft Q&A.

The issue you are facing stems from the fact that Parquet nested types, specifically the LIST logical type, are not directly supported by SQL Server PolyBase or Synapse external tables. You are trying to read a Parquet LIST into a VARCHAR or NVARCHAR column, but PolyBase expects a primitive (flat) column and cannot handle nested or repeated structures such as LIST, MAP, or STRUCT. That is why the error reports that the group configuredBy (LIST) "is not primitive".

Note that:

You cannot directly define a Parquet LIST as a VARCHAR, NVARCHAR, or ARRAY in external table definitions, because PolyBase does not support nested structures in Parquet.
You cannot define custom deserializers or interpreters in external tables.
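
To see which columns are affected before the file is written, the DataFrame can be scanned for nested values. The following is a minimal sketch, not part of the original notebook: the helper names `is_nested_value` and `find_nested_columns` are hypothetical, and it assumes the Parquet writer turns Python lists into LIST columns, as the error message suggests happened for configuredBy.

```python
def is_nested_value(value):
    """True for Python containers that a Parquet writer would store as a nested type."""
    return isinstance(value, (list, tuple, dict))


def find_nested_columns(df):
    """Return the names of DataFrame columns holding at least one nested value.

    `df` is expected to be a pandas DataFrame, such as `result` in the question.
    """
    nested = []
    for col in df.columns:
        # Series.map applies the check to every cell; .any() is True if one cell matches.
        if df[col].map(is_nested_value).any():
            nested.append(col)
    return nested
```

In the notebook from the question, `print(find_nested_columns(result))` would list the columns that need attention before `WritePandasDataFrame(...)` is called.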

Option 1: Flatten the list to a primitive (string) before writing Parquet

This is the recommended approach.

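Modify your Synapse notebook code to flatten any list columns (e.g., configuredBy) to a delimited string (for example, comma-separated) before writing to Parquet. Then write your result DataFrame using WritePandasDataFrame(...) as before. The column is then stored as a plain string and can be mapped to VARCHAR or NVARCHAR in the external table.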
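
A minimal sketch of that flattening step is shown here. The function names `join_list_cell` and `flatten_list_columns` and the `;` separator are assumptions chosen for illustration, not part of the original notebook, and the code assumes the list cells are plain Python lists as produced by `pd.json_normalize`.

```python
def join_list_cell(value, sep=";"):
    """Join a list or tuple into one delimited string; leave other values unchanged."""
    if isinstance(value, (list, tuple)):
        return sep.join(str(item) for item in value)
    return value


def flatten_list_columns(df, sep=";"):
    """Return a copy of a pandas DataFrame with list-valued columns joined into strings."""
    out = df.copy()
    for col in out.columns:
        # Only rewrite columns that actually contain at least one list or tuple.
        if out[col].map(lambda v: isinstance(v, (list, tuple))).any():
            out[col] = out[col].map(lambda v: join_list_cell(v, sep))
    return out
```

In the question's notebook this would be called as `result = flatten_list_columns(result)` just before `WritePandasDataFrame(...)`. If the values themselves can contain the separator, serializing each list with `json.dumps` instead of joining is a safer choice.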

Option 2: Use COPY INTO or Synapse Pipelines instead of External Tables

If you cannot flatten the Parquet file and must deal with complex types, PolyBase and external tables are not the right tools. Use COPY INTO to ingest the data, or use a Synapse pipeline or Spark notebook to read the Parquet file and write it into a flat SQL table.

As a best practice, please consider:

If you are using Fabric, prefer Lakehouse tables and notebooks, where nested types are supported natively.
If you are on Synapse, use Spark for complex structures and only use SQL Server external tables for flat files.

Please 'Upvote' (thumbs-up) and 'Accept' as answer if the reply was helpful. This will benefit other community members who face the same issue.
