Hello Anand Deshpande,
Thank you for your question!
I understand that you want to create Delta Lake tables on top of Parquet files stored in ADLS Gen2 without using the Databricks ecosystem.
Firstly, you'll need an Azure Synapse Analytics workspace set up in your Azure subscription, with a Spark pool provisioned within that workspace.
As an example, assume your Parquet files are located at abfss://******@yourstorageaccount.dfs.core.windows.net/path/to/parquet_data/. Go to your Synapse workspace in the Azure portal and launch Synapse Studio. In Synapse Studio, go to "Develop" -> "Notebooks" -> "New notebook" and attach the notebook to your Spark pool.
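As an optional sanity check (a minimal sketch reusing the placeholder path above), you can first read the Parquet folder directly to confirm the Spark pool can reach the storage account:

# Optional sanity check: read the raw Parquet folder before converting it.
# parquet_path reuses the placeholder example path from above.
parquet_path = "abfss://******@yourstorageaccount.dfs.core.windows.net/path/to/parquet_data/"
df = spark.read.parquet(parquet_path)
df.printSchema()
df.show(5)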
Please be informed that for ADLS Gen2, Spark needs credentials. Synapse Spark usually handles this automatically via Managed Identity; if you encounter issues, you might explicitly set these yourself, though that is less common in Synapse (a minimal example is shown after the conversion code below). After that, convert the Parquet files to a Delta Lake table. This method adds the Delta transaction log (the _delta_log directory) to your existing Parquet folder without rewriting the data, which makes it very efficient.
from delta.tables import DeltaTable

# Define the path where your Parquet files are and where the Delta table will be.
delta_table_path = "abfss://******@yourstorageaccount.dfs.core.windows.net/path/to/your_delta_table/"
# Ensure this path is where your Parquet files already reside,
# or copy your Parquet files to this location first if you want them moved.

# Convert an existing *folder* of Parquet files to Delta in place:
DeltaTable.convertToDelta(spark, f"parquet.`{delta_table_path}`")

# Now the folder at delta_table_path is a Delta Lake table.
# You can read it back as a Delta table:
delta_df = spark.read.format("delta").load(delta_table_path)
delta_df.show()
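If the Managed Identity / linked-service authentication mentioned above does not work and you see authorization errors, one option is to set the storage account key explicitly as a Spark configuration. This is only a minimal sketch; the account name and key below are placeholders you would replace with your own values:

# Explicit account-key authentication (only needed if the default Synapse auth fails).
# "yourstorageaccount" and the key value are placeholders for illustration.
spark.conf.set(
    "fs.azure.account.key.yourstorageaccount.dfs.core.windows.net",
    "<your-storage-account-access-key>"
)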
Please find the correct Delta Lake version for your Spark version on the Delta Lake website: https://delta.io/ (look for "Maven Artifacts" or "Download"). The coordinate has the form io.delta:delta-spark_<scala_version>:<delta_lake_version> (e.g., io.delta:delta-spark_2.12:3.1.0).
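Note that Synapse Spark pools already include Delta Lake in the runtime, so you normally do not need to add this package yourself. If you are running your own Spark session outside Synapse, a minimal sketch of wiring the Maven coordinate and the Delta extensions into PySpark (the version shown is an example only; match it to your Spark version) looks like this:

from pyspark.sql import SparkSession

# Self-managed Spark session with Delta Lake pulled in via Maven (example version only).
spark = (
    SparkSession.builder
    .appName("delta-on-adls")
    .config("spark.jars.packages", "io.delta:delta-spark_2.12:3.1.0")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)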
References:
- https://microsoftlearning.github.io/mslearn-synapse/Instructions/Labs/05-Use-delta-lake.html
- https://learn.microsoft.com/en-us/azure/stream-analytics/write-to-delta-table-adls-gen2
- https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/low-shuffle-merge-for-apache-spark
Hope the above answer helps! Please let us know if you have any further queries.
Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you, as this can be beneficial to other community members.