How to form Delta Lake tables on top of Parquet files stored in ADLS Gen2

Anand Deshpande 40 Reputation points
2025-07-11T05:54:47.9433333+00:00

Hi, Good day!

I'm looking for some guidance on how to create Delta Lake tables on top of Parquet files stored in ADLS Gen2. The constraint here is that Databricks is not whitelisted by our organization's policies. The ultimate goal is to perform CDC using MERGE statements with an open table format (OTF). Azure is already our cloud, so Iceberg is a bit far from consideration and Delta Lake is currently being explored. I also want to explore how Ab Initio can be interfaced with Delta Lake tables, because doing CDC from full Parquet snapshot files is inefficient within our current Ab Initio framework. Could you please assist with the very first step: creating Delta Lake tables on top of Parquet files?

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.

1 answer

  1. Nandamuri Pranay Teja 4,450 Reputation points Microsoft External Staff Moderator
    2025-07-11T10:23:25.63+00:00

    Hello Anand Deshpande

    Thank you for your question!

    I understand that you want to create Delta Lake tables on top of Parquet files stored in ADLS Gen2 without the Databricks ecosystem.

    Firstly, you'll need an Azure Synapse Analytics workspace set up in your Azure subscription, with a Spark pool provisioned within that workspace.

    As an example, suppose your Parquet files are located at abfss://******@yourstorageaccount.dfs.core.windows.net/path/to/parquet_data/. Go to your Synapse workspace in the Azure portal and launch Synapse Studio. In Synapse Studio, go to "Develop" -> "Notebooks" -> "New notebook" and attach the notebook to your Spark pool. A quick way to confirm access is to read the Parquet folder directly, as in the sketch below.
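
    This is only a sketch: the masked container and path simply mirror the placeholder path above, and it assumes the spark session that Synapse notebooks provide automatically.

    # Read the existing Parquet data to confirm access and inspect the schema
    parquet_path = "abfss://******@yourstorageaccount.dfs.core.windows.net/path/to/parquet_data/"

    df = spark.read.parquet(parquet_path)   # 'spark' is pre-created in Synapse notebooks
    df.printSchema()                        # verify the columns look as expected
    print(df.count())                       # rough row count to confirm the read works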

    Please be informed that Spark needs credentials for ADLS Gen2. Synapse Spark usually handles this automatically via Managed Identity; if you encounter issues, you can set the credentials explicitly (a sketch is included after the conversion example below). Once access works, convert the Parquet folder to a Delta Lake table. This method adds the Delta transaction log (the _delta_log directory) to your existing Parquet folder without rewriting the data, which is very efficient.

    from delta.tables import DeltaTable

    # Path where your Parquet files already reside; after conversion this same folder
    # becomes the Delta table (only a _delta_log directory is added alongside the data).
    delta_table_path = "abfss://******@yourstorageaccount.dfs.core.windows.net/path/to/your_delta_table/"

    # Convert the existing folder of Parquet files to Delta in place
    DeltaTable.convertToDelta(spark, f"parquet.`{delta_table_path}`")

    # The folder at delta_table_path is now a Delta Lake table; read it back as Delta
    delta_df = spark.read.format("delta").load(delta_table_path)
    delta_df.show()
    
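    As noted above, Synapse Spark usually authenticates to ADLS Gen2 through the workspace Managed Identity. If that does not work in your environment, a minimal sketch of setting credentials explicitly with a service principal is below; the storage account name, client ID, client secret, and tenant ID are placeholders you would supply (ideally retrieved from Azure Key Vault rather than hard-coded).

    # Hypothetical explicit OAuth (service principal) configuration for ADLS Gen2.
    # Replace the placeholder account name, client id, secret, and tenant id.
    storage_account = "yourstorageaccount"

    spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
    spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net",
                   "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net",
                   "<application-client-id>")
    spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net",
                   "<client-secret>")
    spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net",
                   "https://login.microsoftonline.com/<tenant-id>/oauth2/token")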

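    Since the end goal in the question is CDC, here is a minimal sketch of an upsert with the DeltaTable merge API once the folder is a Delta table. The updates_df DataFrame, its path, and the id key column are placeholders for your actual change records (for example, incremental files produced by Ab Initio) and business key.

    from delta.tables import DeltaTable

    # Target Delta table created by the conversion above
    target = DeltaTable.forPath(spark, delta_table_path)

    # Placeholder: incoming change records; point this at your real incremental data
    updates_df = spark.read.parquet("abfss://******@yourstorageaccount.dfs.core.windows.net/path/to/changes/")

    # Upsert: update matching rows, insert new ones ('id' is a placeholder key column)
    (target.alias("t")
        .merge(updates_df.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())
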
    Please find the correct Delta Lake version for your Spark version on the Delta Lake website: https://delta.io/ (look for "Maven Artifacts" or "Download"). The connector coordinate is io.delta:delta-spark_<scala_version>:<delta_lake_version> for Delta Lake 3.x (e.g., io.delta:delta-spark_2.12:3.1.0) and io.delta:delta-core_<scala_version>:<delta_lake_version> for Delta Lake 2.x; see the sketch below for attaching it when running Spark outside Synapse.
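
    Synapse Spark runtimes typically ship with a compatible Delta Lake version pre-installed, so you usually do not need to attach anything in a Synapse notebook. If you run Spark elsewhere and have to add the connector yourself, the sketch below shows one way to build a session with the Delta package and its required SQL extensions; the coordinate io.delta:delta-spark_2.12:3.1.0 is only an example and must match your Spark and Scala versions.

    from pyspark.sql import SparkSession

    # Attach the Delta Lake connector and enable its SQL extension and catalog.
    # Pick the package version that matches your Spark/Scala versions (see https://delta.io/).
    spark = (SparkSession.builder
        .appName("delta-on-adls")
        .config("spark.jars.packages", "io.delta:delta-spark_2.12:3.1.0")
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .getOrCreate())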

    Hope the above answer helps! Please let us know if you have any further queries.


    Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you, as this can be beneficial to other community members.

