Hello Anand Deshpande,
Thank you for your question!
I understand that you want to create Delta Lake tables on top of Parquet files stored in ADLS Gen2 without using the Databricks ecosystem.
Firstly, you'll need an Azure Synapse Analytics workspace set up in your Azure subscription, with a Spark pool provisioned within that workspace.
As an example, assume your Parquet files are located at abfss://******@yourstorageaccount.dfs.core.windows.net/path/to/parquet_data/. Go to your Synapse workspace in the Azure portal and launch Synapse Studio. In Synapse Studio, go to "Develop" -> "Notebooks" -> "New notebook" and attach the notebook to your Spark pool.
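As an optional sanity check (a minimal sketch reusing the placeholder path above), you can first read the Parquet folder directly to confirm the Spark pool can reach the storage account:

# Optional sanity check: read the raw Parquet folder before converting it.
# parquet_path reuses the placeholder example path from above.
parquet_path = "abfss://******@yourstorageaccount.dfs.core.windows.net/path/to/parquet_data/"
df = spark.read.parquet(parquet_path)
df.printSchema()
df.show(5)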
Please be informed that for ADLS Gen2, Spark needs credentials. Synapse Spark usually handles this automatically via Managed Identity; if you encounter issues, you might explicitly set these yourself, though that is less common in Synapse (a minimal example is shown after the conversion code below). After that, convert the Parquet files to a Delta Lake table. This method adds the Delta transaction log (the _delta_log directory) to your existing Parquet folder without rewriting the data, which makes it very efficient.
from delta.tables import DeltaTable

# Define the path where your Parquet files are and where the Delta table will be.
delta_table_path = "abfss://******@yourstorageaccount.dfs.core.windows.net/path/to/your_delta_table/"
# Ensure this path is where your Parquet files already reside,
# or copy your Parquet files to this location first if you want them moved.

# Convert an existing *folder* of Parquet files to Delta in place:
DeltaTable.convertToDelta(spark, f"parquet.`{delta_table_path}`")

# Now the folder at delta_table_path is a Delta Lake table.
# You can read it back as a Delta table:
delta_df = spark.read.format("delta").load(delta_table_path)
delta_df.show()
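If the Managed Identity / linked-service authentication mentioned above does not work and you see authorization errors, one option is to set the storage account key explicitly as a Spark configuration. This is only a minimal sketch; the account name and key below are placeholders you would replace with your own values:

# Explicit account-key authentication (only needed if the default Synapse auth fails).
# "yourstorageaccount" and the key value are placeholders for illustration.
spark.conf.set(
    "fs.azure.account.key.yourstorageaccount.dfs.core.windows.net",
    "<your-storage-account-access-key>"
)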
Please find the correct Delta Lake version for your Spark version on the Delta Lake website: https://delta.io/ (look for "Maven Artifacts" or "Download"). The coordinate has the form io.delta:delta-spark_<scala_version>:<delta_lake_version> (e.g., io.delta:delta-spark_2.12:3.1.0).
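Note that Synapse Spark pools already include Delta Lake in the runtime, so you normally do not need to add this package yourself. If you are running your own Spark session outside Synapse, a minimal sketch of wiring the Maven coordinate and the Delta extensions into PySpark (the version shown is an example only; match it to your Spark version) looks like this:

from pyspark.sql import SparkSession

# Self-managed Spark session with Delta Lake pulled in via Maven (example version only).
spark = (
    SparkSession.builder
    .appName("delta-on-adls")
    .config("spark.jars.packages", "io.delta:delta-spark_2.12:3.1.0")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)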
References:
- https://microsoftlearning.github.io/mslearn-synapse/Instructions/Labs/05-Use-delta-lake.html
- https://learn.microsoft.com/en-us/azure/stream-analytics/write-to-delta-table-adls-gen2
- https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/low-shuffle-merge-for-apache-spark
Hope the above answer helps! Please let us know if you have any further queries.
Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you, as this can be beneficial to other community members.