Hello Jona,
Thank you for your question about the data insertion issue in your Azure Synapse Notebook with Delta tables. It sounds like you've set up a simple table creation script using Spark SQL, inserted some rows, but when you query it afterward, nothing shows up. This is a common gotcha with Delta tables in Synapse, often related to how the table is managed, session scope, or the way inserts are committed.
Let's break it down and get this fixed step by step. Based on what you've described, your table is being created as a managed table (without an explicit LOCATION clause), which means its data is stored in the Synapse warehouse path. However, inserts via %%sql might not always persist or become immediately visible if there's a session mismatch or if the table isn't properly registered in the metastore.
Quick Verification
First, confirm if the data is actually being written to storage:
- Switch to a PySpark cell (remove %%sql) and run this to check the underlying Delta files:

```
delta_path = "abfss://<your-filesystem>@<your-storage-account>.dfs.core.windows.net/warehouse/myTable"  # Adjust based on your warehouse path
df = spark.read.format("delta").load(delta_path)
display(df)
```
If this shows your data, the issue is with how the table is registered or queried in SQL. If it returns nothing, the inserts aren't landing.
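If you want to confirm whether any commits landed at all, you can also list the table's _delta_log folder. Here is a minimal sketch, assuming the same delta_path placeholder as above and the mssparkutils file utilities that ship with Synapse notebooks:

```
# Minimal sketch: list the Delta transaction log to see whether any commits exist.
# Assumes `delta_path` from the cell above; mssparkutils is built into Synapse notebooks.
from notebookutils import mssparkutils

log_files = mssparkutils.fs.ls(delta_path + "/_delta_log")
for f in log_files:
    print(f.name, f.size)
```

If the folder is missing or holds no JSON commit files, the inserts never committed; if commits are there but %%sql still returns nothing, the problem is table registration rather than the writes.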
Likely Causes and Fixes
- Managed Table Persistence: In Synapse, managed Delta tables store data in the default warehouse location (e.g., /warehouse/myTable). But if your notebook session ends or restarts, the metastore might not sync properly. Try specifying an explicit LOCATION to make it an external table, which is more reliable for notebooks:
```
%%sql
CREATE TABLE myTable (
  id INT,
  name STRING,
  lastName STRING
)
USING DELTA
LOCATION 'abfss://<your-filesystem>@<your-storage-account>.dfs.core.windows.net/delta/myTable/';
```
This points the table to a specific ADLS Gen2 path, ensuring the data persists outside the session.
- Insert Data Using PySpark for Reliability: Spark SQL inserts can sometimes fail silently if there's a transaction issue. Instead, create a DataFrame and write it as Delta:
- In a PySpark cell:
```
from pyspark.sql import Row

data = [Row(id=1, name="John", lastName="Doe"),
        Row(id=2, name="Jane", lastName="Smith")]  # Add your sample data here
df = spark.createDataFrame(data)
df.write.format("delta").mode("append").saveAsTable("myTable")  # Or save to the LOCATION path if external
```
- Then query it back with %%sql:
```
%%sql
SELECT * FROM myTable;
```
This should now return your rows.
- Commit and Refresh: After inserts, explicitly refresh the table to ensure the metastore is updated:
```
%%sql
REFRESH TABLE myTable;
SELECT * FROM myTable;
```
- Common Pitfalls to Check:
- Session Scope: If you're running cells in different sessions or kernels, the table might not be visible. Restart the notebook and run everything in sequence.
- Permissions: Ensure your Synapse workspace has write access to the storage account/path. Check for any ACL errors in the notebook logs.
- Delta Version: Synapse uses a specific Delta runtime; if you're on an older Spark pool, upgrade to the latest (e.g., Spark 3.3+ with Delta 2.0+) for better compatibility.
- No Results Even with Files: Sometimes the Delta log (_delta_log folder) exists, but queries return empty results because the table isn't registered. Use DeltaTable.forPath(spark, delta_path) in PySpark to load and verify, as shown in the sketch after this list.
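For that last pitfall, here is a minimal sketch of loading the table directly by storage path with the Delta Lake Python API (assuming the delta package bundled with the Synapse Spark runtime and the same delta_path placeholder used earlier):

```
# Minimal sketch: load the Delta table by path, bypassing the metastore entirely.
# `delta_path` is the same placeholder path used in the verification cell above.
from delta.tables import DeltaTable

dt = DeltaTable.forPath(spark, delta_path)
display(dt.toDF())                   # Show the current rows
dt.history().show(truncate=False)    # Inspect recent commits (appends, overwrites, etc.)
```

If this shows your rows, the data is fine and only the registration is off; re-create the table with the CREATE TABLE ... USING DELTA LOCATION statement from the first bullet and your %%sql queries should pick it up.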
If you run the above and still see no data, share more details such as the exact insert statements, any error messages in the output, or your Spark pool version, and I can refine the advice further. This should get your Delta table working smoothly for querying!
Best regards,
Jerald Felix