Hello MrFlinstone!
Thanks for sharing the details and posting your query on Microsoft Q&A.
I understand you're trying to read a CSV file from your storage account using PySpark in Synapse, and while your SQL OPENROWSET query works, the PySpark code is failing due to access issues.
Here’s a simplified explanation of what’s happening and how you can fix it:
When you run the SQL query, it uses your user account, which has access to the storage account, so it works.
But when you use PySpark in Synapse, it doesn’t necessarily use your account. Depending on how the notebook runs (for example, from a pipeline), it can use the managed identity of the Synapse workspace instead.
Could you check whether that identity currently has permission to access the storage account? If not, please follow the steps below.
- Check the Managed Identity
  - Go to your Synapse workspace in the Azure portal.
  - Under Identity, make sure System Assigned is turned on.
  - Copy the Object ID of that identity. Learn how: https://docs.azure.cn/en-us/data-factory/credentials?tabs=data-factory
- Give It Access
  - Go to your storage account → Access Control (IAM).
  - Add a role assignment: choose Storage Blob Data Reader (read-only) or Storage Blob Data Contributor (read and write). Learn how: https://learn.microsoft.com/en-us/azure/storage/blobs/assign-azure-role-data-access?tabs=portal
  - Assign it to the managed identity you just found.
- Network Settings
  - If your storage account has firewall rules or is behind a VNet, make sure Synapse can reach it. You might need to allow Trusted Microsoft Services or set up a Private Endpoint. See: https://learn.microsoft.com/en-us/azure/synapse-analytics/security/connect-to-a-secure-storage-account
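
If you'd rather script the role assignment than click through the portal, here is a minimal sketch in Python that builds the resource scope and the matching Azure CLI arguments. The subscription ID, resource group, storage account name, and object ID are placeholders, and the helper names `storage_scope` and `role_assignment_command` are made up for illustration. The script only prints the command; it doesn't run anything.

```python
import shlex


def storage_scope(subscription_id, resource_group, account, container=None):
    """Build the Azure resource ID used as the role-assignment scope."""
    scope = (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account}"
    )
    if container:
        # Optionally narrow the scope to a single container.
        scope += f"/blobServices/default/containers/{container}"
    return scope


def role_assignment_command(object_id, scope, role="Storage Blob Data Reader"):
    """Return the `az role assignment create` arguments as a list."""
    return [
        "az", "role", "assignment", "create",
        "--assignee-object-id", object_id,
        # A managed identity is treated as a service principal.
        "--assignee-principal-type", "ServicePrincipal",
        "--role", role,
        "--scope", scope,
    ]


if __name__ == "__main__":
    # Placeholder values: replace with your own before running the command.
    scope = storage_scope("<subscription-id>", "<resource-group>", "<storage-account>")
    print(shlex.join(role_assignment_command("<object-id>", scope)))
```

Paste the printed command into a shell where you're signed in to the Azure CLI with an account that is allowed to assign roles on the storage account.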
Finally, once you’ve given the right permissions and fixed the code:
- Try running the notebook again.
- If it still fails, check the Synapse job logs for any access errors and share the error snapshot or logs with us.
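
When you rerun the notebook, a small wrapper like the one below can make a failure easier to read. Treat it as a sketch only: the account, container, and file names are placeholders, `abfss_uri`, `read_csv`, and `explain_storage_error` are hypothetical helpers, and the URI assumes the public Azure endpoint suffix `dfs.core.windows.net`. The mapping from error codes to likely causes is a rule of thumb based on common 403 codes from Azure Storage, not an exhaustive list. `spark` is assumed to be the session that Synapse notebooks provide.

```python
def abfss_uri(account, container, path):
    """Build an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def explain_storage_error(message):
    """Map common Azure Storage error codes in an error message to a likely cause."""
    if "AuthorizationPermissionMismatch" in message:
        return "Identity is missing a data role (e.g. Storage Blob Data Reader)."
    if "AuthorizationFailure" in message:
        return "Request was blocked, often by storage firewall or VNet rules."
    if "403" in message:
        return "Access denied; check both role assignments and network settings."
    return "Not an obvious access error; inspect the full log."


def read_csv(spark, account, container, path):
    """Read a CSV with a header row and print a hint if the read fails."""
    uri = abfss_uri(account, container, path)
    try:
        return spark.read.option("header", "true").csv(uri)
    except Exception as exc:
        print(f"Read failed for {uri}: {explain_storage_error(str(exc))}")
        raise  # Keep the full stack trace so it can be shared.
```

In a notebook cell you would call it as `df = read_csv(spark, "<storage-account>", "<container>", "<path/to/file.csv>")`. If it prints a hint, include the full exception text when you share the logs.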
Let me know how it goes or if you'd like help walking through any of these steps!
Please "Accept the Answer" if the response is helpful.
Thanks
Pratyush