How to access blob storage from a notebook running on a serverless Spark compute in an ML workspace

Balashov, Alexander 0 Reputation points
2025-07-14T08:52:05.5333333+00:00

I am trying to run a simple Jupyter notebook on the serverless Spark compute and read data from a blob storage account that has only private access (connected to the VNet using private endpoints).

I tried several solutions based on the documentation, but nothing worked for me. The same script runs fine on a VM compute created manually in the ML workspace.

I checked how the FQDNs of the blob storage are resolved. As expected, they resolved to public IPs in the serverless case and to private IPs in the VM case.

It seems to be a networking issue.

Could you please help me with this problem?

Azure Machine Learning

1 answer

  1. Amira Bedhiafi 35,766 Reputation points Volunteer Moderator
    2025-08-04T17:24:37.6666667+00:00

    Hello Alexander!

    Thank you for posting on Microsoft Learn.

    Serverless Spark compute in AML cannot access Blob Storage accounts that are locked behind private endpoints or VNet integration, because serverless Spark runs outside your private VNet and doesn't support managed private connectivity.

    A VM or a compute cluster with VNet injection, on the other hand, can access your private Blob Storage.

    You can create a compute cluster or compute instance, assign it to the same VNet/subnet from which your blob storage is reachable, and use that for your notebooks instead of serverless Spark when you need private access; a rough SDK sketch follows below.
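
    As a rough sketch with the Azure ML Python SDK v2 (azure-ai-ml), creating a compute cluster injected into your subnet could look like the following; the subscription, workspace, VNet and subnet names are placeholders to replace with your own:

    from azure.identity import DefaultAzureCredential
    from azure.ai.ml import MLClient
    from azure.ai.ml.entities import AmlCompute, NetworkSettings

    # Placeholders: fill in your subscription, resource group and workspace.
    ml_client = MLClient(DefaultAzureCredential(),
                         "<subscription-id>", "<resource-group>", "<workspace-name>")

    # Cluster injected into the same VNet/subnet that reaches the storage private endpoint.
    cluster = AmlCompute(
        name="private-cpu-cluster",
        size="Standard_DS3_v2",
        min_instances=0,
        max_instances=2,
        network_settings=NetworkSettings(vnet_name="<vnet-name>", subnet="<subnet-name>"),
    )
    ml_client.compute.begin_create_or_update(cluster).result()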

    If you must use serverless Spark and the data is non-sensitive, or you accept the temporary exposure, you can generate a SAS token with read permissions and access the blob from Spark. Note that Spark's file readers don't take a plain https:// URL with a SAS token appended; instead, pass the SAS token through the WASB connector configuration, for example:

    spark.conf.set("fs.azure.sas.<container>.<your-storage-account>.blob.core.windows.net", "<SAS-token>")
    df = spark.read.parquet("wasbs://<container>@<your-storage-account>.blob.core.windows.net/<path>")
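
    If you need to mint the SAS token programmatically, a minimal sketch with the azure-storage-blob package could look like this (the account name, account key and container are placeholders):

    from datetime import datetime, timedelta, timezone
    from azure.storage.blob import generate_container_sas, ContainerSasPermissions

    # Placeholders: replace the account name, key and container with your own.
    sas_token = generate_container_sas(
        account_name="<your-storage-account>",
        container_name="<container>",
        account_key="<account-key>",
        permission=ContainerSasPermissions(read=True, list=True),
        expiry=datetime.now(timezone.utc) + timedelta(hours=1),
    )
    # Plug sas_token into the fs.azure.sas.* setting shown above.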
    

    If you control the firewall rules of your Blob Storage account, you can temporarily enable public access and allow only the Azure datacenter IP ranges for the region your workspace is in.

    Keep in mind that this is not recommended for production environments.
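
    Purely as an illustration (the subscription, resource group, account name and IP range below are placeholders, and model names can vary between SDK versions), adding an IP network rule with the azure-mgmt-storage package might look roughly like this:

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.storage import StorageManagementClient
    from azure.mgmt.storage.models import (
        StorageAccountUpdateParameters, NetworkRuleSet, IPRule)

    # Placeholders: fill in your subscription, resource group, account and IP range.
    client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")
    client.storage_accounts.update(
        "<resource-group>",
        "<your-storage-account>",
        StorageAccountUpdateParameters(
            network_rule_set=NetworkRuleSet(
                default_action="Deny",            # keep the firewall on
                bypass="AzureServices",           # allow trusted Azure services
                ip_rules=[IPRule(ip_address_or_range="<allowed-ip-range>")],
            )
        ),
    )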

    If you're using ADLS Gen2 (hierarchical namespace enabled) and the storage account supports OAuth access, you can use Microsoft Entra ID (Azure Active Directory) passthrough with identity-based access, as in the sketch below.

    This only works, however, if the storage account's public endpoint is reachable from the Spark environment.
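
    A minimal sketch of identity-based access over abfss, assuming the identity running the notebook already has the Storage Blob Data Reader role on the account (the account, container and path are placeholders):

    # Assumes the notebook's user or managed identity has been granted
    # the Storage Blob Data Reader role on the storage account.
    df = spark.read.parquet(
        "abfss://<container>@<your-storage-account>.dfs.core.windows.net/<path>")
    df.show(5)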
