Apache Spark pool ends with timeout error

Jona 885 Reputation points
2025-08-07T15:41:45.63+00:00

Hi,

When running my pipeline, which contains a Notebook activity, I faced this error. It appears suddenly sometimes, and sometimes it doesn't happen at all.

This pipeline executes iteratively almost 15 times for the same Notebook, which is parametrized. The Spark pool size is Small (4 vCores / 28 GB RAM). The PySpark code just copies a small file (5 MB) from one location to another.

PySpark code:

[screenshot of the PySpark code]

The error:

[screenshot of the error message]

Can you give me a hand?

Regards

Azure Synapse Analytics

1 answer

  1. Smaran Thoomu 28,225 Reputation points Microsoft External Staff Moderator
    2025-08-07T18:28:11.4766667+00:00

    Hi Jona

    Thank you for posting your query.

    From the screenshots, the error you're encountering is due to a timeout in the mssparkutils.notebook.run() function, which calls a notebook (dbo_shiftload) from within another notebook. This typically happens when:

    1. The called notebook takes longer to run than the default timeout allows.
    2. There’s a bottleneck in Spark pool resources or job queuing delays, especially when running the same notebook iteratively (15 times, as mentioned).
    3. The target notebook cell contains operations that exceed execution limits (e.g., 90 seconds in your error message).
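
    For reference, here is a minimal sketch of the likely failing pattern, assuming the call omits the timeout argument (in Synapse, mssparkutils.notebook.run defaults to roughly 90 seconds, which matches the limit in your error; the notebook path is illustrative):

        from notebookutils import mssparkutils  # preinstalled in Synapse Spark pools

        # No explicit timeout: the ~90 s default applies, so pool queuing or a
        # slow session start-up can fail the call with the timeout you saw.
        sourceFolderPath = mssparkutils.notebook.run("dbo_shiftload")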

    Recommended Actions:

    • Increase the Timeout Parameter: When calling the notebook, explicitly pass a higher timeout in seconds (note the f-string prefix so the path placeholders resolve):
        sourceFolderPath = mssparkutils.notebook.run(f"blob/{scheme}/{table}", 300)  # 300-second timeout
      You can adjust 300 to suit your workload.
    • Avoid Overlapping Invocations: Since the same notebook is invoked iteratively, ensure the invocations don't overlap and exhaust the Spark pool. If you're using a ForEach loop or similar logic, consider adding a slight delay (time.sleep()) between invocations, or temporarily use a larger Spark pool to reduce pressure; see the sketch after this list.
    • Check Job Logs for the Target Notebook: Go to Synapse Studio > Monitor > Apache Spark Applications. Look up the failed job for dbo_shiftload, and check logs for:
      • Any long-running Spark actions or shuffle issues.
      • Driver or executor memory issues.
      • Initialization delays.
    • Optimize the Target Notebook: Since the notebook just copies a 5 MB file, make sure it doesn't include unnecessary or wide transformations that could trigger Spark stage retries or long shuffles.
    • Review Documentation: As noted in the error message, Microsoft's documentation provides additional details at the link included there.
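
    Combining the first two suggestions, a minimal sketch of the calling loop might look like this. The table list, scheme value, and 10-second pause are illustrative assumptions, not your actual values:

        import time
        from notebookutils import mssparkutils  # preinstalled in Synapse Spark pools

        scheme = "dbo"                      # hypothetical schema name
        tables = ["shiftload", "table_02"]  # hypothetical list of ~15 tables

        for table in tables:
            # Explicit 300 s timeout instead of the ~90 s default; tune per workload
            sourceFolderPath = mssparkutils.notebook.run(f"blob/{scheme}/{table}", 300)
            # Short pause so consecutive runs don't pile up in the pool's queue
            time.sleep(10)

    As a side note, if the child notebook really only copies a single small file, mssparkutils.fs.cp(source, destination) can perform that copy from the driver without launching a Spark job, which avoids the timeout risk altogether.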

    Let me know if you want help reviewing the notebook content or retry logic across the pipeline. I hope this information helps. Please do let us know if you have any further queries.

    Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.

