Picking a Spark pool dynamically to execute

Jona 885 Reputation points
2025-08-02T04:44:22.56+00:00

Hi,

I have this situation:

  1. A pipeline
  2. A notebook activity in it
  3. Three different Spark pools
  4. The pipeline is called many times with different arguments, giving the notion of iterations

When the pipelines started executing, I realized that if they all use the same Spark pool, the parallelism is lost: the pipelines get queued until the one using the pool finishes.

So, I created two more pools to have more pipelines executing in parallel. However, I don't know how to dynamically set the pool the activity executes on.

Can you give a hand on this?

Regards

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

Accepted answer
  1. Vinodh247 36,031 Reputation points MVP Volunteer Moderator
    2025-08-02T06:40:50.71+00:00

    Hi,

    Thanks for reaching out to Microsoft Q&A.

    In Synapse pipelines, you cannot dynamically choose a Spark pool at runtime with the built-in Notebook activity; the Spark pool is statically defined in the notebook activity's configuration. However, there are practical workarounds to achieve dynamic pool selection and improved parallelism.

    Workarounds that you can try:

    1. Use Web Activity + REST API to trigger Notebooks

    Instead of using the built-in Notebook activity alone, use a Web activity to call the Synapse REST API and start the run programmatically. This lets you decide at call time which Spark pool handles each run.

    Steps:

    • Use the Synapse REST API endpoint: POST https://<workspace-name>.dev.azuresynapse.net/pipelines/<pipeline-name>/createRun?api-version=2020-12-01 to invoke a pipeline with parameters.
    • Create multiple child pipelines, each tied to a different Spark pool (each hardcoded).
    • From the "controller" pipeline, use logic (If Condition + Web Activity) to route to one of those child pipelines depending on some dynamic condition (like iteration ID, load, etc.).

    This is indirect dynamic routing.
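    The createRun call above can be sketched in Python. This is a minimal, stdlib-only sketch; the workspace, pipeline, and parameter names are illustrative, and the bearer token is assumed to be obtained separately (e.g. via azure-identity's DefaultAzureCredential or a managed identity) for the https://dev.azuresynapse.net resource:

```python
# Sketch: trigger a Synapse pipeline run via the REST API, routing to a
# child pipeline whose Spark pool is hardcoded. Names are illustrative.
import json
import urllib.request

API_VERSION = "2020-12-01"

def create_run_url(workspace: str, pipeline: str) -> str:
    """Build the createRun endpoint for a pipeline in the given workspace."""
    return (f"https://{workspace}.dev.azuresynapse.net"
            f"/pipelines/{pipeline}/createRun?api-version={API_VERSION}")

def trigger_pipeline(workspace: str, pipeline: str,
                     token: str, parameters: dict) -> str:
    """Start a pipeline run and return its run ID.

    'token' is an AAD bearer token for https://dev.azuresynapse.net;
    in practice obtain it with azure-identity or a managed identity.
    """
    req = urllib.request.Request(
        create_run_url(workspace, pipeline),
        data=json.dumps(parameters).encode(),  # forwarded as pipeline parameters
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["runId"]

# Example: route iteration 7 to the child pipeline bound to a second pool
# run_id = trigger_pipeline("myworkspace", "child-pipeline-pool2",
#                           token, {"iterationId": 7})
```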

    2. Create Multiple Pipelines or Activities with Different Pools

    Define three separate notebook activities in the pipeline, each configured with a different Spark pool, and use an If Condition activity to select which one to run based on input parameters.
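    A sketch of such a routing activity, assuming a string pipeline parameter named poolChoice (the parameter name is illustrative). Since If Condition activities cannot be nested directly, with three pools a Switch activity on the same parameter may be simpler:

```json
{
  "name": "RouteToPool1",
  "type": "IfCondition",
  "typeProperties": {
    "expression": {
      "value": "@equals(pipeline().parameters.poolChoice, 'pool1')",
      "type": "Expression"
    }
  }
}
```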

    3. Use Azure Function to Orchestrate

    Create an Azure Function that:

    Receives pipeline input

    Applies logic to choose the Spark pool

    Calls Synapse REST API to run notebook on selected pool

    You call the function from the Synapse pipeline using a Web activity. This is more flexible but adds an Azure Function as a dependency.
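    The pool-selection logic inside such a function can be as simple as the sketch below. The pool names and the round-robin rule are assumptions; substitute whatever condition (load, data size, priority) fits your workload:

```python
# Sketch of the selection logic an Azure Function could apply before
# calling the Synapse REST API. Pool names are illustrative.
SPARK_POOLS = ["sparkpool1", "sparkpool2", "sparkpool3"]

def choose_pool(iteration_id: int) -> str:
    """Spread iterations evenly across the available pools (round-robin)."""
    return SPARK_POOLS[iteration_id % len(SPARK_POOLS)]
```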

    4. Partition Input and Run Multiple Pipelines

    If you control the orchestration:

    Split input data/workload into parts

    Start three separate pipelines, each tied to a different Spark pool

    Use parallel Execute Pipeline activities

    This ensures parallel runs and avoids pool bottlenecks.
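    The split step above can be sketched as follows: deal the workload round-robin into as many partitions as there are pools, so each child pipeline (one per pool) processes an even share. The pool names and workload are illustrative:

```python
# Sketch: partition a workload across the three pool-bound pipelines.
def partition(items: list, n: int) -> list:
    """Deal items round-robin into n buckets of near-equal size."""
    buckets = [[] for _ in range(n)]
    for i, item in enumerate(items):
        buckets[i % n].append(item)
    return buckets

pools = ["sparkpool1", "sparkpool2", "sparkpool3"]
work = list(range(10))  # placeholder for real iteration arguments
assignments = dict(zip(pools, partition(work, len(pools))))
# each pool-bound pipeline now receives roughly a third of the work
```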


    Limitations

    No native support for setting Spark pool name as a parameter inside a notebook activity.

    You cannot change the pool used by a notebook at runtime unless using the REST API.


    Recommendation

    If your goal is maximizing parallelism with multiple pipelines, the simplest solution is:

    Create multiple pipelines with different pools

    Use a controller pipeline to decide which to run

    OR, split notebook activities in the same pipeline with If Condition blocks

    If you need more flexibility, go with the REST API approach or Azure Function orchestration.

    Please 'Upvote' (Thumbs-up) and 'Accept' as answer if the reply was helpful. This will benefit other community members who face the same issue.
