Triggering a Synapse pipeline on folder creation

Jona
2025-07-29T17:04:31.8766667+00:00

Hi,

I have a Spark app that writes data into ADLS Gen2, producing the following file structure:

[Screenshot: Spark job output folder containing multiple part-*.parquet files and a _SUCCESS marker file]

Since the job writes many files, I can't simply attach a blob-created trigger to a pipeline: it would fire once per file.

I was thinking about triggering on the _SUCCESS file, but that doesn't seem like a very clean solution.

Regards

Azure Synapse Analytics

1 answer

  1. Venkat Reddy Navari (Microsoft External Staff, Moderator)
    2025-07-29T17:58:56.4566667+00:00

    Hi Jona, thanks for sharing the screenshot; it clearly shows your Spark job output with multiple parquet files and the _SUCCESS file. Since your job writes many files, triggering a pipeline on every file creation would cause a lot of unnecessary pipeline runs.

    The best and most common solution here is to trigger your Synapse pipeline only on the creation of the _SUCCESS file. This file is a built-in Spark indicator that the whole batch finished writing successfully, so triggering on it means your pipeline starts only when the full dataset is ready.

    How you can set this up:

    1. Use an Event Grid trigger in Synapse that listens to blob creation events in your ADLS Gen2 container.
    2. Add a filter on the event so it only fires when the blob path ends with _SUCCESS (use the Blob path ends with setting). Note that Spark's _SUCCESS marker is typically a zero-byte file, so make sure the trigger's Ignore empty blobs option is disabled, otherwise the event will be skipped.
    3. Configure your pipeline to pick up the folder path dynamically, so it processes the data corresponding to that batch (see the sample trigger definition below).
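
    For reference, a storage event trigger is just JSON under the hood, so you can also author it directly. Below is a minimal sketch; the container path, scope, trigger name (SuccessFileTrigger), pipeline name (ProcessBatchPipeline), and parameter name (folderPath) are all placeholders for your own values, and ignoreEmptyBlobs is, as far as I know, the JSON counterpart of the Ignore empty blobs toggle:

    ```json
    {
      "name": "SuccessFileTrigger",
      "properties": {
        "type": "BlobEventsTrigger",
        "typeProperties": {
          "blobPathBeginsWith": "/mycontainer/blobs/output/",
          "blobPathEndsWith": "_SUCCESS",
          "ignoreEmptyBlobs": false,
          "events": ["Microsoft.Storage.BlobCreated"],
          "scope": "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>"
        },
        "pipelines": [
          {
            "pipelineReference": {
              "referenceName": "ProcessBatchPipeline",
              "type": "PipelineReference"
            },
            "parameters": {
              "folderPath": "@triggerBody().folderPath"
            }
          }
        ]
      }
    }
    ```

    Inside the pipeline, the folderPath parameter receives @triggerBody().folderPath, which resolves to the folder containing the _SUCCESS blob, so every activity can derive its source path from that single value.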

    This approach avoids triggering your pipeline multiple times per batch, simplifies orchestration, and aligns with best practices for Spark output.

    Alternatives:

    • If you want, you can also write a small “control” file from your Spark job at the end of the batch instead of relying on _SUCCESS (see the sketch after this list).
    • Or use scheduled triggers if your data arrives on a fixed schedule.
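
    If you go the control-file route, here is a minimal PySpark sketch, assuming spark is your active SparkSession and df is the batch DataFrame your job already produces; the output_path value and the _BATCH_DONE file name are hypothetical placeholders:

    ```python
    output_path = "abfss://mycontainer@myaccount.dfs.core.windows.net/output/batch_001"

    # Write the batch itself (df is the DataFrame the job already builds).
    df.write.mode("overwrite").parquet(output_path)

    # Create a small, non-empty control file via the Hadoop FileSystem API on the
    # driver, so storage event triggers that ignore empty blobs still fire on it.
    jvm = spark._jvm
    hadoop_conf = spark._jsc.hadoopConfiguration()
    fs = jvm.org.apache.hadoop.fs.FileSystem.get(jvm.java.net.URI(output_path), hadoop_conf)
    stream = fs.create(jvm.org.apache.hadoop.fs.Path(output_path + "/_BATCH_DONE"), True)
    stream.write(bytearray(b"done"))
    stream.close()
    ```

    Writing a few bytes into the marker (rather than leaving it empty) means it also works with triggers that ignore empty blobs, and you can embed something like a batch ID in the contents if that helps downstream.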

    But honestly, triggering on _SUCCESS is usually the simplest and most reliable method.


    Hope this helps. If this answers your query, please click Accept Answer and mark it as helpful. And if you have any further questions, do let us know.

