Hi Jona, thanks for sharing the screenshot; it clearly shows your Spark job output with multiple Parquet files and the _SUCCESS file. Since your job writes many files, triggering a pipeline on every file creation would cause a lot of unnecessary pipeline runs.
The best and most common solution here is to trigger your Synapse pipeline only on the creation of the _SUCCESS file. This file is a built-in Spark indicator that the whole batch finished writing successfully, so triggering on it means your pipeline starts only when the full dataset is ready.
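For context, the marker comes from a normal Spark write; your job does not need anything extra. Below is a minimal PySpark sketch (the path and the demo DataFrame are placeholders, not taken from your job) showing that a plain parquet write produces the part files plus _SUCCESS; the Hadoop committer flag that controls the marker is set explicitly here, although it is on by default:

```python
from pyspark.sql import SparkSession

# In a Synapse Spark notebook `spark` already exists; the builder call is for completeness.
spark = SparkSession.builder.getOrCreate()

# Placeholder ADLS Gen2 path - replace container/account/folder with your own.
output_path = "abfss://<container>@<account>.dfs.core.windows.net/batches/2024-01-01"

# _SUCCESS is written by the file output committer after all part files are
# committed. This setting defaults to true; shown only to make it explicit.
spark.sparkContext._jsc.hadoopConfiguration().set(
    "mapreduce.fileoutputcommitter.marksuccessfuljobs", "true"
)

# Demo data standing in for your real batch DataFrame.
df = spark.range(100).withColumnRenamed("id", "value")

df.write.mode("overwrite").parquet(output_path)
# Resulting folder: part-*.snappy.parquet files plus a zero-byte _SUCCESS marker.
```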
How you can set this up:
- Use a storage event trigger (backed by Event Grid) in Synapse that listens to blob created events in your ADLS Gen2 container.
- Add a filter so the trigger fires only for the _SUCCESS file, for example by setting the "Blob path ends with" filter to _SUCCESS.
- Configure your pipeline to pick up the folder path dynamically (the storage event trigger exposes it, typically as @triggerBody().folderPath), so it processes the data belonging to that batch; a sketch of the receiving notebook is shown after this list.
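On the receiving side, the usual pattern is to map the trigger's folder path to a pipeline parameter and pass it into the activity that does the processing. As an illustration only, here is roughly what a parameterized Synapse notebook consuming that parameter could look like; the parameter name batch_folder and the container/account names are assumptions, not taken from your setup:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # pre-created in Synapse notebooks

# Assumed notebook parameter, filled by the pipeline from the storage event
# trigger (e.g. @triggerBody().folderPath). The default value is only for testing.
batch_folder = "raw/batches/2024-01-01"

# Rebuild the full path of the batch whose _SUCCESS file fired the trigger.
# <container> and <account> are placeholders for your ADLS Gen2 names.
batch_path = f"abfss://<container>@<account>.dfs.core.windows.net/{batch_folder}"

# Because _SUCCESS only appears once every part file is committed, it is safe
# to read the whole folder here.
df = spark.read.parquet(batch_path)
print(f"Loaded {df.count()} rows from {batch_path}")
```

The same folder path can just as well be fed to a Copy activity or a data flow instead of a notebook.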
This approach avoids triggering your pipeline multiple times per batch, simplifies orchestration, and aligns with best practices for Spark output.
Alternatives:
- If you want, you can also write a small "control" file from your Spark job at the end of the batch instead of relying on _SUCCESS; a short sketch of this follows the list.
- Or use scheduled triggers if your data arrives on a fixed schedule.
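For the control-file alternative, one common approach (shown only as a sketch; the _BATCH_DONE name and the paths are made up for illustration) is to drop a small marker file through the Hadoop FileSystem API as the very last step of the Spark job, and then point the trigger's filter at that name instead of _SUCCESS:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder batch output location.
output_path = "abfss://<container>@<account>.dfs.core.windows.net/batches/2024-01-01"

# Write the batch data first; the control file must be the last thing created.
spark.range(100).write.mode("overwrite").parquet(output_path)

# Create a small marker file next to the data via the Hadoop FileSystem API
# (reached through the JVM gateway). _BATCH_DONE is just an example name.
jvm = spark.sparkContext._jvm
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
marker = jvm.org.apache.hadoop.fs.Path(output_path + "/_BATCH_DONE")
fs = marker.getFileSystem(hadoop_conf)

stream = fs.create(marker, True)  # True = overwrite if it already exists
stream.write(bytearray(b"batch complete\n"))  # could also hold a row count or batch id
stream.close()
```

If you go this route, set the trigger's "Blob path ends with" filter to _BATCH_DONE instead of _SUCCESS.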
But honestly, triggering on _SUCCESS is usually the simplest and most reliable method.
Hope this helps. If this answers your query, please click "Accept Answer" and "Yes" for "Was this answer helpful". If you have any further questions, do let us know.