Hi Jona, thanks for sharing the screenshot; it clearly shows your Spark job output with multiple Parquet files and the _SUCCESS file. Since your job writes many files, triggering a pipeline on every file creation would cause a lot of unnecessary pipeline runs.
The best and most common solution here is to trigger your Synapse pipeline only on the creation of the _SUCCESS file. This file is a built-in Spark indicator that the whole batch finished writing successfully, so triggering on it means your pipeline starts only when the full dataset is ready.
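For context, the marker comes from a normal Spark write; your job does not need anything extra. Below is a minimal PySpark sketch (the path and the demo DataFrame are placeholders, not taken from your job) showing that a plain parquet write produces the part files plus _SUCCESS; the Hadoop committer flag that controls the marker is set explicitly here, although it is on by default:

```python
from pyspark.sql import SparkSession

# In a Synapse Spark notebook `spark` already exists; the builder call is for completeness.
spark = SparkSession.builder.getOrCreate()

# Placeholder ADLS Gen2 path - replace container/account/folder with your own.
output_path = "abfss://<container>@<account>.dfs.core.windows.net/batches/2024-01-01"

# _SUCCESS is written by the file output committer after all part files are
# committed. This setting defaults to true; shown only to make it explicit.
spark.sparkContext._jsc.hadoopConfiguration().set(
    "mapreduce.fileoutputcommitter.marksuccessfuljobs", "true"
)

# Demo data standing in for your real batch DataFrame.
df = spark.range(100).withColumnRenamed("id", "value")

df.write.mode("overwrite").parquet(output_path)
# Resulting folder: part-*.snappy.parquet files plus a zero-byte _SUCCESS marker.
```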
How you can set this up:
- Use a storage event trigger (backed by Event Grid) in Synapse that listens to blob created events in your ADLS Gen2 container.
- Add a filter so the trigger fires only for the _SUCCESS file, for example by setting the "Blob path ends with" filter to _SUCCESS.
- Configure your pipeline to pick up the folder path dynamically (the storage event trigger exposes it, typically as @triggerBody().folderPath), so it processes the data belonging to that batch; a sketch of the receiving notebook is shown after this list.
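On the receiving side, the usual pattern is to map the trigger's folder path to a pipeline parameter and pass it into the activity that does the processing. As an illustration only, here is roughly what a parameterized Synapse notebook consuming that parameter could look like; the parameter name batch_folder and the container/account names are assumptions, not taken from your setup:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # pre-created in Synapse notebooks

# Assumed notebook parameter, filled by the pipeline from the storage event
# trigger (e.g. @triggerBody().folderPath). The default value is only for testing.
batch_folder = "raw/batches/2024-01-01"

# Rebuild the full path of the batch whose _SUCCESS file fired the trigger.
# <container> and <account> are placeholders for your ADLS Gen2 names.
batch_path = f"abfss://<container>@<account>.dfs.core.windows.net/{batch_folder}"

# Because _SUCCESS only appears once every part file is committed, it is safe
# to read the whole folder here.
df = spark.read.parquet(batch_path)
print(f"Loaded {df.count()} rows from {batch_path}")
```

The same folder path can just as well be fed to a Copy activity or a data flow instead of a notebook.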
This approach avoids triggering your pipeline multiple times per batch, simplifies orchestration, and aligns with best practices for Spark output.
Alternatives:
- If you want, you can also write a small "control" file from your Spark job at the end of the batch instead of relying on _SUCCESS; a short sketch of this follows the list.
- Or use scheduled triggers if your data arrives on a fixed schedule.
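For the control-file alternative, one common approach (shown only as a sketch; the _BATCH_DONE name and the paths are made up for illustration) is to drop a small marker file through the Hadoop FileSystem API as the very last step of the Spark job, and then point the trigger's filter at that name instead of _SUCCESS:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder batch output location.
output_path = "abfss://<container>@<account>.dfs.core.windows.net/batches/2024-01-01"

# Write the batch data first; the control file must be the last thing created.
spark.range(100).write.mode("overwrite").parquet(output_path)

# Create a small marker file next to the data via the Hadoop FileSystem API
# (reached through the JVM gateway). _BATCH_DONE is just an example name.
jvm = spark.sparkContext._jvm
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
marker = jvm.org.apache.hadoop.fs.Path(output_path + "/_BATCH_DONE")
fs = marker.getFileSystem(hadoop_conf)

stream = fs.create(marker, True)  # True = overwrite if it already exists
stream.write(bytearray(b"batch complete\n"))  # could also hold a row count or batch id
stream.close()
```

If you go this route, set the trigger's "Blob path ends with" filter to _BATCH_DONE instead of _SUCCESS.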
But honestly, triggering on _SUCCESS is usually the simplest and most reliable method.
Hope this helps. If this answers your query, please click "Accept Answer" and "Yes" for "Was this answer helpful". If you have any further questions, do let us know.