IR Upgrade to Spark 3.4 - data flow performance issues

Melissa Lord 0 Reputation points
2025-07-18T15:44:49.93+00:00

Hello,

We have a data flow that retrieves data from an API endpoint. The pipeline has not been changed in months; however, on 7/9/2025 its run time roughly tripled. We have tried creating a new Azure IR and increasing both the TTL option and the core count, with very limited success.

I was digging into the individual data flows and noticed that on the exact day the performance issues started, Spark was upgraded from 3.3 to 3.4. Has anyone else had this issue? Financial reporting depends on this pipeline, so this is causing havoc (screenshots attached with times/versions).

Thank you.


Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Venkat Reddy Navari 5,330 Reputation points Microsoft External Staff Moderator
    2025-07-18T17:18:37.0466667+00:00

    Hi Melissa Lord, thanks for providing the details and screenshots. Based on your observations, the upgrade from Spark 3.3 to 3.4 within the Azure Integration Runtime (IR) does appear to coincide with the significant degradation in your data flow performance.

    Here are some technical points and troubleshooting steps you might consider:

    Spark 3.4 Behavioral Changes: Spark 3.4 introduced various optimizations, along with behavioral changes in the Catalyst optimizer, shuffle mechanisms, and API integrations that may affect certain workloads. Specifically:

    • Changes in query optimization and join strategies might cause different physical plans resulting in more expensive shuffles or scans.
    • Updates to adaptive query execution (AQE) parameters might behave differently by default in 3.4, affecting how stages are optimized at runtime.

    Execution Plan Comparison: Extract and compare the Spark SQL query plans (the explain() output) for your data flow transformations before and after the upgrade.

    Check for increased shuffle operations, skew, or changed join types (e.g., broadcast join replaced with sort-merge join).
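A quick way to spot such plan changes is to count join and shuffle operators in the captured plan text from each runtime. This is a minimal sketch in plain Python; the plan strings below are illustrative placeholders, not real explain() output:

```python
def summarize_plan(plan_text):
    """Count common join/shuffle operators in a Spark physical plan string."""
    markers = ["BroadcastHashJoin", "SortMergeJoin", "ShuffledHashJoin", "Exchange"]
    return {m: plan_text.count(m) for m in markers}

# Hypothetical plan captures from the two runtimes (via df.explain() output)
plan_33 = "== Physical Plan ==\nBroadcastHashJoin ...\nExchange ..."          # Spark 3.3
plan_34 = "== Physical Plan ==\nSortMergeJoin ...\nExchange ...\nExchange ..."  # Spark 3.4

before, after = summarize_plan(plan_33), summarize_plan(plan_34)
for marker in before:
    if before[marker] != after[marker]:
        print(f"{marker}: {before[marker]} -> {after[marker]}")
```

A broadcast join silently replaced by a sort-merge join (with an extra Exchange) is exactly the kind of regression this flags.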

    Use Spark UI or diagnostic logs if available on Azure Data Factory to analyze task time and stage breakdown.

    Spark Configuration Differences: Verify whether the Spark configurations (like spark.sql.shuffle.partitions, spark.executor.memory, spark.sql.adaptive.enabled) have changed implicitly with the new runtime.

    You can explicitly set these configurations via your Azure Data Flow debug session or pipeline parameters to test whether performance improves.
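For reference, these are the documented Spark defaults for the settings mentioned above, in spark-defaults.conf form. They are baseline values to compare against what your runtime reports, not tuning recommendations:

```
# Defaults as documented by Apache Spark (verify against your runtime)
spark.sql.shuffle.partitions                   200
# AQE has been enabled by default since Spark 3.2
spark.sql.adaptive.enabled                     true
spark.sql.adaptive.coalescePartitions.enabled  true
# 10 MB; set to -1 to disable broadcast joins entirely
spark.sql.autoBroadcastJoinThreshold           10485760
```

If the 3.4 runtime reports different effective values for any of these, that difference is a strong candidate for the regression.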

    Azure IR Version Pinning and Rollback: Confirm whether Azure IR allows pinning to Spark 3.3, or whether you can deploy a new IR instance with the older Spark runtime for comparison.

    This would help isolate whether the Spark upgrade is the root cause versus other environmental changes.

    API Connector Impact: Since your data flow reads from an API endpoint, check whether any underlying REST connector or HTTP client libraries were updated in the new runtime, which might affect request/response times or retries.

    Evaluate whether there are increased throttling or latency issues from the API source on the new runtime.
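If throttling turns out to be the issue, a standard mitigation is retrying HTTP 429 responses with exponential backoff. A minimal sketch, assuming a caller-supplied fetch() function that returns a (status, body) tuple (both names are hypothetical, for illustration only):

```python
import time

def backoff_delays(max_retries=4, base=1.0, cap=30.0):
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ... capped."""
    return [min(cap, base * (2 ** i)) for i in range(max_retries)]

def fetch_with_retry(fetch, max_retries=4):
    """Call fetch(); on HTTP 429 (throttled), wait per the backoff schedule and retry."""
    for delay in backoff_delays(max_retries):
        status, body = fetch()
        if status != 429:
            return status, body
        time.sleep(delay)
    return fetch()  # final attempt after exhausting the schedule
```

Logging each delay taken here would also give you a rough measure of how much throttling the source is applying on the new runtime.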

    Profiling and Metrics:

    • Review detailed profile logs from your data flows for metrics such as computeAcquisitionDuration, idleTimeBeforeCurrentJob, and taskDuration.
    • Look for increased idle times or resource contention which might suggest scheduler or resource allocation changes in Spark 3.4.
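To make "increased idle time" concrete, you can compute what fraction of each run was spent acquiring compute or sitting idle rather than executing tasks. A sketch using the metric names listed above; the values are placeholders and the durations are assumed to be in seconds:

```python
import json

# Hypothetical profile snippet using the metric names from ADF data flow monitoring
profile = json.loads("""
{
  "computeAcquisitionDuration": 120,
  "idleTimeBeforeCurrentJob": 45,
  "taskDuration": 900
}
""")

def idle_ratio(p):
    """Fraction of total run time spent on acquisition/idle rather than tasks."""
    overhead = p["computeAcquisitionDuration"] + p["idleTimeBeforeCurrentJob"]
    total = overhead + p["taskDuration"]
    return overhead / total
```

Comparing this ratio for runs before and after 7/9/2025 would show whether the regression is in scheduling/startup overhead or in the task execution itself.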

    Hope this helps. If this answers your query, please click Accept Answer and Yes for "was this answer helpful". If you have any further questions, do let us know.

