IR Upgrade to Spark 3.4 - data flow performance issues

Melissa Lord 0 Reputation points
2025-07-18T15:44:49.93+00:00

Hello,

We have a data flow that retrieves data from an API endpoint. The pipeline has not been changed in months; however, on 7/9/2025 its run time roughly tripled. We have tried creating a new Azure IR and increasing both the TTL option and the core count, with very limited success.

I was digging into the individual data flows and noticed that on the exact day the performance issues started, Spark was upgraded from 3.3 to 3.4. Has anyone else had this issue? Financial reporting depends on this pipeline, so this is causing havoc (screenshots attached with times/versions).

Thank you.


Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Venkat Reddy Navari 5,330 Reputation points Microsoft External Staff Moderator
    2025-07-18T17:18:37.0466667+00:00

    Hi Melissa Lord, thanks for providing the details and screenshots. Based on your observations, the upgrade from Spark 3.3 to 3.4 within the Azure Integration Runtime (IR) does appear to coincide with the significant degradation in your data flow performance.

    Here are some technical points and troubleshooting steps you might consider:

    Spark 3.4 Behavioral Changes: Spark 3.4 introduced various optimizations, along with behavioral changes in the Catalyst optimizer, shuffle mechanisms, and API integrations that may affect certain workloads. Specifically:

    • Changes in query optimization and join strategies might cause different physical plans resulting in more expensive shuffles or scans.
    • Updates to adaptive query execution (AQE) parameters might behave differently by default in 3.4, affecting how stages are optimized at runtime.

    Execution Plan Comparison: Extract and compare the Spark SQL query plans (the explain() output) for your data flow transformations before and after the upgrade.

    Check for increased shuffle operations, skew, or changed join types (e.g., broadcast join replaced with sort-merge join).
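A quick way to spot such plan changes is to count join and shuffle operators in the captured plan text from each runtime. This is a minimal sketch in plain Python; the plan strings below are illustrative placeholders, not real explain() output:

```python
def summarize_plan(plan_text):
    """Count common join/shuffle operators in a Spark physical plan string."""
    markers = ["BroadcastHashJoin", "SortMergeJoin", "ShuffledHashJoin", "Exchange"]
    return {m: plan_text.count(m) for m in markers}

# Hypothetical plan captures from the two runtimes (via df.explain() output)
plan_33 = "== Physical Plan ==\nBroadcastHashJoin ...\nExchange ..."          # Spark 3.3
plan_34 = "== Physical Plan ==\nSortMergeJoin ...\nExchange ...\nExchange ..."  # Spark 3.4

before, after = summarize_plan(plan_33), summarize_plan(plan_34)
for marker in before:
    if before[marker] != after[marker]:
        print(f"{marker}: {before[marker]} -> {after[marker]}")
```

A broadcast join silently replaced by a sort-merge join (with an extra Exchange) is exactly the kind of regression this flags.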

    Use Spark UI or diagnostic logs if available on Azure Data Factory to analyze task time and stage breakdown.

    Spark Configuration Differences: Verify whether the Spark configurations (like spark.sql.shuffle.partitions, spark.executor.memory, spark.sql.adaptive.enabled) have changed implicitly with the new runtime.

    You can explicitly set these configurations via your Azure Data Flow debug session or pipeline parameters to test whether performance improves.
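For reference, these are the documented Spark defaults for the settings mentioned above, in spark-defaults.conf form. They are baseline values to compare against what your runtime reports, not tuning recommendations:

```
# Defaults as documented by Apache Spark (verify against your runtime)
spark.sql.shuffle.partitions                   200
# AQE has been enabled by default since Spark 3.2
spark.sql.adaptive.enabled                     true
spark.sql.adaptive.coalescePartitions.enabled  true
# 10 MB; set to -1 to disable broadcast joins entirely
spark.sql.autoBroadcastJoinThreshold           10485760
```

If the 3.4 runtime reports different effective values for any of these, that difference is a strong candidate for the regression.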

    Azure IR Version Pinning and Rollback: Confirm whether Azure IR allows pinning to Spark 3.3, or whether you can deploy a new IR instance with the older Spark runtime for comparison.

    This would help isolate whether the Spark upgrade is the root cause versus other environmental changes.

    API Connector Impact: Since your data flow reads from an API endpoint, check whether any underlying REST connector or HTTP client libraries were updated in the new runtime, which might affect request/response times or retries.

    Evaluate whether there are increased throttling or latency issues from the API source on the new runtime.
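If throttling turns out to be the issue, a standard mitigation is retrying HTTP 429 responses with exponential backoff. A minimal sketch, assuming a caller-supplied fetch() function that returns a (status, body) tuple (both names are hypothetical, for illustration only):

```python
import time

def backoff_delays(max_retries=4, base=1.0, cap=30.0):
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ... capped."""
    return [min(cap, base * (2 ** i)) for i in range(max_retries)]

def fetch_with_retry(fetch, max_retries=4):
    """Call fetch(); on HTTP 429 (throttled), wait per the backoff schedule and retry."""
    for delay in backoff_delays(max_retries):
        status, body = fetch()
        if status != 429:
            return status, body
        time.sleep(delay)
    return fetch()  # final attempt after exhausting the schedule
```

Logging each delay taken here would also give you a rough measure of how much throttling the source is applying on the new runtime.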

    Profiling and Metrics:

    • Review detailed profile logs from your data flows for metrics such as computeAcquisitionDuration, idleTimeBeforeCurrentJob, and taskDuration.
    • Look for increased idle times or resource contention which might suggest scheduler or resource allocation changes in Spark 3.4.
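To make "increased idle time" concrete, you can compute what fraction of each run was spent acquiring compute or sitting idle rather than executing tasks. A sketch using the metric names listed above; the values are placeholders and the durations are assumed to be in seconds:

```python
import json

# Hypothetical profile snippet using the metric names from ADF data flow monitoring
profile = json.loads("""
{
  "computeAcquisitionDuration": 120,
  "idleTimeBeforeCurrentJob": 45,
  "taskDuration": 900
}
""")

def idle_ratio(p):
    """Fraction of total run time spent on acquisition/idle rather than tasks."""
    overhead = p["computeAcquisitionDuration"] + p["idleTimeBeforeCurrentJob"]
    total = overhead + p["taskDuration"]
    return overhead / total
```

Comparing this ratio for runs before and after 7/9/2025 would show whether the regression is in scheduling/startup overhead or in the task execution itself.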

    Hope this helps. If this answers your query, please click Accept Answer and Yes for "was this answer helpful". If you have any further questions, do let us know.

