Hi Melissa Lord, Thanks for providing the details and screenshots. Based on your observations, the upgrade from Spark 3.3 to 3.4 within the Azure Integration Runtime (IR) appears to coincide with the significant degradation in your data flow performance.
Here are some technical points and troubleshooting steps you might consider:
Spark 3.4 Behavioral Changes: Spark 3.4 introduced various optimizations as well as behavioral changes in the Catalyst optimizer, shuffle mechanisms, and API integrations that may impact certain workloads. Specifically:
- Changes to query optimization and join strategies can produce different physical plans, resulting in more expensive shuffles or scans.
- Adaptive query execution (AQE) parameters may have different defaults in 3.4, affecting how stages are optimized at runtime (a quick way to inspect them is sketched after this list).
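If you can open a Spark notebook or debug session against the same runtime, a minimal PySpark sketch like the following dumps the effective AQE-related settings so you can diff them between the 3.3 and 3.4 environments (the config keys are standard Spark settings; whether your Azure IR exposes a session you can query this way is an assumption):

```python
# Minimal sketch: print the effective values of AQE-related settings on the
# active Spark session so they can be compared across runtimes.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

for key in [
    "spark.sql.adaptive.enabled",
    "spark.sql.adaptive.coalescePartitions.enabled",
    "spark.sql.adaptive.skewJoin.enabled",
    "spark.sql.autoBroadcastJoinThreshold",
]:
    # conf.get(key, default) returns the effective value or the fallback text
    print(key, "=", spark.conf.get(key, "<not set>"))
```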
Execution Plan Comparison: Extract and compare the Spark SQL query plans (explain() output) for your data flow transformations before and after the upgrade.
Check for increased shuffle operations, skew, or changed join types (e.g., a broadcast join replaced with a sort-merge join).
Use the Spark UI or diagnostic logs, if available in Azure Data Factory, to analyze task time and stage breakdown.
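As a rough illustration, if you can reproduce the transformation logic in a notebook on each runtime, something like the following captures the physical plan for diffing (the input path and column name are placeholders for your actual data):

```python
# Minimal sketch: print the formatted physical plan for the same
# transformation on each runtime, then diff the two outputs to spot
# join-strategy or shuffle changes.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.read.parquet("/path/to/sample_data")   # placeholder input
result = df.groupBy("some_key").count()           # placeholder transformation

# "formatted" mode (Spark 3.0+) gives a readable physical plan breakdown.
result.explain(mode="formatted")
```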
Spark Configuration Differences: Verify whether Spark configurations (such as spark.sql.shuffle.partitions, spark.executor.memory, spark.sql.adaptive.enabled) have changed implicitly with the new runtime.
You can explicitly set these configurations via your Azure Data Flow debug session or pipeline parameters to check whether performance improves.
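For example, here is a hedged sketch of pinning these values in a Spark session (the numbers shown are the classic defaults, not tuning recommendations, and whether your data flow honors externally supplied Spark properties is an assumption):

```python
# Minimal sketch: pin shuffle/AQE settings explicitly so both runtimes run
# with identical values while you compare performance.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.shuffle.partitions", "200")   # classic default value
    .config("spark.sql.adaptive.enabled", "true")    # flip to false to compare
    .getOrCreate()
)

# SQL-level settings can also be changed mid-session for quick A/B tests:
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "false")
```

Note that executor-level settings such as spark.executor.memory must be supplied at session/cluster creation time and cannot be changed mid-session.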
Azure IR Version Pinning and Rollback: Confirm whether the Azure IR allows pinning to Spark 3.3, or whether you can deploy a new IR instance with the older Spark runtime for comparison.
This would help isolate whether the Spark upgrade is the root cause, as opposed to other environmental changes.
API Connector Impact: Since your data flow reads from an API endpoint, check whether any underlying REST connector or HTTP client libraries were updated in the new runtime, which might impact request/response times or retries.
Also evaluate whether throttling or latency from the API source has increased on the new runtime.
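One way to separate source-side latency from runtime-side slowness is to probe the endpoint directly, outside the data flow. The sketch below is a generic Python probe; the URL and auth header are placeholders for your actual API:

```python
# Minimal sketch: measure raw API latency and watch for throttling responses
# (429/503), which would point at the source rather than the Spark runtime.
import time
import requests

URL = "https://api.example.com/data"                 # placeholder endpoint
HEADERS = {"Authorization": "Bearer <token>"}        # placeholder auth

latencies = []
for _ in range(10):
    start = time.monotonic()
    resp = requests.get(URL, headers=HEADERS, timeout=30)
    latencies.append(time.monotonic() - start)
    print(resp.status_code, f"{latencies[-1]:.2f}s")

print("median latency:", f"{sorted(latencies)[len(latencies) // 2]:.2f}s")
```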
Profiling and Metrics:
- Review detailed profile logs from your data flows for metrics such as computeAcquisitionDuration, idleTimeBeforeCurrentJob, and taskDuration.
- Look for increased idle times or resource contention, which might suggest scheduler or resource allocation changes in Spark 3.4.
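If you export a data flow activity's monitoring output as JSON, a small script can pull out the timing fields named above for side-by-side comparison. The key paths below are assumptions about the output shape, so adjust them to match what your run actually contains:

```python
# Minimal sketch: extract timing metrics from an exported data flow activity
# output. The wrapper keys ("runStatus", "metrics") are assumed and may need
# adjusting to the actual JSON structure of your export.
import json

with open("dataflow_activity_output.json") as f:     # hypothetical export file
    output = json.load(f)

metrics = output.get("runStatus", {}).get("metrics", {})

for key in ("computeAcquisitionDuration", "idleTimeBeforeCurrentJob"):
    print(key, "=", metrics.get(key, "<not found>"))

# Per-sink/transformation entries, if present, may carry their own taskDuration.
for name, entry in metrics.items():
    if isinstance(entry, dict) and "taskDuration" in entry:
        print(name, "taskDuration =", entry["taskDuration"])
```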
Hope this helps. If this answers your query, do click Accept Answer and Yes for "Was this answer helpful". And, if you have any further query, do let us know.