Stream Analytics compatibility 1.2 - PARTITION BY PartitionId

Guusje 40 Reputation points
2025-05-12T10:53:56.6233333+00:00

Will the PARTITION BY PartitionId statement still ensure parallel processing (and make sure that input partition = output partition) for compatibility level 1.2 when we don't specify a partition key column?

Our situation:

  • we have 2 input eventhubs (both 16 partitions) and an output eventhub (also 16 partitions)
  • we have a stream analytics job that joins messages from these 2 input eventhubs and writes them to 1 output eventhub - these messages are also joined on PartitionId
  • we use compatibility level 1.2
  • we have NOT specified a partition key column
  • we DO however still use the PARTITION BY PartitionId statement in our query

Question:

According to the documentation, I would expect the messages to be processed in an embarrassingly parallel way. I also assume that the messages will end up on the same partition of the output eventhub as their originating partition (i.e. messages from input partition 1 will end up on output partition 1).

However, is it a problem that we use compatibility level 1.2? In the documentation I read that PARTITION BY PartitionId is required for compatibility levels below 1.2. Does it still work for 1.2, or do we then need to set a partition key column? (We don't do that yet: our partition key is a combination of multiple columns, so keeping the output partition the same as the input partition was an easier solution for us.)

Our simplified query:

SELECT
    evh1.timestamp,
    evh1.some_column,
    evh2.timestamp,
    evh2.some_other_column
FROM
    [evh1] evh1 TIMESTAMP BY evh1.timestamp PARTITION BY PartitionId
    JOIN [evh2] evh2 TIMESTAMP BY evh2.timestamp PARTITION BY PartitionId
ON
    <some joining logic>
    AND evh1.PartitionId = evh2.PartitionId

Azure Stream Analytics
An Azure real-time analytics service designed for mission-critical workloads.

Accepted answer
  1. Anonymous
    2025-05-12T11:45:53.9466667+00:00

    Hi @Guusje

    Will the PARTITION BY PartitionId statement still ensure parallel processing (and make sure that input partition = output partition) for compatibility level 1.2 when we don't specify a partition key column?

    Yes, in compatibility level 1.2, PARTITION BY PartitionId still ensures partition-aware parallel processing and maintains partition affinity between input and output, even if you don't specify a partition key column. As long as all inputs and the output Event Hub have the same number of partitions, and PartitionId is used consistently in the query, messages from input partition X will be routed to output partition X.
    https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-parallelization

    is it a problem that we use compatibility level 1.2?

    No, it's not an issue. Compatibility level 1.2 fully supports PARTITION BY PartitionId for enabling partition-aware parallelism. In fact, this level brings performance enhancements and continues to support partitioned processing when implemented properly.

    https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-compatibility-level

    In the documentation I read that PARTITION BY PartitionId is required for compatibility <1.2, however, does this still work for 1.2 or do we then need to set a partition key column?

    PARTITION BY PartitionId still works as expected in compatibility level 1.2. You are not required to set a partition key column unless you want to partition by a custom key. Using PartitionId in your query ensures the system maintains partition alignment between input and output when the partition counts match.
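
    For illustration, a fuller version of the query from the question might look roughly like the sketch below. The output alias [evh-out], the join column some_key, the result column names, and the 10-second DATEDIFF window are assumptions made for this example and are not taken from the original job; replace them with your own names and time bound.

    ```sql
    -- Sketch only: [evh-out], some_key, the AS aliases and the 10-second
    -- window are hypothetical placeholders, not values from the original job.
    SELECT
        evh1.timestamp AS evh1_timestamp,
        evh1.some_column,
        evh2.timestamp AS evh2_timestamp,
        evh2.some_other_column
    INTO
        [evh-out]
    FROM
        [evh1] evh1 TIMESTAMP BY evh1.timestamp PARTITION BY PartitionId
        JOIN [evh2] evh2 TIMESTAMP BY evh2.timestamp PARTITION BY PartitionId
    ON
        -- Join only events that arrived on the same input partition
        evh1.PartitionId = evh2.PartitionId
        -- Stream-to-stream joins need a time bound between the two events
        AND DATEDIFF(second, evh1, evh2) BETWEEN 0 AND 10
        -- Stand-in for the business join condition
        AND evh1.some_key = evh2.some_key
    ```

    Both inputs are partitioned by PartitionId and the join condition compares PartitionId on both sides, so each partition can be processed independently of the others.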

    Hope this helps. Do let us know if you have any further queries.


