Stream Analytics compatibility 1.2 - PARTITION BY PartitionId

Question

Guusje 40

Will the PARTITION BY PartitionId statement still ensure parallel processing (and make sure that input partition = output partition) for compatibility level 1.2 when we don't specify a partition key column?

Our situation:

We have two input Event Hubs (16 partitions each) and one output Event Hub (also 16 partitions).
We have a Stream Analytics job that joins messages from the two input Event Hubs and writes them to the output Event Hub. The join also matches on PartitionId.
We use compatibility level 1.2.
We have NOT specified a partition key column.
We DO, however, still use the PARTITION BY PartitionId statement in our query.

Question:

According to the documentation, I would expect the messages to be processed in an embarrassingly parallel fashion, and I assume that each message will end up on the same partition of the output Event Hub as its originating partition (i.e. messages from input partition 1 will end up on output partition 1).

However, is it a problem that we use compatibility level 1.2? In the documentation I read that PARTITION BY PartitionId is required for compatibility levels below 1.2. Does it still work for 1.2, or do we then need to set a partition key column? (We don't do that yet because our partition key is a combination of multiple columns, so setting the output partition key to be the same as the input partition key was an easier solution for us.)

Our simplified query:

SELECT
    evh1.timestamp,
    evh1.some_column,
    evh2.timestamp,
    evh2.some_other_column
FROM
    [evh1] evh1 TIMESTAMP BY evh1.timestamp PARTITION BY PartitionId
    JOIN [evh2] evh2 TIMESTAMP BY evh2.timestamp PARTITION BY PartitionId
ON
    <some joining logic>
    AND evh1.PartitionId = evh2.PartitionId

Accepted answer

Answer 1

Hi @Guusje

Will the PARTITION BY PartitionId statement still ensure parallel processing (and make sure that input partition = output partition) for compatibility level 1.2 when we don't specify a partition key column?

Yes, in compatibility level 1.2, PARTITION BY PartitionId still ensures partition-aware parallel processing and maintains partition affinity between input and output, even if you don't specify a partition key column. As long as all inputs and the output Event Hub have the same number of partitions, and PartitionId is used consistently in the query, messages from input partition X will be routed to output partition X.
https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-parallelization
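
To make the alignment condition above concrete, here is a minimal Python sketch. It is only a toy model of the behaviour described, not the Stream Analytics engine or any Azure SDK API. The names `Event`, `is_partition_aligned` and `join_per_partition` are hypothetical and exist purely for illustration. It assumes that each event carries the ID of the input partition it arrived on, and that a joined row is written to the output partition with the same ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical stand-in for an Event Hub message; not an Azure SDK type.
    partition_id: int
    payload: str


def is_partition_aligned(input_partition_counts, output_partition_count):
    """True when every input has the same partition count as the output."""
    return all(count == output_partition_count for count in input_partition_counts)


def join_per_partition(left_events, right_events):
    """Join events only when they share a partition ID.

    Returns a dict keyed by output partition ID, modelling the assumption
    that rows joined on input partition X are written to output partition X.
    """
    right_by_partition = {}
    for event in right_events:
        right_by_partition.setdefault(event.partition_id, []).append(event)

    output = {}
    for left in left_events:
        for right in right_by_partition.get(left.partition_id, []):
            output.setdefault(left.partition_id, []).append(
                (left.payload, right.payload)
            )
    return output


if __name__ == "__main__":
    # Two inputs and one output with 16 partitions each, as in the question.
    print(is_partition_aligned([16, 16], 16))
    rows = join_per_partition(
        [Event(1, "a"), Event(2, "b")],
        [Event(1, "x"), Event(3, "y")],
    )
    print(rows)
```

In this model, no work for partition 1 ever depends on data from another partition, which is what allows each partition to be processed independently. If the partition counts differ, `is_partition_aligned` returns False and the one-to-one mapping no longer applies.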

is it a problem that we use compatibility level 1.2?

No, it's not an issue. Compatibility level 1.2 fully supports PARTITION BY PartitionId for enabling partition-aware parallelism. In fact, this level brings performance enhancements and continues to support partitioned processing when implemented properly.

https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-compatibility-level

In the documentation I read that PARTITION BY PartitionId is required for compatibility levels below 1.2. Does it still work for 1.2, or do we then need to set a partition key column?

PARTITION BY PartitionId still works as expected in compatibility level 1.2. You are not required to set a partition key column unless you want to partition by a custom key. Using PartitionId in your query ensures that the system maintains partition alignment between input and output when the partition counts match.

Hope this helps. Do let us know if you have any further queries.

If this answers your query, please click Accept Answer and Yes for "Was this answer helpful".

Guusje 40 Reputation points

2025-05-12T12:02:45.3933333+00:00

This helps a lot, thank you for clarifying!
