Use ClusterStagedUpdateRun to orchestrate staged rollouts across member clusters

Azure Kubernetes Fleet Manager staged update runs provide a controlled approach to deploying workloads across multiple member clusters using a stage-by-stage process. This approach allows you to minimize risk by deploying to targeted clusters sequentially, with optional wait times and approval gates between stages.

This article shows you how to create and execute staged update runs to deploy workloads progressively and roll back to previous versions when needed.

Prerequisites

  • You need an Azure account with an active subscription. Create an account for free.

  • To understand the concepts and terminology used in this article, read the conceptual overview of staged rollout strategies.

  • You need Azure CLI version 2.58.0 or later installed to complete this article. To install or upgrade, see Install the Azure CLI.

  • If you don't have the Kubernetes CLI (kubectl) already, you can install it by using this command:

    az aks install-cli
    
  • You need the fleet Azure CLI extension. You can install it by running the following command:

    az extension add --name fleet
    

    Run the az extension update command to update to the latest version of the extension:

    az extension update --name fleet
    

Configure the demo environment

This demo runs on a Fleet Manager with a hub cluster and three member clusters. If you don't have one, follow the quickstart to create a Fleet Manager with a hub cluster. Then, join Azure Kubernetes Service (AKS) clusters as members.
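
As a rough sketch, creating a fleet with a hub cluster and joining an existing AKS cluster with the Azure CLI looks similar to the following; the resource group, fleet name, member names, and cluster resource ID are placeholders, so follow the quickstart for the full steps:

# Create a Fleet Manager with a hub cluster (required for workload placement).
az fleet create --resource-group myResourceGroup --name myFleet --enable-hub

# Join an existing AKS cluster as a member; repeat for member1, member2, and member3.
az fleet member create \
  --resource-group myResourceGroup \
  --fleet-name myFleet \
  --name member1 \
  --member-cluster-id /subscriptions/<subscription-id>/resourceGroups/myResourceGroup/providers/Microsoft.ContainerService/managedClusters/aks-member-1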

This tutorial demonstrates staged update runs using a demo fleet environment with three member clusters that have the following labels:

Cluster name   Labels
member1        environment=canary, order=2
member2        environment=staging
member3        environment=canary, order=1

These labels allow us to create stages that group clusters by environment and control the deployment order within each stage.
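
If your member clusters don't have these labels yet, one way to add them is to label the MemberCluster objects on the hub cluster. This is a sketch that assumes your members are named member1, member2, and member3; depending on how your fleet is managed, you might set labels through the member join workflow instead:

kubectl label membercluster member1 environment=canary order=2
kubectl label membercluster member2 environment=staging
kubectl label membercluster member3 environment=canary order=1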

Prepare workloads for placement

Next, publish workloads to the hub cluster so that they can be placed onto member clusters.

Create a namespace and configmap for the workload on the hub cluster:

kubectl create ns test-namespace
kubectl create cm test-cm --from-literal=key=value1 -n test-namespace

To deploy the resources, create a ClusterResourcePlacement:

Note

The spec.strategy.type is set to External so that the rollout is triggered by a ClusterStagedUpdateRun rather than happening automatically.

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: example-placement
spec:
  resourceSelectors:
    - group: ""
      kind: Namespace
      name: test-namespace
      version: v1
  policy:
    placementType: PickAll
  strategy:
    type: External
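
Save the manifest to a file and apply it to the hub cluster (the filename is illustrative):

kubectl apply -f example-placement.yaml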

All three clusters should be scheduled because we use the PickAll policy, but no resources should be deployed on the member clusters yet because we haven't created a ClusterStagedUpdateRun.

Verify the placement is scheduled:

kubectl get crp example-placement

Your output should look similar to the following example:

NAME                GEN   SCHEDULED   SCHEDULED-GEN   AVAILABLE   AVAILABLE-GEN   AGE
example-placement   1     True        1                                           51s

Work with resource snapshots

Fleet Manager creates resource snapshots when resources change. Each snapshot has a unique index that you can use to reference specific versions of your resources.

Tip

For more information about resource snapshots and how they work, see Understanding resource snapshots.

Check current resource snapshots

To check current resource snapshots:

kubectl get clusterresourcesnapshots --show-labels

Your output should look similar to the following example:

NAME                           GEN   AGE   LABELS
example-placement-0-snapshot   1     60s   kubernetes-fleet.io/is-latest-snapshot=true,kubernetes-fleet.io/parent-CRP=example-placement,kubernetes-fleet.io/resource-index=0

There's only one snapshot version so far. It's the latest (kubernetes-fleet.io/is-latest-snapshot=true) and has resource index 0 (kubernetes-fleet.io/resource-index=0).
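
As snapshots accumulate, you can also filter by label to find the latest one directly (a small convenience, not required for this tutorial):

kubectl get clusterresourcesnapshots -l kubernetes-fleet.io/is-latest-snapshot=true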

Create a new resource snapshot

Now modify the configmap with a new value:

kubectl edit cm test-cm -n test-namespace

In the editor, change the value from value1 to value2, then verify the change:

kubectl get configmap test-cm -n test-namespace -o yaml

Your output should look similar to the following example:

apiVersion: v1
data:
  key: value2 # value updated here, old value: value1
kind: ConfigMap
metadata:
  creationTimestamp: ...
  name: test-cm
  namespace: test-namespace
  resourceVersion: ...
  uid: ...

Now you should see two versions of resource snapshots with index 0 and 1 respectively:

kubectl get clusterresourcesnapshots --show-labels

Your output should look similar to the following example:

NAME                           GEN   AGE    LABELS
example-placement-0-snapshot   1     2m6s   kubernetes-fleet.io/is-latest-snapshot=false,kubernetes-fleet.io/parent-CRP=example-placement,kubernetes-fleet.io/resource-index=0
example-placement-1-snapshot   1     10s    kubernetes-fleet.io/is-latest-snapshot=true,kubernetes-fleet.io/parent-CRP=example-placement,kubernetes-fleet.io/resource-index=1

The is-latest-snapshot label is now true on example-placement-1-snapshot, which contains the latest configmap data:

kubectl get clusterresourcesnapshots example-placement-1-snapshot -o yaml

Your output should look similar to the following example:

apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourceSnapshot
metadata:
  annotations:
    kubernetes-fleet.io/number-of-enveloped-object: "0"
    kubernetes-fleet.io/number-of-resource-snapshots: "1"
    kubernetes-fleet.io/resource-hash: 10dd7a3d1e5f9849afe956cfbac080a60671ad771e9bda7dd34415f867c75648
  creationTimestamp: "2025-07-22T21:26:54Z"
  generation: 1
  labels:
    kubernetes-fleet.io/is-latest-snapshot: "true"
    kubernetes-fleet.io/parent-CRP: example-placement
    kubernetes-fleet.io/resource-index: "1"
  name: example-placement-1-snapshot
  ownerReferences:
  - apiVersion: placement.kubernetes-fleet.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: ClusterResourcePlacement
    name: example-placement
    uid: e7d59513-b3b6-4904-864a-c70678fd6f65
  resourceVersion: "19994"
  uid: 79ca0bdc-0b0a-4c40-b136-7f701e85cdb6
spec:
  selectedResources:
  - apiVersion: v1
    kind: Namespace
    metadata:
      labels:
        kubernetes.io/metadata.name: test-namespace
      name: test-namespace
    spec:
      finalizers:
      - kubernetes
  - apiVersion: v1
    data:
      key: value2 # latest value: value2, old value: value1
    kind: ConfigMap
    metadata:
      name: test-cm
      namespace: test-namespace

Deploy a ClusterStagedUpdateStrategy

A ClusterStagedUpdateStrategy defines the orchestration pattern that groups clusters into stages and specifies the rollout sequence. It selects member clusters by labels. For our demonstration, we create one with two stages, staging and canary:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterStagedUpdateStrategy
metadata:
  name: example-strategy
spec:
  stages:
    - name: staging
      labelSelector:
        matchLabels:
          environment: staging
      afterStageTasks:
        - type: TimedWait
          waitTime: 1m
    - name: canary
      labelSelector:
        matchLabels:
          environment: canary
      sortingLabelKey: order
      afterStageTasks:
        - type: Approval
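
Save the strategy to a file and apply it to the hub cluster, then confirm it exists (the filename is illustrative; csus is the short name used later in this article):

kubectl apply -f example-strategy.yaml
kubectl get csus example-strategy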

Deploy a ClusterStagedUpdateRun to roll out latest change

A ClusterStagedUpdateRun executes the rollout of a ClusterResourcePlacement following a ClusterStagedUpdateStrategy. To trigger the staged update run for our ClusterResourcePlacement (CRP), we create a ClusterStagedUpdateRun that specifies the CRP name, the update strategy name, and the latest resource snapshot index ("1"):

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterStagedUpdateRun
metadata:
  name: example-run
spec:
  placementName: example-placement
  resourceSnapshotIndex: "1"
  stagedRolloutStrategyName: example-strategy
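
Apply the manifest to the hub cluster to start the rollout (the filename is illustrative):

kubectl apply -f example-run.yaml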

Verify that the staged update run is initialized and running:

kubectl get csur example-run

Your output should look similar to the following example:

NAME          PLACEMENT           RESOURCE-SNAPSHOT-INDEX   POLICY-SNAPSHOT-INDEX   INITIALIZED   SUCCEEDED   AGE
example-run   example-placement   1                         0                       True                      7s

For a more detailed look at the status after the one-minute TimedWait has elapsed, inspect the full object:

kubectl get csur example-run -o yaml

Your output should look similar to the following example:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterStagedUpdateRun
metadata:
  ...
  name: example-run
  ...
spec:
  placementName: example-placement
  resourceSnapshotIndex: "1"
  stagedRolloutStrategyName: example-strategy
status:
  conditions:
  - lastTransitionTime: "2025-07-22T21:28:08Z"
    message: ClusterStagedUpdateRun initialized successfully
    observedGeneration: 1
    reason: UpdateRunInitializedSuccessfully
    status: "True" # the updateRun is initialized successfully
    type: Initialized
  - lastTransitionTime: "2025-07-22T21:29:53Z"
    message: The updateRun is waiting for after-stage tasks in stage canary to complete
    observedGeneration: 1
    reason: UpdateRunWaiting
    status: "False" # the updateRun is still progressing and waiting for approval
    type: Progressing
  deletionStageStatus:
    clusters: [] # no clusters need to be cleaned up
    stageName: kubernetes-fleet.io/deleteStage
  policyObservedClusterCount: 3 # number of clusters to be updated
  policySnapshotIndexUsed: "0"
  stagedUpdateStrategySnapshot: # snapshot of the strategy used for this update run
    stages:
    - afterStageTasks:
      - type: TimedWait
        waitTime: 1m0s
      labelSelector:
        matchLabels:
          environment: staging
      name: staging
    - afterStageTasks:
      - type: Approval
      labelSelector:
        matchLabels:
          environment: canary
      name: canary
      sortingLabelKey: order
  stagesStatus: # detailed status for each stage
  - afterStageTaskStatus:
    - conditions:
      - lastTransitionTime: "2025-07-22T21:29:23Z"
        message: Wait time elapsed
        observedGeneration: 1
        reason: AfterStageTaskWaitTimeElapsed
        status: "True" # the wait after-stage task has completed
        type: WaitTimeElapsed
      type: TimedWait
    clusters:
    - clusterName: member2 # stage staging contains member2 cluster only
      conditions:
      - lastTransitionTime: "2025-07-22T21:28:08Z"
        message: Cluster update started
        observedGeneration: 1
        reason: ClusterUpdatingStarted
        status: "True"
        type: Started
      - lastTransitionTime: "2025-07-22T21:28:23Z"
        message: Cluster update completed successfully
        observedGeneration: 1
        reason: ClusterUpdatingSucceeded
        status: "True" # member2 is updated successfully
        type: Succeeded
    conditions:
    - lastTransitionTime: "2025-07-22T21:28:23Z"
      message: All clusters in the stage are updated and after-stage tasks are completed
      observedGeneration: 1
      reason: StageUpdatingSucceeded
      status: "False"
      type: Progressing
    - lastTransitionTime: "2025-07-22T21:29:23Z"
      message: Stage update completed successfully
      observedGeneration: 1
      reason: StageUpdatingSucceeded
      status: "True" # stage staging has completed successfully
      type: Succeeded
    endTime: "2025-07-22T21:29:23Z"
    stageName: staging
    startTime: "2025-07-22T21:28:08Z"
  - afterStageTaskStatus:
    - approvalRequestName: example-run-canary # ClusterApprovalRequest name for this stage
      conditions:
      - lastTransitionTime: "2025-07-22T21:29:53Z"
        message: ClusterApprovalRequest is created
        observedGeneration: 1
        reason: AfterStageTaskApprovalRequestCreated
        status: "True"
        type: ApprovalRequestCreated
      type: Approval
    clusters:
    - clusterName: member3 # according to the labelSelector and sortingLabelKey, member3 is selected first in this stage
      conditions:
      - lastTransitionTime: "2025-07-22T21:29:23Z"
        message: Cluster update started
        observedGeneration: 1
        reason: ClusterUpdatingStarted
        status: "True"
        type: Started
      - lastTransitionTime: "2025-07-22T21:29:38Z"
        message: Cluster update completed successfully
        observedGeneration: 1
        reason: ClusterUpdatingSucceeded
        status: "True" # member3 update is completed
        type: Succeeded
    - clusterName: member1 # member1 is selected after member3 because of order=2 label
      conditions:
      - lastTransitionTime: "2025-07-22T21:29:38Z"
        message: Cluster update started
        observedGeneration: 1
        reason: ClusterUpdatingStarted
        status: "True"
        type: Started
      - lastTransitionTime: "2025-07-22T21:29:53Z"
        message: Cluster update completed successfully
        observedGeneration: 1
        reason: ClusterUpdatingSucceeded
        status: "True" # member1 update is completed
        type: Succeeded
    conditions:
    - lastTransitionTime: "2025-07-22T21:29:53Z"
      message: All clusters in the stage are updated, waiting for after-stage tasks
        to complete
      observedGeneration: 1
      reason: StageUpdatingWaiting
      status: "False" # stage canary is waiting for approval task completion
      type: Progressing
    stageName: canary
    startTime: "2025-07-22T21:29:23Z"

The TimedWait period for the staging stage has elapsed, and the ClusterApprovalRequest object for the approval task in the canary stage was created. Check the generated ClusterApprovalRequest and note that it isn't approved yet:

kubectl get clusterapprovalrequest

Your output should look similar to the following example:

NAME                 UPDATE-RUN    STAGE    APPROVED   APPROVALACCEPTED   AGE
example-run-canary   example-run   canary                                 2m39s

We can approve the ClusterApprovalRequest by creating a JSON patch file and applying it:

cat << EOF > approval.json
{
    "status": {
        "conditions": [
            {
                "lastTransitionTime": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
                "message": "lgtm",
                "observedGeneration": 1,
                "reason": "testPassed",
                "status": "True",
                "type": "Approved"
            }
        ]
    }
}
EOF

Submit a patch request to approve it, using the JSON file you created:

kubectl patch clusterapprovalrequests example-run-canary --type='merge' --subresource=status --patch-file approval.json

Then verify that it's approved:

kubectl get clusterapprovalrequest

Your output should look similar to the following example:

NAME                 UPDATE-RUN    STAGE    APPROVED   APPROVALACCEPTED   AGE
example-run-canary   example-run   canary   True       True               3m35s

The ClusterStagedUpdateRun is now able to proceed and complete:

kubectl get csur example-run

Your output should look similar to the following example:

NAME          PLACEMENT           RESOURCE-SNAPSHOT-INDEX   POLICY-SNAPSHOT-INDEX   INITIALIZED   SUCCEEDED   AGE
example-run   example-placement   1                         0                       True          True        5m28s
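
If you're scripting this workflow, you can block until the run finishes instead of polling. This is a sketch; it assumes the run surfaces a Succeeded condition, as the SUCCEEDED column suggests, and the timeout value is illustrative:

kubectl wait clusterstagedupdaterun/example-run --for=condition=Succeeded --timeout=15m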

The ClusterResourcePlacement also shows that the rollout completed and that resources are available on all member clusters:

kubectl get crp example-placement

Your output should look similar to the following example:

NAME                GEN   SCHEDULED   SCHEDULED-GEN   AVAILABLE   AVAILABLE-GEN   AGE
example-placement   1     True        1               True        1               8m55s

The configmap test-cm should be deployed on all three member clusters with the latest data:

apiVersion: v1
data:
  key: value2
kind: ConfigMap
metadata:
  ...
  name: test-cm
  namespace: test-namespace
  ...
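
To spot-check a member cluster directly, query it with its own kubeconfig context. This is a sketch; it assumes you have a context for each member cluster, for example one retrieved with az aks get-credentials:

# The context name is illustrative; use the context for your member cluster.
kubectl get configmap test-cm -n test-namespace -o yaml --context member1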

Deploy a second ClusterStagedUpdateRun to roll back to a previous version

Now suppose the workload admin wants to roll back the configmap change, reverting the value from value2 back to value1. Instead of manually updating the configmap on the hub, they can create a new ClusterStagedUpdateRun that references a previous resource snapshot index ("0" in our case) and reuses the same strategy:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterStagedUpdateRun
metadata:
  name: example-run-2
spec:
  placementName: example-placement
  resourceSnapshotIndex: "0"
  stagedRolloutStrategyName: example-strategy
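
Apply the manifest to start the rollback (the filename is illustrative):

kubectl apply -f example-run-2.yaml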

Let's check the new ClusterStagedUpdateRun:

kubectl get csur

Your output should look similar to the following example:

NAME            PLACEMENT           RESOURCE-SNAPSHOT-INDEX   POLICY-SNAPSHOT-INDEX   INITIALIZED   SUCCEEDED   AGE
example-run     example-placement   1                         0                       True          True        13m
example-run-2   example-placement   0                         0                       True                      9s

After the one-minute TimedWait has elapsed, we should see the ClusterApprovalRequest object created for the new ClusterStagedUpdateRun:

kubectl get clusterapprovalrequest

Your output should look similar to the following example:

NAME                   UPDATE-RUN      STAGE    APPROVED   APPROVALACCEPTED   AGE
example-run-2-canary   example-run-2   canary                                 75s
example-run-canary     example-run     canary   True       True               14m

To approve the new ClusterApprovalRequest object, let's reuse the same approval.json file to patch it:

kubectl patch clusterapprovalrequests example-run-2-canary --type='merge' --subresource=status --patch-file approval.json

Verify that the new object is approved:

kubectl get clusterapprovalrequest                                                                            

Your output should look similar to the following example:

NAME                   UPDATE-RUN      STAGE    APPROVED   APPROVALACCEPTED   AGE
example-run-2-canary   example-run-2   canary   True       True               2m7s
example-run-canary     example-run     canary   True       True               15m
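
After approval, the second update run should proceed and complete, just like the first one:

kubectl get csur example-run-2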

The configmap test-cm should now be deployed on all three member clusters, with the data reverted to value1:

apiVersion: v1
data:
  key: value1
kind: ConfigMap
metadata:
  ...
  name: test-cm
  namespace: test-namespace
  ...

Clean up resources

When you're finished with this tutorial, you can clean up the resources you created:

# Delete the staged update runs
kubectl delete csur example-run example-run-2

# Delete the staged update strategy
kubectl delete csus example-strategy

# Delete the cluster resource placement
kubectl delete crp example-placement

# Delete the test namespace (this will also delete the configmap)
kubectl delete namespace test-namespace

Next steps

In this article, you learned how to use ClusterStagedUpdateRun to orchestrate staged rollouts across member clusters. You created staged update strategies, executed progressive rollouts, and performed rollbacks to previous versions.

To learn more about staged update runs and related concepts, see the following resources: