Azure Kubernetes Service Communication Manager

The Azure Kubernetes Service (AKS) Communication Manager streamlines notifications for all your AKS maintenance tasks by using Azure Resource Notifications and Azure Resource Graph frameworks. This tool enables you to closely monitor your upgrades because it provides you with timely alerts on event triggers and outcomes. If maintenance fails, it notifies you with the reasons for the failure, reducing operational hassles related to observability and follow-ups. You can set up notifications for all types of autoupgrades that utilize maintenance windows by following these steps.

Prerequisites

Note

After you set up Communication Manager, it sends advance notices one week and one day before maintenance starts. These notices are in addition to the timely alerts sent during the maintenance operation.

How to set up communication manager

  1. Go to the resource, select Monitoring, select Alerts, and then select Alert Rules.

  2. For the alert Condition, select Custom log search.

    Screenshot that shows the custom log search in the alert rule pane.

  3. In the Search query box, paste one of the following custom queries, and then select Review+Create.

Query for cluster auto upgrade notifications

arg("").containerserviceeventresources
| where type == "microsoft.containerservice/managedclusters/scheduledevents"
| where id contains "/subscriptions/<subid>/resourcegroups/<rgname>/providers/Microsoft.ContainerService/managedClusters/<clustername>"
| where properties has "eventStatus"
| extend status = substring(properties, indexof(properties, "eventStatus") + strlen("eventStatus") + 3, 50)
| extend status = substring(status, 0, indexof(status, ",") - 1)
| where status != ""
| where properties has "eventDetails"
| extend upgradeType = case(
                           properties has "K8sVersionUpgrade",
                           "K8sVersionUpgrade",
                           properties has "NodeOSUpgrade",
                           "NodeOSUpgrade",
                           ""
                       )
| extend details = parse_json(tostring(properties.eventDetails))
| where properties has "lastUpdateTime"
| extend eventTime = substring(properties, indexof(properties, "lastUpdateTime") + strlen("lastUpdateTime") + 3, 50)
| extend eventTime = substring(eventTime, 0, indexof(eventTime, ",") - 1)
| extend eventTime = todatetime(tostring(eventTime))
| where eventTime >= ago(2h)
| where upgradeType == "K8sVersionUpgrade"
| project
    eventTime,
    upgradeType,
    status,
    properties,
    name,
    details
| order by eventTime asc

Query for Node OS auto upgrade notifications

arg("").containerserviceeventresources
| where type == "microsoft.containerservice/managedclusters/scheduledevents"
| where id contains "/subscriptions/<subid>/resourcegroups/<rgname>/providers/Microsoft.ContainerService/managedClusters/<clustername>"
| where properties has "eventStatus"
| extend status = substring(properties, indexof(properties, "eventStatus") + strlen("eventStatus") + 3, 50)
| extend status = substring(status, 0, indexof(status, ",") - 1)
| where status != ""
| where properties has "eventDetails"
| extend upgradeType = case(
                           properties has "K8sVersionUpgrade",
                           "K8sVersionUpgrade",
                           properties has "NodeOSUpgrade",
                           "NodeOSUpgrade",
                           ""
                       )
| extend details = parse_json(tostring(properties.eventDetails))
| where properties has "lastUpdateTime"
| extend eventTime = substring(properties, indexof(properties, "lastUpdateTime") + strlen("lastUpdateTime") + 3, 50)
| extend eventTime = substring(eventTime, 0, indexof(eventTime, ",") - 1)
| extend eventTime = todatetime(tostring(eventTime))
| where eventTime >= ago(2h)
| where upgradeType == "NodeOSUpgrade"
| project
    eventTime,
    upgradeType,
    status,
    properties,
    name,
    details
| order by eventTime asc
  4. Configure the alert conditions with the following settings:
    • Measurement: Select "Table rows"
    • Aggregation: Select "Count"
    • Aggregation granularity: Select "30 minutes"
    • Threshold value: Keep at 0
    • Split by dimensions: Select "status" and choose "Include all future values"

The screenshot of the configuration options for alert conditions.

  5. When you select "status" in the Split by dimensions dropdown, the available values are Scheduled, Started, Completed, Canceled, and Failed.

    Note

    These status values will only appear if your cluster has previously executed auto upgrade operations. For new clusters or clusters that haven't undergone auto upgrades yet, the dropdown may appear empty or show no available dimensions. Once your cluster performs its first auto upgrade, these status values will become available for selection.

The screenshot of the split by dimensions drop down.

  6. Make sure that an action group with the correct email address exists so that you can receive the notifications.

Screenshot that shows entering the appropriate email or SMS details into an action group.

  7. Assign a system-assigned managed identity: After you create the alert rule, assign a managed identity so that the rule can access the necessary resources. You perform this step after the alert rule is created, not during initial setup. To assign a managed identity:

    • In the Azure portal, go to Monitor > Alerts > Alert rules, then select your alert rule.
    • In the alert rule pane, under Settings, select Identity.
    • Set System assigned managed identity to On.
    • Click Save to enable the managed identity for the alert rule.

    The screenshot of where to assign Managed System Identity.

    Tip

    If you don't see the Identity option, make sure your alert rule has been created and you have the necessary permissions. Assigning the managed identity is always a separate step after alert rule creation.

  8. Assign the appropriate Reader roles to the managed identity.

    In the alert rule, go to Settings > Identity > System assigned managed identity > Azure role assignments > Add role assignment.

    Choose the Reader role and assign it to the resource group. Repeat "Add role assignment" for the subscription if needed.

    Note

    After Communication Manager is set up, it sends advance notices one week before maintenance starts and one day before maintenance starts. It also sends you timely alerts during the maintenance operation.
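The queries in these steps pull the status and event time out of the properties text by using fixed character offsets rather than JSON field access. The following Python sketch mirrors that two-step substring logic so you can see what each offset does. The function name `extract_field` and the sample payload are hypothetical and exist only for illustration. The sketch also returns an empty string when the key or a trailing comma is missing, which is a simplification and not a statement about how the query engine behaves.

```python
def extract_field(properties: str, key: str, window: int = 50) -> str:
    """Mirror the two-step substring extraction used in the queries."""
    key_pos = properties.find(key)  # plays the role of indexof(properties, key)
    if key_pos < 0:
        return ""  # simplification: the queries drop such rows with `has` filters

    # Skip the key itself plus the three characters `":"` that separate
    # the key from its value in compact JSON text.
    value_start = key_pos + len(key) + 3
    chunk = properties[value_start:value_start + window]

    comma_pos = chunk.find(",")  # plays the role of indexof(chunk, ",")
    if comma_pos < 1:
        return ""  # simplification for values with no trailing comma

    # Drop the closing quote that sits just before the comma.
    return chunk[:comma_pos - 1]


# Hypothetical payload, invented for illustration only.
sample = (
    '{"eventId":"demo","eventStatus":"Started",'
    '"lastUpdateTime":"2024-05-01T10:00:00Z","eventDetails":"demo"}'
)

status = extract_field(sample, "eventStatus")
event_time = extract_field(sample, "lastUpdateTime")
print(status)
print(event_time)
```

In this sketch, the fixed offset of 3 only lines up when there is no whitespace between the key, the colon, and the value, and the value must be a quoted string that is followed by a comma within the 50-character window.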

Set up Communication Manager

  1. Go to the resource, select Monitoring, select Alerts, and then select Alert Rules.

  2. On the Condition tab, for Signal name, select Custom log search.

    Screenshot that shows the custom log search in the alert rule pane.

  3. In the Search query box, paste one of the following custom queries and then select the Review+Create button.

    The following query is for cluster autoupgrade notifications:

     arg("").containerserviceeventresources
     | where type == "microsoft.containerservice/managedclusters/scheduledevents"
     | where id contains "/subscriptions/<subid>/resourcegroups/<rgname>/providers/Microsoft.ContainerService/managedClusters/<clustername>"
     | where properties has "eventStatus"
     | extend status = substring(properties, indexof(properties, "eventStatus") + strlen("eventStatus") + 3, 50)
     | extend status = substring(status, 0, indexof(status, ",") - 1)
     | where status != ""
     | where properties has "eventDetails"
     | extend upgradeType = case(
                                properties has "K8sVersionUpgrade",
                                "K8sVersionUpgrade",
                                properties has "NodeOSUpgrade",
                                "NodeOSUpgrade",
                                ""
                            )
     | extend details = parse_json(tostring(properties.eventDetails))
     | where properties has "lastUpdateTime"
     | extend eventTime = substring(properties, indexof(properties, "lastUpdateTime") + strlen("lastUpdateTime") + 3, 50)
     | extend eventTime = substring(eventTime, 0, indexof(eventTime, ",") - 1)
     | extend eventTime = todatetime(tostring(eventTime))
     | where eventTime >= ago(2h)
     | where upgradeType == "K8sVersionUpgrade"
     | project
         eventTime,
         upgradeType,
         status,
         properties,
         name,
         details
     | order by eventTime asc
    

    The following query is for Node OS autoupgrade notifications:

     arg("").containerserviceeventresources
     | where type == "microsoft.containerservice/managedclusters/scheduledevents"
     | where id contains "/subscriptions/<subid>/resourcegroups/<rgname>/providers/Microsoft.ContainerService/managedClusters/<clustername>"
     | where properties has "eventStatus"
     | extend status = substring(properties, indexof(properties, "eventStatus") + strlen("eventStatus") + 3, 50)
     | extend status = substring(status, 0, indexof(status, ",") - 1)
     | where status != ""
     | where properties has "eventDetails"
     | extend upgradeType = case(
                                properties has "K8sVersionUpgrade",
                                "K8sVersionUpgrade",
                                properties has "NodeOSUpgrade",
                                "NodeOSUpgrade",
                                ""
                            )
     | extend details = parse_json(tostring(properties.eventDetails))
     | where properties has "lastUpdateTime"
     | extend eventTime = substring(properties, indexof(properties, "lastUpdateTime") + strlen("lastUpdateTime") + 3, 50)
     | extend eventTime = substring(eventTime, 0, indexof(eventTime, ",") - 1)
     | extend eventTime = todatetime(tostring(eventTime))
     | where eventTime >= ago(2h)
     | where upgradeType == "NodeOSUpgrade"
     | project
         eventTime,
         upgradeType,
         status,
         properties,
         name,
         details
     | order by eventTime asc
    
  4. The interval should be 30 minutes, and the threshold should be 1.

  5. Make sure that an action group with the correct email address exists, so that you can receive the notifications.

  6. Make sure that the managed identity of the log search alert rule has the Reader role on the resource group and on the subscription.

  7. Go to the alert rule: Settings > Identity > System assigned managed identity > Azure role assignments > Add role assignment.

  8. Select the Reader role and assign it to the resource group. Repeat Add role assignment for the subscription.
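The alert condition in these steps counts the rows that the query returns in each 30-minute window and compares that count with the threshold, split by status. The following Python sketch is a simplified model of that evaluation, not a description of how Azure Monitor is implemented. The function name `statuses_that_fire` and the sample rows are hypothetical, and the "greater than" comparison is an assumption because the steps don't name the operator.

```python
from collections import Counter
from datetime import datetime, timedelta


def statuses_that_fire(rows, now, window=timedelta(minutes=30), threshold=0):
    """Return the statuses whose row count inside the window exceeds the threshold.

    rows: iterable of (event_time, status) tuples, like two of the query's columns.
    Assumption: the rule compares with "greater than"; the steps do not say.
    """
    counts = Counter(
        status
        for event_time, status in rows
        if now - window <= event_time <= now
    )
    return sorted(status for status, count in counts.items() if count > threshold)


# Hypothetical rows, invented for illustration only.
now = datetime(2024, 5, 1, 12, 0)
rows = [
    (datetime(2024, 5, 1, 11, 45), "Started"),
    (datetime(2024, 5, 1, 11, 55), "Completed"),
    (datetime(2024, 5, 1, 10, 0), "Scheduled"),  # outside the 30-minute window
]

fired = statuses_that_fire(rows, now)
print(fired)
```

Under these assumptions, a threshold of 0 fires for any status that has at least one row in the window, while a threshold of 1 requires at least two rows.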

Verification

To verify the setup, wait for the autoupgrader to start upgrading the cluster. Then confirm that notices promptly arrive at the email address that's configured to receive them.

Check the Azure Resource Graph database for the scheduled notification record. Each scheduled event notification should be listed as one record in the containerserviceeventresources table.

Screenshot that shows how to look up Azure Resource Graph.
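To look up the records for a single cluster, you can reuse the first filters from the alert queries. The following Python sketch fills in the `<subid>`, `<rgname>`, and `<clustername>` placeholders and prints a query text that you could paste into Azure Resource Graph Explorer. The function name `build_verification_query` and the sample identifiers are hypothetical, and the projected columns are just the `name` and `properties` columns that the alert queries already use.

```python
def build_verification_query(sub_id: str, rg_name: str, cluster_name: str) -> str:
    """Fill the <subid>, <rgname>, and <clustername> placeholders from the alert queries."""
    cluster_id = (
        f"/subscriptions/{sub_id}/resourcegroups/{rg_name}"
        f"/providers/Microsoft.ContainerService/managedClusters/{cluster_name}"
    )
    return "\n".join([
        "containerserviceeventresources",
        '| where type == "microsoft.containerservice/managedclusters/scheduledevents"',
        f'| where id contains "{cluster_id}"',
        "| project name, properties",
    ])


# Hypothetical identifiers, invented for illustration only.
query = build_verification_query(
    "00000000-0000-0000-0000-000000000000", "my-rg", "my-aks"
)
print(query)
```

Each row that the query returns corresponds to one scheduled event record for the cluster.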