Hi all,
I've set up a RabbitMQ service on AKS using the latest (at the time of writing, 4.1.1) Cluster and Topology Operators. The setup was intuitive enough: I created some manifest files and applied them to the AKS cluster with kubectl.
I was able to set up:
- a namespace
- an external certificate for TLS, generated with cert-manager (and Let's Encrypt)
- an internal self-signed certificate, also generated with cert-manager
- the RabbitMQ cluster-operator
- the RabbitMQ messaging-topology-operator-with-certmanager
- a RabbitmqCluster (with the MQTT plugin enabled)
- users (via Secrets and Permissions)
- an external load balancer (with the ports for client access)
- an internal load balancer (with the ports for services)
- an ingress resource using nginx for the internal TLS proxy
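For reference, the cluster item above boils down to a manifest along these lines (a minimal sketch; the names and secret references are placeholders, not my actual values):

```yaml
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: rabbitmq                # placeholder name
  namespace: rabbitmq-system    # placeholder namespace
spec:
  replicas: 3
  rabbitmq:
    additionalPlugins:
      - rabbitmq_mqtt           # enables the MQTT listener
  tls:
    secretName: rabbitmq-internal-tls    # internal cert-manager certificate
    caSecretName: rabbitmq-internal-ca   # CA secret for peer verification
```

With spec.tls set, the operator exposes the amqp/ssl and mqtt/ssl listeners that show up in the error logs below.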
The setup is running well and my clients and services connect to it without issue, so technically I have a healthy running service...
-- BUT --
My pod logs are being flooded with TLS errors originating from internal namespaces, including:
- kube-system
- calico-system
- tigera-operator
(I cannot distinguish which one it is, as they share the same IPs.)
These are the errors:
[error] timeout error during handshake for protocol 'amqp/ssl' and peer 10.2.1.11:48397
[error] timeout error during handshake for protocol 'mqtt/ssl' and peer 10.2.1.11:63038
[error] timeout error during handshake for protocol 'mqtt/ssl' and peer 10.2.1.11:35356
[error] timeout error during handshake for protocol 'amqp/ssl' and peer 10.2.1.11:63171
[error] timeout error during handshake for protocol 'mqtt/ssl' and peer 10.2.1.11:32319
[error] timeout error during handshake for protocol 'amqp/ssl' and peer 10.2.1.12:25044
[error] timeout error during handshake for protocol 'mqtt/ssl' and peer 10.2.1.12:19395
[error] timeout error during handshake for protocol 'mqtt/ssl' and peer 10.2.1.12:52790
[error] timeout error during handshake for protocol 'mqtt/ssl' and peer 10.2.1.11:55244
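Since pod IPs show up in `kubectl get pods -o wide`, one way to narrow down which namespace a peer belongs to is to grep for one of the IPs from the logs above:

```shell
# Find which pod(s) currently hold the peer IP from the handshake errors
kubectl get pods --all-namespaces -o wide | grep 10.2.1.11
```

The catch, as far as I understand it, is that host-networked pods (calico-node, kube-proxy, and friends) all report the node's IP, which is presumably why several namespaces show the same address.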
I reached out to the RabbitMQ team, who graciously helped me. It soon turned out that these errors were logged correctly but originated from one of the services mentioned above, and as this is an AKS deployment, that was as far as they could help with the debugging.
It would seem the issue is with the AKS monitoring service, which uses Prometheus scraping to ship monitoring statistics into Azure (and which, by the way, is enabled by default): because the monitoring service does not have the self-generated certificate needed to access the TLS ports, every probe leaves an error log in the RabbitMQ pod.
This is outside my domain. I did find a somewhat hidden section deep in the docs where you can opt out of metrics collection ("Disable control plane metrics on your AKS cluster"), but that is an "opt out of everything" switch; I could not find any granular setting to exclude just the TLS ports of my pods. It sounds a bit bizarre, but that is why I'm posting this question.
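If it really is Azure Monitor's managed Prometheus (the ama-metrics pods in kube-system) doing the probing, then as far as I can tell from the Azure docs the granularity lives in a settings ConfigMap rather than in the portal. A sketch of the relevant keys, assuming the documented names are current (I have not confirmed this actually stops the TLS probes):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ama-metrics-settings-configmap
  namespace: kube-system
data:
  # Toggle individual default scrape targets instead of
  # disabling monitoring wholesale
  default-scrape-settings-enabled: |-
    kubelet = true
    cadvisor = true
    kubestate = true
    nodeexporter = true
    coredns = false
    kubeproxy = false
    apiserver = false
  # Limit pod-annotation-based scraping to specific namespaces
  # (empty regex = no namespaces)
  pod-annotation-based-scraping: |-
    podannotationnamespaceregex = ""
```

If anyone knows whether one of these settings maps onto whatever is dialing my amqp/ssl and mqtt/ssl ports, that would already be a big help.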
So, the question:
How can I configure the Azure monitoring service (which I assume uses Prometheus scraping) so that it stops trying to connect to my pod endpoints over TLS, while still giving me some level of monitoring (and without flooding the pod logs)?
PS. I have disabled the AKS monitoring on my cluster with the CLI and rebuilt my RabbitMQ instance, yet default monitoring still shows as enabled in the Azure portal, and unfortunately the errors are still being generated in the pod logs.
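In case it matters, the kind of disable command I mean is something like this (resource group and cluster name are placeholders):

```shell
# Disable Azure Monitor managed Prometheus metrics collection for the cluster
az aks update --disable-azure-monitor-metrics \
  --resource-group <my-resource-group> \
  --name <my-aks-cluster>
```

Note this is distinct from Container insights (log collection), which has its own `--disable-addons monitoring` switch, so it's possible I only turned off one of the two.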
Cheers,
Pieter