RabbitMQ, AKS and monitoring of pods

pjbcoetzer 0 Reputation points
2025-07-31T19:43:20.32+00:00

Hi all,

I've set up a RabbitMQ service on AKS using the latest (at the time of writing, 4.1.1) Cluster and Topology Operators. The setup was intuitive enough: I wrote a few manifest files and applied them to the AKS cluster with kubectl.
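
Roughly, installing the two operators came down to the following (these are the quickstart release URLs; a pinned version would reference a specific tag):

  # Install the RabbitMQ Cluster Operator
  kubectl apply -f "https://github.com/rabbitmq/cluster-operator/releases/latest/download/cluster-operator.yml"

  # Install the Messaging Topology Operator with cert-manager integration
  kubectl apply -f "https://github.com/rabbitmq/messaging-topology-operator/releases/latest/download/messaging-topology-operator-with-certmanager.yaml"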

I was able to:

  • create a namespace
  • generate an external certificate with cert-manager (and Let's Encrypt) for TLS
  • generate an internal self-signed certificate with cert-manager
  • install the RabbitMQ cluster-operator
  • install the RabbitMQ messaging-topology-operator-with-certmanager operator
  • apply a RabbitmqCluster (also enabling the MQTT plugin; a trimmed manifest sketch follows this list)
  • set up users (using Secrets and Permissions)
  • add an external load balancer (applying the ports for client access)
  • add an internal load balancer (applying the ports for services)
  • add an ingress resource using nginx for the internal TLS proxy
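
The cluster itself was applied with a manifest along these lines (trimmed to the relevant bits; the names and the TLS secret below are placeholders, not my actual values):

  apiVersion: rabbitmq.com/v1beta1
  kind: RabbitmqCluster
  metadata:
    name: rabbitmq              # placeholder name
    namespace: rabbitmq         # placeholder namespace
  spec:
    replicas: 3
    rabbitmq:
      additionalPlugins:
        - rabbitmq_mqtt
    tls:
      secretName: rabbitmq-internal-tls   # cert-manager issued secret (placeholder)
    service:
      type: LoadBalancer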

The setup is running well and my clients and services are connecting to it without issue, so technically I have a good, working service...

-- BUT --

My pod logs, however, are being flooded with TLS errors originating from internal namespaces, including:

  • kube-system
  • calico-system
  • tigera-operator

(I cannot tell which one it is, as they are sharing the same IPs.)
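
A quick way to narrow down which pods (or nodes) own those peer addresses:

  # Map the peer IPs from the log entries back to pods
  kubectl get pods -A -o wide | grep -E '10\.2\.1\.1[12]'

  # Check whether they are node addresses instead (e.g. SNAT'd probes)
  kubectl get nodes -o wide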

These are the errors:

[error] timeout error during handshake for protocol 'amqp/ssl' and peer 10.2.1.11:48397
[error] timeout error during handshake for protocol 'mqtt/ssl' and peer 10.2.1.11:63038
[error] timeout error during handshake for protocol 'mqtt/ssl' and peer 10.2.1.11:35356
[error] timeout error during handshake for protocol 'amqp/ssl' and peer 10.2.1.11:63171
[error] timeout error during handshake for protocol 'mqtt/ssl' and peer 10.2.1.11:32319
[error] timeout error during handshake for protocol 'amqp/ssl' and peer 10.2.1.12:25044
[error] timeout error during handshake for protocol 'mqtt/ssl' and peer 10.2.1.12:19395
[error] timeout error during handshake for protocol 'mqtt/ssl' and peer 10.2.1.12:52790
[error] timeout error during handshake for protocol 'mqtt/ssl' and peer 10.2.1.11:55244

I reached out to the RabbitMQ team, who graciously helped me, and it was soon established that these errors are logged correctly but originate from one of the services mentioned above; as this is an AKS deployment, that is as far as they could take the debugging.

It would seem the issue is with the AKS monitoring service using Prometheus scraping to get monitoring statistics into Azure (which, by the way, is enabled by default): because the monitoring service does not have the self-signed certificate needed to access the TLS ports, every connection attempt ends up creating an error log entry in the RabbitMQ pod.

This is somewhat outside my domain, and I found a rather "hidden" section deep in the docs where you can opt out of the metrics collection (Disable control plane metrics on your AKS cluster), but it is an "opt out of everything" option, and I could not find any granular setting to stop monitoring the TLS ports of my pods. It sounds a bit bizarre, but that is why I'm posting this question.
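
For completeness, the only opt-out switches I could find are all-or-nothing ones along these lines (cluster and resource group names are placeholders):

  # Disable the managed Prometheus metrics add-on entirely
  az aks update --disable-azure-monitor-metrics -n <cluster-name> -g <resource-group>

  # Disable the Container insights (monitoring) add-on entirely
  az aks disable-addons -a monitoring -n <cluster-name> -g <resource-group>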

So, the question:

How can I configure the Azure monitoring service (which I assume uses Prometheus scraping) so that it stops trying to connect to my pod endpoints over TLS, while still giving me the option of some monitoring (and without flooding the pod logs)?

PS. I have disabled the AKS monitoring on my cluster using the command, rebuilt my RabbitMQ instance, and the default monitoring is still shown as enabled in the Azure portal; unfortunately the errors are still being generated in the pod logs.

Cheers,

Pieter

Azure Kubernetes Service
An Azure service that provides serverless Kubernetes, an integrated continuous integration and continuous delivery experience, and enterprise-grade security and governance.

1 answer

  1. Durga Reshma Malthi 9,355 Reputation points Microsoft External Staff Moderator
    2025-08-01T12:50:45.35+00:00

    Hi pjbcoetzer,

    As per this documentation - https://learn.microsoft.com/en-us/azure/azure-monitor/containers/kubernetes-monitoring-enable?tabs=cli#collection-rules - Azure Monitor Container insights does not support prometheus.io/scrape annotations.

    And as per this document - https://learn.microsoft.com/en-us/azure/azure-monitor/containers/prometheus-metrics-troubleshoot, the ama-metrics-settings-configmap only controls metrics collection, not connection attempts.
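
    For the managed Prometheus add-on specifically, the namespace scoping I'm aware of sits in that ConfigMap's pod-annotation-based-scraping setting; a minimal sketch (the regex value is only an example) looks like this:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: ama-metrics-settings-configmap
      namespace: kube-system
    data:
      pod-annotation-based-scraping: |-
        # Only pods in namespaces matching this regex are scraped via
        # prometheus.io/* annotations; leave the RabbitMQ namespace(s) out
        podannotationnamespaceregex = "default|my-apps"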

    If you still want metrics collected by Container insights itself, keep scraping enabled but scope it to specific namespaces by modifying the ConfigMap, for example (the prometheus-data-collection-settings block below follows the documented container-azm-ms-agentconfig schema; the namespace list is only an illustration):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: container-azm-ms-agentconfig
      namespace: kube-system
    data:
      schema-version: v1
      config-version: ver1
      prometheus-data-collection-settings: |-
        [prometheus_data_collection_settings.cluster]
            interval = "1m"
            # Scrape annotated pods only in the namespaces listed here,
            # so the RabbitMQ namespace(s) are left out of pod scraping
            monitor_kubernetes_pods = true
            monitor_kubernetes_pods_namespaces = ["default"]
        [prometheus_data_collection_settings.node]
            interval = "1m"

    After applying this ConfigMap, restart the agent DaemonSet so it picks up the new settings.
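
    For example (the file name is whatever you saved the ConfigMap as, and the DaemonSet name assumes the current ama-logs agent, formerly omsagent):

    kubectl apply -f container-azm-ms-agentconfig.yaml
    kubectl rollout restart daemonset ama-logs -n kube-system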

    Hope this helps!

    Please let me know if you have any queries.

