Prometheus support

Mgmt comes with built-in prometheus support. It is disabled by default and can be enabled with the --prometheus command line switch.

By default, the prometheus instance will listen on 127.0.0.1:9233. You can change this address with the --prometheus-listen command line option.

To have mgmt bind its prometheus endpoint to 0.0.0.0:45001 (port 45001 on all interfaces), use: ./mgmt r --prometheus --prometheus-listen :45001
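
To check that the endpoint is reachable, you can fetch it like any other prometheus target. The following is a minimal sketch, not part of mgmt: the helper names metrics_url and fetch_metrics are hypothetical, and the /metrics path is an assumption based on the usual prometheus convention, so adjust it if your instance serves metrics elsewhere.

```python
import urllib.request


def metrics_url(listen: str, path: str = "/metrics") -> str:
    """Turn a --prometheus-listen value into a URL to scrape.

    An empty host (":45001") or 0.0.0.0 means "all interfaces", so the
    loopback address is used to connect. The "/metrics" path is an
    assumption (the common prometheus default), not taken from this page.
    """
    host, _, port = listen.rpartition(":")
    if host in ("", "0.0.0.0"):
        host = "127.0.0.1"
    return f"http://{host}:{port}{path}"


def fetch_metrics(listen: str = "127.0.0.1:9233", timeout: float = 5.0) -> str:
    """Return the raw text exposition served by a running mgmt instance."""
    with urllib.request.urlopen(metrics_url(listen), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example (needs mgmt running with --prometheus --prometheus-listen :45001):
#     print(fetch_metrics(":45001"))
```

Calling fetch_metrics() with no argument targets the default listen address, 127.0.0.1:9233.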

Metrics

Mgmt exposes three kinds of metrics: go metrics, etcd metrics and mgmt metrics.

go metrics

We use the prometheus go_collector to expose go metrics. These metrics are mainly useful for debugging and performance testing.

etcd metrics

Mgmt exposes etcd metrics. Read more in the upstream etcd documentation.

mgmt metrics

Here is a list of the metrics we provide:

  • mgmt_resources_total: The number of resources that mgmt is managing
  • mgmt_checkapply_total: The number of CheckApply calls that mgmt has run
  • mgmt_failures_total: The number of resources that have failed
  • mgmt_failures: The number of resources that have failed
  • mgmt_graph_start_time_seconds: Start time of the current graph since unix epoch in seconds
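
Since mgmt_graph_start_time_seconds is an absolute timestamp rather than a duration, the age of the current graph has to be derived from it. A minimal sketch, assuming you already have the sample value as a number (graph_age_seconds is a hypothetical helper, not an mgmt function):

```python
import time
from typing import Optional


def graph_age_seconds(start_time: float, now: Optional[float] = None) -> float:
    """Seconds elapsed since the current graph started.

    start_time is the sample value of mgmt_graph_start_time_seconds,
    i.e. seconds since the unix epoch. now defaults to the current time.
    """
    if now is None:
        now = time.time()
    return now - start_time
```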

For each metric, you will get some extra labels:

  • kind: The kind of mgmt resource

For mgmt_checkapply_total, these extra labels are also set:

  • eventful: “true” or “false”, whether the CheckApply triggered some changes
  • errorful: “true” or “false”, whether the CheckApply reported an error
  • apply: “true” if the CheckApply ran in apply mode, “false” if it ran in noop mode
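
The labels above make it possible to break a counter down by resource kind or by outcome. The sketch below illustrates the idea on scraped text. It is a simplified parser for the prometheus text format (it does not handle escaped quotes or "}" inside label values), the SAMPLE data is made up for illustration, and parse_samples and sum_by_label are hypothetical helper names.

```python
import re
from collections import defaultdict

# Matches `name{label="value",...} value`; the label part is optional.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

# Hypothetical scrape output, made up for illustration only.
SAMPLE = """\
# TYPE mgmt_checkapply_total counter
mgmt_checkapply_total{apply="true",errorful="false",eventful="true",kind="file"} 3
mgmt_checkapply_total{apply="true",errorful="false",eventful="false",kind="file"} 10
mgmt_checkapply_total{apply="true",errorful="true",eventful="false",kind="svc"} 2
"""


def parse_samples(text):
    """Yield (name, labels, value) for each sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(value)


def sum_by_label(text, metric, label):
    """Sum a metric's sample values, grouped by the value of one label."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == metric and label in labels:
            totals[labels[label]] += value
    return dict(totals)
```

For example, sum_by_label(SAMPLE, "mgmt_checkapply_total", "errorful") groups the CheckApply count by whether an error was reported, and passing "kind" instead groups it by resource kind. In a real deployment the same aggregation would normally be done by the prometheus server itself.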

Alerting

You can use prometheus to alert you upon changes or failures. We do not provide alerting rule templates yet, but we plan to add some examples to this repository. Patches welcome!

Grafana

We do not have grafana dashboards yet. Patches welcome!