Dec 15, 2025

Prometheus Labels, Alerting Labels, Recording Rules, and Alerting Rules

Prometheus has two different kinds of labels that people often mix together:

  • PromQL labels, which live on metric time series.
  • Alerting labels, which live on alert objects sent to Alertmanager.

They can share names, and alerting rules often copy metric labels into alert labels, but they are used for different jobs.


The Short Version

PromQL labels answer:

Which time series am I querying, filtering, grouping, or aggregating?

Alerting labels answer:

How should this alert be identified, grouped, routed, silenced, and deduplicated?

Recording rules answer:

What expensive or repeated PromQL result should I precompute as a new time series?

Alerting rules answer:

What condition should create a firing alert?

PromQL Labels

PromQL labels are part of the metric data model.

Example:

probe_success{
  job="url-monitors",
  monitor_id="mon_1001",
  monitor_name="checkout_health",
  team="checkout",
  environment="prod",
  instance="https://checkout.example.com/health"
}

These labels are stored with the time series. You use them to select, group, and aggregate metrics.

Select checkout monitors:

probe_success{team="checkout", environment="prod"}

Group availability by team:

avg by (team) (
  probe_success{job="url-monitors"}
)

Find down monitors:

probe_success{job="url-monitors"} == 0
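
To make the three queries above concrete, here is a small Python sketch that mimics label selection, `avg by (team)`, and `== 0` on an in-memory list of samples. The sample data and the helper names (`select`, `avg_by`) are made up for illustration; Prometheus evaluates these in its own query engine, not like this.

```python
from collections import defaultdict

# Hypothetical samples: (labels, value) pairs, similar to an instant vector.
SAMPLES = [
    ({"job": "url-monitors", "team": "checkout", "monitor_id": "mon_1001"}, 1.0),
    ({"job": "url-monitors", "team": "checkout", "monitor_id": "mon_1002"}, 0.0),
    ({"job": "url-monitors", "team": "billing", "monitor_id": "mon_2001"}, 1.0),
]


def select(samples, **matchers):
    """Keep samples whose labels equal every matcher, like metric{k="v"}."""
    return [
        (labels, value)
        for labels, value in samples
        # A missing label behaves like an empty string in an equality matcher.
        if all(labels.get(k, "") == v for k, v in matchers.items())
    ]


def avg_by(samples, *group_labels):
    """Average values per group, keeping only the grouping labels."""
    buckets = defaultdict(list)
    for labels, value in samples:
        key = tuple(labels.get(name, "") for name in group_labels)
        buckets[key].append(value)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


selected = select(SAMPLES, job="url-monitors")
availability = avg_by(selected, "team")
down = [labels["monitor_id"] for labels, value in selected if value == 0]
```

Grouping by `team` discards `monitor_id`, which is why the aggregated result has one entry per team rather than one per monitor.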

PromQL labels are powerful, but they are also the source of cardinality. Every unique label set creates a distinct time series.

Cardinality Example: One Metric Name, Many Series

This looks like one metric:

http_requests_total

But Prometheus does not store only the metric name. It stores each unique combination of labels as a separate time series.

Example series:

http_requests_total{service="checkout", method="GET", status="200"} 12000
http_requests_total{service="checkout", method="GET", status="500"} 42
http_requests_total{service="checkout", method="POST", status="200"} 8000
http_requests_total{service="billing", method="GET", status="200"} 9100
http_requests_total{service="billing", method="POST", status="500"} 11

Same metric name. Different label sets. Different time series.

If you have:

10 services
x 4 HTTP methods
x 6 status groups
x 3 environments
= 720 active series

That can be fine.

But if you add an unbounded label like user_id:

10 services
x 4 HTTP methods
x 6 status groups
x 3 environments
x 1,000,000 users
= 720,000,000 active series

That is a Prometheus incident waiting to happen.
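
The arithmetic above is a worst-case estimate: the product of the number of distinct values per label. A minimal Python sketch of that calculation follows; the function name `worst_case_series` is hypothetical, and real series counts depend on which combinations actually occur.

```python
from math import prod

# Distinct values per label, taken from the example above.
bounded = {"service": 10, "method": 4, "status": 6, "environment": 3}


def worst_case_series(label_sizes):
    """Upper bound on series for one metric: product of label value counts."""
    return prod(label_sizes.values())


baseline = worst_case_series(bounded)
with_user_id = worst_case_series({**bounded, "user_id": 1_000_000})
print(baseline, with_user_id)
```

Running a quick estimate like this before adding a label is a cheap way to spot an unbounded dimension early.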

Good metric label:

http_requests_total{route="/users/:id"}

Bad metric label:

http_requests_total{route="/users/9427f3a0-6e6b-4c02-a96d-4bb42d9f9d31"}

Good metric label:

api_requests_total{status="500", team="checkout"}

Bad metric label:

api_requests_total{status="500", request_id="req_01J9V8H4F9ZQ9Y6T"}

Rule of thumb:

Use labels for bounded dimensions you aggregate by. Do not use labels for per-request, per-user, per-session, or per-event values.
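
One way to follow this rule for the route label is to normalize raw paths before they become label values. The sketch below is an assumption about how that could be done in application code: `normalize_route` and the two ID patterns are hypothetical, and many web frameworks expose the route template directly, which is usually the better source.

```python
import re

# Patterns for path segments that look like identifiers (an assumption;
# adjust them to the ID formats your service actually uses).
UUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)
NUMERIC_RE = re.compile(r"^\d+$")


def normalize_route(path):
    """Replace ID-like path segments with ':id' so the label stays bounded."""
    segments = path.split("/")
    cleaned = [
        ":id" if UUID_RE.match(seg) or NUMERIC_RE.match(seg) else seg
        for seg in segments
    ]
    return "/".join(cleaned)


print(normalize_route("/users/9427f3a0-6e6b-4c02-a96d-4bb42d9f9d31"))
```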

Alerting Labels

Alerting labels are labels on the alert object.

Example alerting rule:

groups:
  - name: url-monitor-alerts
    rules:
      - alert: UrlMonitorDown
        expr: probe_success{job="url-monitors"} == 0
        for: 3m
        labels:
          severity: page
          team: "{{ $labels.team }}"
          monitor_id: "{{ $labels.monitor_id }}"
        annotations:
          summary: "URL monitor failed: {{ $labels.monitor_name }}"
          description: "{{ $labels.instance }} failed blackbox probes for 3 minutes."

When the alert fires, Alertmanager sees an alert object roughly like this:

{
  "labels": {
    "alertname": "UrlMonitorDown",
    "severity": "page",
    "team": "checkout",
    "environment": "prod",
    "monitor_id": "mon_1001",
    "monitor_name": "checkout_health",
    "job": "url-monitors",
    "instance": "https://checkout.example.com/health"
  },
  "annotations": {
    "summary": "URL monitor failed: checkout_health",
    "description": "https://checkout.example.com/health failed blackbox probes for 3 minutes."
  }
}

Alert labels are used by Alertmanager for:

  • grouping
  • deduplication
  • routing
  • silencing
  • inhibition
  • notification templates
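
To illustrate two of the uses listed above, the following Python sketch treats the full label set as the alert's identity and buckets alerts by a list of grouping labels. It is a simplified model, not Alertmanager's implementation; the alert data and the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical firing alerts, reduced to their label sets.
ALERTS = [
    {"alertname": "UrlMonitorDown", "team": "checkout", "monitor_id": "mon_1001"},
    {"alertname": "UrlMonitorDown", "team": "checkout", "monitor_id": "mon_1002"},
    {"alertname": "UrlMonitorDown", "team": "billing", "monitor_id": "mon_2001"},
]


def alert_identity(labels):
    """Model identity as the full label set; label order does not matter."""
    return frozenset(labels.items())


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the group_by labels."""
    groups = defaultdict(list)
    for labels in alerts:
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)


groups = group_alerts(ALERTS, ["team", "alertname"])
for key, members in groups.items():
    print(key, len(members))
```

The two checkout alerts stay distinct alerts because their `monitor_id` differs, but they land in one group and can share a notification.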

How Metric Labels Become Alert Labels

When an alert expression returns a series, Prometheus starts with the labels from that matching series.

Then the rule’s labels: block adds or overrides labels.

Example metric series:

probe_success{
  job="url-monitors",
  team="checkout",
  monitor_id="mon_1001",
  instance="https://checkout.example.com/health"
} 0

Rule labels:

labels:
  severity: page
  team: "{{ $labels.team }}"

Final alert labels:

alertname="UrlMonitorDown"
job="url-monitors"
team="checkout"
monitor_id="mon_1001"
instance="https://checkout.example.com/health"
severity="page"

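The danger is accidentally overriding the labels that distinguish one alert from another. If a rule replaces a distinguishing label such as monitor_id or instance with a constant, alerts from different series end up with identical label sets and can no longer be told apart for grouping, deduplication, or silencing.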
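
The merge order described in this section can be sketched in a few lines of Python. This is a simplified model under stated assumptions: `build_alert_labels` is a hypothetical helper, the rule labels are assumed to be already template-expanded, and the metric name is dropped before the rule labels are applied.

```python
def build_alert_labels(series_labels, rule_labels, alertname):
    """Series labels first, then rule labels override, then alertname is set."""
    labels = {k: v for k, v in series_labels.items() if k != "__name__"}
    labels.update(rule_labels)
    labels["alertname"] = alertname
    return labels


series_a = {"__name__": "probe_success", "job": "url-monitors",
            "team": "checkout", "monitor_id": "mon_1001"}
series_b = {"__name__": "probe_success", "job": "url-monitors",
            "team": "checkout", "monitor_id": "mon_1002"}

# Adding a new label keeps the two alerts distinct.
good = [build_alert_labels(s, {"severity": "page"}, "UrlMonitorDown")
        for s in (series_a, series_b)]

# Overriding a distinguishing label with a constant collapses them.
bad = [build_alert_labels(s, {"severity": "page", "monitor_id": "all"},
                          "UrlMonitorDown")
       for s in (series_a, series_b)]

print(good[0] == good[1], bad[0] == bad[1])
```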

Labels vs Annotations

Use labels for machine decisions.

Good alert labels:

  • alertname
  • severity
  • team
  • service
  • cluster
  • environment
  • monitor_id

Use annotations for human context.

Good annotations:

  • summary
  • description
  • runbook_url
  • dashboard_url
  • logs_url

Labels should be stable. Annotations can be descriptive.

Bad label:

labels:
  error_message: "{{ $value }}"

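That creates unstable alert identity: the label set changes whenever the value changes, so each evaluation can look like a brand-new alert and the for: timer may never complete.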

Better annotation:

annotations:
  description: "Current value is {{ $value }}"
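
The difference is easy to see if you model alert identity as the set of labels. In this Python sketch, the rule name `HighLatency`, the values, and the helper functions are all hypothetical; the point is only that a changing value in a label yields a different identity on each evaluation, while the same value in an annotation does not.

```python
def identity(labels):
    """Model alert identity as the full label set."""
    return frozenset(labels.items())


def evaluate(value, value_in_label):
    """Return (labels, annotations) for one evaluation of a hypothetical rule."""
    labels = {"alertname": "HighLatency", "team": "checkout"}
    annotations = {"description": f"Current value is {value}"}
    if value_in_label:
        # The anti-pattern: the measured value becomes part of the identity.
        labels["error_message"] = str(value)
    return labels, annotations


values = (0.91, 0.93, 0.97)
stable = {identity(evaluate(v, False)[0]) for v in values}
unstable = {identity(evaluate(v, True)[0]) for v in values}
print(len(stable), len(unstable))
```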

Recording Rules

Recording rules run PromQL and save the result as a new time series.

Use recording rules when:

  • a query is expensive
  • many dashboards repeat the same query
  • many alerts use the same base expression
  • you want a stable, named metric for a derived signal
  • you need to aggregate raw series before querying them often

Example:

groups:
  - name: service-recording-rules
    interval: 30s
    rules:
      - record: service:http_requests:rate5m
        expr: |
          sum by (service, status, team, environment) (
            rate(http_requests_total[5m])
          )

This creates a new metric:

service:http_requests:rate5m

Now dashboards and alerts can use:

sum by (service) (
  service:http_requests:rate5m{status=~"5.."}
)

instead of recalculating the raw rate() every time.
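
As a rough illustration of what the dashboard query does with the recorded series, the Python sketch below filters by status and sums per service. The recorded values and the helper name `sum_by_service` are invented for the example. One detail worth noticing is that PromQL regex matchers are anchored, so `5..` matches `500` but a bare `5` would not.

```python
import re
from collections import defaultdict

# Hypothetical values of service:http_requests:rate5m (requests per second).
RECORDED = [
    ({"service": "checkout", "status": "200"}, 40.0),
    ({"service": "checkout", "status": "500"}, 1.5),
    ({"service": "checkout", "status": "503"}, 0.5),
    ({"service": "billing", "status": "200"}, 30.0),
    ({"service": "billing", "status": "500"}, 0.25),
]


def sum_by_service(samples, status_regex):
    """Sum values per service for statuses that fully match the regex."""
    pattern = re.compile(status_regex)
    totals = defaultdict(float)
    for labels, value in samples:
        # Anchored match, mirroring how =~ behaves in PromQL.
        if pattern.fullmatch(labels["status"]):
            totals[labels["service"]] += value
    return dict(totals)


errors = sum_by_service(RECORDED, "5..")
print(errors)
```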

Alerting Rules

Alerting rules evaluate a PromQL expression on every evaluation cycle and create one alert for each series the expression returns.

Use alerting rules when:

  • humans or automation need to be notified
  • the condition has operational meaning
  • the signal should route by team/severity
  • you need for: pending behavior
  • Alertmanager should group, silence, inhibit, or notify

Example:

groups:
  - name: service-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum by (service, team, environment) (
            service:http_requests:rate5m{status=~"5.."}
          )
          /
          sum by (service, team, environment) (
            service:http_requests:rate5m
          )
          > 0.02
        for: 10m
        labels:
          severity: page
          team: "{{ $labels.team }}"
          service: "{{ $labels.service }}"
          environment: "{{ $labels.environment }}"
        annotations:
          summary: "High error rate for {{ $labels.service }}"
          description: "5xx rate has been above 2% for 10 minutes."
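
The for: clause is the part of this rule that is easiest to misread, so here is a simplified Python model of the pending-to-firing transition. It assumes a fixed evaluation interval, uses shorter numbers than the rule above to keep the timeline readable, and ignores options such as keep_firing_for; `alert_states` is a hypothetical helper.

```python
def alert_states(condition_by_tick, interval_s, for_s):
    """Return the alert state after each evaluation tick.

    condition_by_tick: booleans, True when the expression returned the series.
    """
    states = []
    active_since = None  # time at which the condition most recently became true
    for tick, is_true in enumerate(condition_by_tick):
        now = tick * interval_s
        if not is_true:
            # Any gap resets the timer.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        held_for = now - active_since
        states.append("firing" if held_for >= for_s else "pending")
    return states


# One evaluation per minute with a 3 minute hold. The dip at tick 2 resets it.
timeline = [True, True, False, True, True, True, True]
states = alert_states(timeline, interval_s=60, for_s=180)
print(states)
```

In this model the condition has to hold continuously for the whole duration, which is why a flapping signal can stay pending without ever firing.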

Recording Rules vs Alerting Rules

| Topic | Recording rule | Alerting rule |
| --- | --- | --- |
| Output | New time series | Alert object |
| Stored in TSDB | Yes | No, alert state is tracked separately |
| Sent to Alertmanager | No | Yes, when firing |
| Main purpose | Precompute reusable PromQL | Notify on conditions |
| Uses labels for | Result series dimensions | Routing, grouping, identity |
| Typical examples | request rate, p95 latency, SLO burn base metrics | high error rate, URL down, disk full |

Use both together:

raw metrics
  -> recording rule creates clean derived metric
  -> alerting rule evaluates derived metric
  -> Alertmanager routes alert

Use Case: URL Monitor

Metric labels:

probe_success{
  job="url-monitors",
  monitor_id="mon_1001",
  monitor_name="checkout_health",
  team="checkout",
  environment="prod",
  instance="https://checkout.example.com/health"
}

Recording rule:

groups:
  - name: url-monitor-recording-rules
    rules:
      - record: monitor:probe_success:avg5m
        expr: |
          avg_over_time(probe_success{job="url-monitors"}[5m])

Alerting rule:

groups:
  - name: url-monitor-alerts
    rules:
      - alert: UrlMonitorDown
        expr: monitor:probe_success:avg5m < 1
        for: 3m
        labels:
          severity: page
          team: "{{ $labels.team }}"
          monitor_id: "{{ $labels.monitor_id }}"
        annotations:
          summary: "URL monitor failed: {{ $labels.monitor_name }}"
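
It helps to check what the threshold in this rule means for a 0/1 gauge. The Python sketch below assumes one probe every 30 seconds, so a 5 minute window holds about 10 samples; that interval and the sample lists are assumptions, not something the rule states.

```python
def avg_over_time(samples):
    """Mean of the samples inside the range window."""
    return sum(samples) / len(samples)


# Assumed: one probe every 30s, so roughly 10 samples per 5m window.
all_ok = [1] * 10
one_failure = [1] * 9 + [0]
all_failed = [0] * 10

for name, window in [("all_ok", all_ok),
                     ("one_failure", one_failure),
                     ("all_failed", all_failed)]:
    avg = avg_over_time(window)
    print(name, avg, "matches < 1:", avg < 1, "matches == 0:", avg == 0)
```

With `< 1`, a single failed probe keeps the expression true for as long as that sample stays inside the window. If the intent is to alert only when every probe in the window failed, `== 0` is the corresponding threshold.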

Alertmanager route:

route:
  # Hypothetical fallback receiver; the root route must name one.
  receiver: default-receiver
  group_by: ["team", "alertname", "environment"]
  routes:
    - matchers:
        - team="checkout"
        - severity="page"
      receiver: checkout-pagerduty
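
Route matching can be pictured as checking an alert's labels against each child route in order. The Python sketch below is a simplified model: it handles only equality matchers on one level of routes, ignores options such as `continue`, and uses a hypothetical fallback receiver name, `default-receiver`.

```python
def pick_receiver(alert_labels, routes, default_receiver):
    """Return the receiver of the first route whose matchers all match."""
    for route in routes:
        matchers = route["matchers"]
        if all(alert_labels.get(k, "") == v for k, v in matchers.items()):
            return route["receiver"]
    # No child route matched, so the alert falls back to the root receiver.
    return default_receiver


ROUTES = [
    {
        "matchers": {"team": "checkout", "severity": "page"},
        "receiver": "checkout-pagerduty",
    },
]

checkout_alert = {"alertname": "UrlMonitorDown", "team": "checkout",
                  "severity": "page", "environment": "prod"}
billing_alert = {"alertname": "UrlMonitorDown", "team": "billing",
                 "severity": "page", "environment": "prod"}

print(pick_receiver(checkout_alert, ROUTES, "default-receiver"))
print(pick_receiver(billing_alert, ROUTES, "default-receiver"))
```

This is why the `team` and `severity` alert labels matter: without them on the alert, the child route cannot match and the alert goes to the fallback.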

Practical Rules

  • Put query dimensions in PromQL labels.
  • Put routing dimensions in alerting labels.
  • Put human explanations in annotations.
  • Use recording rules for repeated or expensive PromQL.
  • Use alerting rules for notification-worthy conditions.
  • Do not put high-cardinality values in labels unless they are truly part of the series identity.
  • Be careful when overriding labels in alerting rules.
