Dec 15, 2025

Prometheus Labels, Alerting Labels, Recording Rules, and Alerting Rules

Prometheus has two different kinds of labels that people often mix together:

PromQL labels, which live on metric time series.
Alerting labels, which live on alert objects sent to Alertmanager.

They can share names, and alerting rules often copy metric labels into alert labels, but they are used for different jobs.

The Short Version

PromQL labels answer:

Which time series am I querying, filtering, grouping, or aggregating?

Alerting labels answer:

How should this alert be identified, grouped, routed, silenced, and deduplicated?

Recording rules answer:

What expensive or repeated PromQL result should I precompute as a new time series?

Alerting rules answer:

What condition should create a firing alert?

PromQL Labels

PromQL labels are part of the metric data model.

Example:

probe_success{
  job="url-monitors",
  monitor_id="mon_1001",
  monitor_name="checkout_health",
  team="checkout",
  environment="prod",
  instance="https://checkout.example.com/health"
}

These labels are stored with the time series. You use them to select, group, and aggregate metrics.

Select checkout monitors:

probe_success{team="checkout", environment="prod"}

Group availability by team:

avg by (team) (
  probe_success{job="url-monitors"}
)

Find down monitors:

probe_success{job="url-monitors"} == 0
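To make the select, group, and filter steps concrete, here is a minimal Python sketch that mimics the three queries above over an in-memory list of samples. The Sample type, the select and avg_by helpers, and the second and third sample series are hypothetical, made up for illustration; this is not how Prometheus evaluates PromQL.

```python
from collections import defaultdict
from typing import NamedTuple


class Sample(NamedTuple):
    labels: dict  # label name -> label value
    value: float


# Hypothetical samples; only the first one mirrors the example series above.
SAMPLES = [
    Sample({"job": "url-monitors", "team": "checkout", "environment": "prod", "monitor_id": "mon_1001"}, 1.0),
    Sample({"job": "url-monitors", "team": "checkout", "environment": "prod", "monitor_id": "mon_1002"}, 0.0),
    Sample({"job": "url-monitors", "team": "billing", "environment": "prod", "monitor_id": "mon_2001"}, 1.0),
]


def select(samples, **matchers):
    """Keep samples whose labels equal every matcher, like {team="checkout"}."""
    return [s for s in samples if all(s.labels.get(k) == v for k, v in matchers.items())]


def avg_by(samples, *names):
    """Average values per distinct combination of the given labels, like avg by (...)."""
    buckets = defaultdict(list)
    for s in samples:
        buckets[tuple(s.labels.get(n, "") for n in names)].append(s.value)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


checkout = select(SAMPLES, team="checkout", environment="prod")
availability = avg_by(select(SAMPLES, job="url-monitors"), "team")
down = [s for s in select(SAMPLES, job="url-monitors") if s.value == 0]
```

In this sketch, grouping by team discards every other label, which is the same trade-off an avg by (team) query makes.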

PromQL labels are powerful, but they are also the source of cardinality. Every unique label set creates a distinct time series.

Cardinality Example: One Metric Name, Many Series

This looks like one metric:

http_requests_total

But Prometheus does not store only the metric name. It stores each unique combination of labels as a separate time series.

Example series:

http_requests_total{service="checkout", method="GET", status="200"} 12000
http_requests_total{service="checkout", method="GET", status="500"} 42
http_requests_total{service="checkout", method="POST", status="200"} 8000
http_requests_total{service="billing", method="GET", status="200"} 9100
http_requests_total{service="billing", method="POST", status="500"} 11

Same metric name. Different label sets. Different time series.

If you have:

10 services
x 4 HTTP methods
x 6 status groups
x 3 environments
= 720 active series

That can be fine.

But if you add an unbounded label like user_id:

10 services
x 4 HTTP methods
x 6 status groups
x 3 environments
x 1,000,000 users
= 720,000,000 active series

That is a Prometheus incident waiting to happen.
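
The arithmetic above is just a product of the number of distinct values per label. A small Python sketch, assuming every combination of label values actually occurs (so the result is an upper bound, not a measurement); the max_series helper is hypothetical:

```python
from math import prod


def max_series(label_cardinalities: dict) -> int:
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(label_cardinalities.values())


bounded = {"service": 10, "method": 4, "status_group": 6, "environment": 3}
unbounded = {**bounded, "user_id": 1_000_000}

bounded_total = max_series(bounded)      # 720
unbounded_total = max_series(unbounded)  # 720,000,000
```

Adding one label multiplies the bound by that label's cardinality, which is why a single unbounded label dominates everything else.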

Good metric label:

http_requests_total{route="/users/:id"}

Bad metric label:

http_requests_total{route="/users/9427f3a0-6e6b-4c02-a96d-4bb42d9f9d31"}

Good metric label:

api_requests_total{status="500", team="checkout"}

Bad metric label:

api_requests_total{status="500", request_id="req_01J9V8H4F9ZQ9Y6T"}

Rule of thumb:

Use labels for bounded dimensions you aggregate by. Do not use labels for per-request, per-user, per-session, or per-event values.
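
One way to keep a route label bounded is to normalize ID-like path segments before the value is used as a label. The sketch below is an illustration under stated assumptions: normalize_route is a hypothetical helper, and it only recognizes UUIDs and plain integers. Where the web framework exposes the matched route template, using that directly is usually more reliable than pattern matching.

```python
import re

# Hypothetical patterns for ID-like path segments: UUIDs and plain integers.
_UUID = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)
_INT = re.compile(r"^\d+$")


def normalize_route(path: str) -> str:
    """Replace ID-like path segments with ':id' so the label value stays bounded."""
    segments = path.split("/")
    return "/".join(
        ":id" if _UUID.match(seg) or _INT.match(seg) else seg for seg in segments
    )


route_label = normalize_route("/users/9427f3a0-6e6b-4c02-a96d-4bb42d9f9d31")
```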

Alerting Labels

Alerting labels are labels on the alert object.

Example alerting rule:

groups:
  - name: url-monitor-alerts
    rules:
      - alert: UrlMonitorDown
        expr: probe_success{job="url-monitors"} == 0
        for: 3m
        labels:
          severity: page
          team: "{{ $labels.team }}"
          monitor_id: "{{ $labels.monitor_id }}"
        annotations:
          summary: "URL monitor failed: {{ $labels.monitor_name }}"
          description: "{{ $labels.instance }} failed blackbox probes for 3 minutes."

When the alert fires, Alertmanager sees an alert object roughly like this:

{
  "labels": {
    "alertname": "UrlMonitorDown",
    "severity": "page",
    "team": "checkout",
    "monitor_id": "mon_1001",
    "monitor_name": "checkout_health",
    "environment": "prod",
    "job": "url-monitors",
    "instance": "https://checkout.example.com/health"
  },
  "annotations": {
    "summary": "URL monitor failed: checkout_health",
    "description": "https://checkout.example.com/health failed blackbox probes for 3 minutes."
  }
}

Alert labels are used by Alertmanager for:

grouping
deduplication
routing
silencing
inhibition
notification templates

How Metric Labels Become Alert Labels

When an alert expression returns a series, Prometheus starts with the labels from that matching series, minus the metric name.

Then the rule’s labels: block adds or overrides labels, and alertname is set to the rule name.

Example metric series:

probe_success{
  job="url-monitors",
  team="checkout",
  monitor_id="mon_1001",
  instance="https://checkout.example.com/health"
} 0

Rule labels:

labels:
  severity: page
  team: "{{ $labels.team }}"

Final alert labels:

alertname="UrlMonitorDown"
job="url-monitors"
team="checkout"
monitor_id="mon_1001"
instance="https://checkout.example.com/health"
severity="page"

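The danger is accidentally overriding labels that identify the alert. If a rule label replaces a label that distinguishes one series from another (for example, setting monitor_id to a constant), separate alerts can end up with identical label sets, and Alertmanager treats alerts with identical label sets as the same alert.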
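
The merge order can be sketched in a few lines of Python. This is an illustration, not Prometheus code: alert_labels is a hypothetical helper, the rule labels are assumed to be already template-expanded, and the two series are made up so that they differ only in monitor_id.

```python
def alert_labels(alertname: str, series_labels: dict, rule_labels: dict) -> dict:
    """Series labels minus the metric name, then rule labels override, then alertname."""
    merged = {k: v for k, v in series_labels.items() if k != "__name__"}
    merged.update(rule_labels)  # rule labels win on conflict
    merged["alertname"] = alertname
    return merged


# Hypothetical series that differ only in monitor_id.
series_a = {"__name__": "probe_success", "job": "url-monitors", "team": "checkout", "monitor_id": "mon_1001"}
series_b = {"__name__": "probe_success", "job": "url-monitors", "team": "checkout", "monitor_id": "mon_1002"}

# Adding a new label keeps the two alerts distinct.
distinct_a = alert_labels("UrlMonitorDown", series_a, {"severity": "page"})
distinct_b = alert_labels("UrlMonitorDown", series_b, {"severity": "page"})

# Overriding the distinguishing label with a constant makes them identical.
collided_a = alert_labels("UrlMonitorDown", series_a, {"severity": "page", "monitor_id": "all"})
collided_b = alert_labels("UrlMonitorDown", series_b, {"severity": "page", "monitor_id": "all"})
```

Here distinct_a and distinct_b differ, while collided_a and collided_b are equal, so nothing downstream can tell the two monitors apart.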

Labels vs Annotations

Use labels for machine decisions.

Good alert labels:

alertname
severity
team
service
cluster
environment
monitor_id

Use annotations for human context.

Good annotations:

summary
description
runbook_url
dashboard_url
logs_url

Labels should be stable. Annotations can be descriptive.

Bad label:

labels:
  error_message: "{{ $value }}"

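That makes alert identity unstable: every time the value changes, the label set changes, so the alert can be treated as a new alert and the for: timer starts over.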

Better annotation:

annotations:
  description: "Current value is {{ $value }}"
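
A short Python sketch of why this matters. It assumes alert identity is derived from the label set only, and uses a hypothetical fingerprint helper (a SHA-256 over sorted label pairs) as a stand-in; the real hashing scheme differs, but the consequence is the same.

```python
import hashlib


def fingerprint(labels: dict) -> str:
    """Hypothetical stand-in for alert identity: a hash over the sorted label pairs."""
    canonical = "\n".join(f"{k}={v}" for k, v in sorted(labels.items()))
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]


base = {"alertname": "HighErrorRate", "service": "checkout"}

# Value kept out of the labels: identity is the same on every evaluation,
# even though an annotation carrying the value would change.
stable_first = fingerprint(base)
stable_second = fingerprint(dict(base))

# Value copied into a label: identity changes whenever the value changes.
unstable_first = fingerprint({**base, "error_message": "0.031"})
unstable_second = fingerprint({**base, "error_message": "0.034"})
```

Annotations never enter the fingerprint in this sketch, which is why they are the safe place for values that change between evaluations.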

Recording Rules

Recording rules run PromQL and save the result as a new time series.

Use recording rules when:

a query is expensive
many dashboards repeat the same query
many alerts use the same base expression
you want a stable, named metric for a derived signal
you need to aggregate raw series before querying them often

Example:

groups:
  - name: service-recording-rules
    interval: 30s
    rules:
      - record: service:http_requests:rate5m
        expr: |
          sum by (service, status, team, environment) (
            rate(http_requests_total[5m])
          )

This creates a new metric:

service:http_requests:rate5m

Now dashboards and alerts can use:

sum by (service) (
  service:http_requests:rate5m{status=~"5.."}
)

instead of recalculating the raw rate() every time.

Alerting Rules

Alerting rules run PromQL and create alerts when the expression is true.

Use alerting rules when:

humans or automation need to be notified
the condition has operational meaning
the signal should route by team/severity
you need for: pending behavior
Alertmanager should group, silence, inhibit, or notify

Example:

groups:
  - name: service-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum by (service, team, environment) (
            service:http_requests:rate5m{status=~"5.."}
          )
          /
          sum by (service, team, environment) (
            service:http_requests:rate5m
          )
          > 0.02
        for: 10m
        labels:
          severity: page
          team: "{{ $labels.team }}"
          service: "{{ $labels.service }}"
          environment: "{{ $labels.environment }}"
        annotations:
          summary: "High error rate for {{ $labels.service }}"
          description: "5xx rate has been above 2% for 10 minutes."
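
The for: clause turns a threshold into a small state machine: the condition must hold on every evaluation for the whole duration before the alert fires. The Python sketch below illustrates that behavior under simplifying assumptions; AlertState and error_ratio are hypothetical names, timestamps are plain seconds, and features such as keep_firing_for are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    for_seconds: float
    active_since: Optional[float] = None  # when the condition first became true

    def evaluate(self, now: float, condition: bool) -> str:
        """Return 'inactive', 'pending', or 'firing' for one rule evaluation."""
        if not condition:
            self.active_since = None  # a false evaluation resets the timer
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        return "firing" if now - self.active_since >= self.for_seconds else "pending"


def error_ratio(errors_per_sec: float, total_per_sec: float) -> float:
    """5xx request rate divided by total request rate."""
    return errors_per_sec / total_per_sec


state = AlertState(for_seconds=600)  # for: 10m
# A 3% error ratio observed at t=0s, 300s, and 600s.
states = [state.evaluate(t, error_ratio(3.0, 100.0) > 0.02) for t in (0, 300, 600)]
# The ratio drops to 1%, then rises again one evaluation later.
recovered = state.evaluate(660, error_ratio(1.0, 100.0) > 0.02)
again = state.evaluate(720, error_ratio(3.0, 100.0) > 0.02)
```

In this run the alert goes pending, pending, firing, then inactive once the ratio drops, and back to pending (not firing) when the condition returns.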

Recording Rules vs Alerting Rules

Topic	Recording rule	Alerting rule
Output	New time series	Alert object
Stored in TSDB	Yes	Not as rule output; alert state is exposed through the synthetic ALERTS and ALERTS_FOR_STATE series
Sent to Alertmanager	No	Yes, when firing
Main purpose	Precompute reusable PromQL	Notify on conditions
Uses labels for	Result series dimensions	Routing, grouping, identity
Typical examples	request rate, p95 latency, SLO burn base metrics	high error rate, URL down, disk full

Use both together:

raw metrics
  -> recording rule creates clean derived metric
  -> alerting rule evaluates derived metric
  -> Alertmanager routes alert

Use Case: URL Monitor

Metric labels:

probe_success{
  job="url-monitors",
  monitor_id="mon_1001",
  monitor_name="checkout_health",
  team="checkout",
  environment="prod",
  instance="https://checkout.example.com/health"
}

Recording rule:

groups:
  - name: url-monitor-recording-rules
    rules:
      - record: monitor:probe_success:avg5m
        expr: |
          avg_over_time(probe_success{job="url-monitors"}[5m])
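
To see what this recorded value looks like over time, here is a minimal Python sketch of a windowed average over probe results. The avg_over_time function below is a hypothetical re-implementation for illustration, the 60-second probe interval is an assumption, and the window boundary handling (left-open, right-closed) is also an assumption that may not match every Prometheus version.

```python
def avg_over_time(samples, now, window):
    """Average of sample values whose timestamps fall in (now - window, now]."""
    in_window = [value for ts, value in samples if now - window < ts <= now]
    return sum(in_window) / len(in_window) if in_window else None


# Hypothetical probe results every 60s; a single failure at t=300s.
probes = [(t, 0.0 if t == 300 else 1.0) for t in range(0, 601, 60)]

at_failure = avg_over_time(probes, now=300, window=300)  # failure just entered the window
still_low = avg_over_time(probes, now=540, window=300)   # failure still inside the window
recovered = avg_over_time(probes, now=600, window=300)   # failure has left the window
```

One failed probe keeps the 5-minute average below 1 for the full window, so the threshold used against this metric decides whether a single blip or only a sustained outage counts as down.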

Alerting rule:

groups:
  - name: url-monitor-alerts
    rules:
      - alert: UrlMonitorDown
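        # == 0 means every probe in the last 5 minutes failed. A threshold of
        # "< 1" would page on a single failed probe, because one failure keeps
        # the 5-minute average below 1 until it leaves the window.
        expr: monitor:probe_success:avg5m == 0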
        for: 3m
        labels:
          severity: page
          team: "{{ $labels.team }}"
          monitor_id: "{{ $labels.monitor_id }}"
        annotations:
          summary: "URL monitor failed: {{ $labels.monitor_name }}"

Alertmanager route:

route:
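  # The root route needs a default receiver; "default-receiver" is a placeholder name.
  receiver: default-receiver
  group_by: ["team", "alertname", "environment"]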
  routes:
    - matchers:
        - team="checkout"
        - severity="page"
      receiver: checkout-pagerduty
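
How the route uses alert labels can be sketched in Python. This is a simplified illustration, not Alertmanager's implementation: it supports only equality matchers and a flat list of child routes, and pick_receiver, group_key, and the "default-receiver" fallback name are hypothetical.

```python
def matches(labels: dict, matchers: dict) -> bool:
    """True when every matcher equals the alert's label value."""
    return all(labels.get(name) == value for name, value in matchers.items())


def pick_receiver(labels: dict, routes: list, default: str) -> str:
    """First matching child route wins; otherwise fall back to the default receiver."""
    for route in routes:
        if matches(labels, route["matchers"]):
            return route["receiver"]
    return default


def group_key(labels: dict, group_by: list) -> tuple:
    """Alerts with the same values for the group_by labels share one notification group."""
    return tuple(labels.get(name, "") for name in group_by)


ROUTES = [{"matchers": {"team": "checkout", "severity": "page"}, "receiver": "checkout-pagerduty"}]
GROUP_BY = ["team", "alertname", "environment"]

alert_1 = {"alertname": "UrlMonitorDown", "team": "checkout", "severity": "page",
           "environment": "prod", "monitor_id": "mon_1001"}
alert_2 = {**alert_1, "monitor_id": "mon_1002"}   # hypothetical second monitor
ticket = {**alert_1, "severity": "ticket"}        # hypothetical lower severity

paged_to = pick_receiver(alert_1, ROUTES, default="default-receiver")
ticket_to = pick_receiver(ticket, ROUTES, default="default-receiver")
same_group = group_key(alert_1, GROUP_BY) == group_key(alert_2, GROUP_BY)
```

Because monitor_id is not in group_by, both checkout monitors land in the same notification group even though they remain separate alerts.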

Practical Rules

Put query dimensions in PromQL labels.
Put routing dimensions in alerting labels.
Put human explanations in annotations.
Use recording rules for repeated or expensive PromQL.
Use alerting rules for notification-worthy conditions.
Do not put high-cardinality values in labels unless they are truly part of the series identity.
Be careful when overriding labels in alerting rules.