Prometheus Labels, Alerting Labels, Recording Rules, and Alerting Rules
Prometheus has two different kinds of labels that people often mix together:
- PromQL labels, which live on metric time series.
- Alerting labels, which live on alert objects sent to Alertmanager.
They can share names, and alerting rules often copy metric labels into alert labels, but they are used for different jobs.
The Short Version
PromQL labels answer:
Which time series am I querying, filtering, grouping, or aggregating?
Alerting labels answer:
How should this alert be identified, grouped, routed, silenced, and deduplicated?
Recording rules answer:
What expensive or repeated PromQL result should I precompute as a new time series?
Alerting rules answer:
What condition should create a firing alert?
PromQL Labels
PromQL labels are part of the metric data model.
Example:
probe_success{
  job="url-monitors",
  monitor_id="mon_1001",
  monitor_name="checkout_health",
  team="checkout",
  environment="prod",
  instance="https://checkout.example.com/health"
}
These labels are stored with the time series. You use them to select, group, and aggregate metrics.
Select checkout monitors:
probe_success{team="checkout", environment="prod"}
Group availability by team:
avg by (team) (
  probe_success{job="url-monitors"}
)
Find down monitors:
probe_success{job="url-monitors"} == 0
PromQL labels are powerful, but they are also the source of cardinality. Every unique label set creates a distinct time series.
Cardinality Example: One Metric Name, Many Series
This looks like one metric:
http_requests_total
But Prometheus does not store only the metric name. It stores each unique combination of labels as a separate time series.
Example series:
http_requests_total{service="checkout", method="GET", status="200"} 12000
http_requests_total{service="checkout", method="GET", status="500"} 42
http_requests_total{service="checkout", method="POST", status="200"} 8000
http_requests_total{service="billing", method="GET", status="200"} 9100
http_requests_total{service="billing", method="POST", status="500"} 11
Same metric name. Different label sets. Different time series.
If you have:
10 services
x 4 HTTP methods
x 6 status groups
x 3 environments
= 720 active series
That can be fine.
But if you add an unbounded label like user_id:
10 services
x 4 HTTP methods
x 6 status groups
x 3 environments
x 1,000,000 users
= 720,000,000 active series
That is a Prometheus incident waiting to happen.
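The arithmetic above can be checked with a short sketch. The function name max_series and the dimension names are illustrative only, and the result is an upper bound: the real count depends on which label combinations actually occur.

```python
from math import prod


def max_series(dimensions: dict[str, int]) -> int:
    """Upper bound on series for one metric name: the product of the
    number of distinct values each label can take."""
    return prod(dimensions.values())


# Bounded dimensions from the example above.
bounded = {"service": 10, "method": 4, "status_group": 6, "environment": 3}

# The same dimensions plus an unbounded per-user label.
with_user_id = {**bounded, "user_id": 1_000_000}

print(max_series(bounded))       # 720
print(max_series(with_user_id))  # 720000000
```

Adding one label multiplies the total, which is why a single unbounded label dominates everything else.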
Good metric label:
http_requests_total{route="/users/:id"}
Bad metric label:
http_requests_total{route="/users/9427f3a0-6e6b-4c02-a96d-4bb42d9f9d31"}
Good metric label:
api_requests_total{status="500", team="checkout"}
Bad metric label:
api_requests_total{status="500", request_id="req_01J9V8H4F9ZQ9Y6T"}
Rule of thumb:
Use labels for bounded dimensions you aggregate by. Do not use labels for per-request, per-user, per-session, or per-event values.
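One way to keep a route label bounded is to replace ID-like path segments before the value is used as a label. The sketch below is a minimal illustration under assumptions: the function name normalize_route and the two patterns (UUIDs and plain integers) are hypothetical, and many web frameworks can report the route template directly, which is usually the better source.

```python
import re

# Hypothetical patterns: adjust them to the ID formats your routes use.
_UUID = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)
_NUMERIC = re.compile(r"^\d+$")


def normalize_route(path: str) -> str:
    """Replace ID-like path segments with a fixed placeholder so the
    route label takes a bounded set of values."""
    segments = []
    for segment in path.split("/"):
        if _UUID.match(segment) or _NUMERIC.match(segment):
            segments.append(":id")
        else:
            segments.append(segment)
    return "/".join(segments)


print(normalize_route("/users/9427f3a0-6e6b-4c02-a96d-4bb42d9f9d31"))  # /users/:id
print(normalize_route("/users/42/orders"))  # /users/:id/orders
```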
Alerting Labels
Alerting labels are labels on the alert object.
Example alerting rule:
groups:
  - name: url-monitor-alerts
    rules:
      - alert: UrlMonitorDown
        expr: probe_success{job="url-monitors"} == 0
        for: 3m
        labels:
          severity: page
          team: "{{ $labels.team }}"
          monitor_id: "{{ $labels.monitor_id }}"
        annotations:
          summary: "URL monitor failed: {{ $labels.monitor_name }}"
          description: "{{ $labels.instance }} failed blackbox probes for 3 minutes."
When the alert fires, Alertmanager sees an alert object roughly like this:
{
  "labels": {
    "alertname": "UrlMonitorDown",
    "severity": "page",
    "team": "checkout",
    "monitor_id": "mon_1001",
    "monitor_name": "checkout_health",
    "environment": "prod",
    "job": "url-monitors",
    "instance": "https://checkout.example.com/health"
  },
  "annotations": {
    "summary": "URL monitor failed: checkout_health",
    "description": "https://checkout.example.com/health failed blackbox probes for 3 minutes."
  }
}
Alert labels are used by Alertmanager for:
- grouping
- deduplication
- routing
- silencing
- inhibition
- notification templates
How Metric Labels Become Alert Labels
When an alert expression returns a series, Prometheus starts with the labels from that matching series, without the metric name.
Then the rule’s labels: block adds or overrides labels, and alertname is set to the name of the rule.
Example metric series:
probe_success{
  job="url-monitors",
  team="checkout",
  monitor_id="mon_1001",
  instance="https://checkout.example.com/health"
} 0
Rule labels:
labels:
  severity: page
  team: "{{ $labels.team }}"
Final alert labels:
alertname="UrlMonitorDown"
job="url-monitors"
team="checkout"
monitor_id="mon_1001"
instance="https://checkout.example.com/health"
severity="page"
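The danger is overriding identity labels accidentally. If a rule sets a distinguishing label such as instance or monitor_id to a fixed value, alerts for different targets end up with the same label set, and Alertmanager may group or deduplicate them in a way you did not intend.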
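The merge order described in this section can be sketched in a few lines. This is an illustration, not Prometheus code: the function name alert_labels is made up, rule labels are assumed to be already rendered from their templates, and the second target URL is hypothetical.

```python
def alert_labels(alertname, series_labels, rule_labels):
    """Assemble the final alert label set from one matching series.

    rule_labels is assumed to hold already-expanded values, i.e. the
    "{{ $labels.team }}" templates have been rendered.
    """
    # Start from the series labels, without the metric name.
    labels = {k: v for k, v in series_labels.items() if k != "__name__"}
    # Rule labels add new labels and override existing ones.
    labels.update(rule_labels)
    # The rule name becomes the alertname label.
    labels["alertname"] = alertname
    return labels


checkout = {
    "__name__": "probe_success",
    "job": "url-monitors",
    "team": "checkout",
    "monitor_id": "mon_1001",
    "instance": "https://checkout.example.com/health",
}
final = alert_labels(
    "UrlMonitorDown", checkout, {"severity": "page", "team": checkout["team"]}
)
print(sorted(final))

# Overriding an identity label with a fixed value collapses distinct alerts.
other = {**checkout, "instance": "https://checkout.example.com/cart"}  # hypothetical
override = {"severity": "page", "instance": "all"}
a = alert_labels("UrlMonitorDown", checkout, override)
b = alert_labels("UrlMonitorDown", other, override)
print(a == b)  # True: the two targets are no longer distinguishable
```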
Labels vs Annotations
Use labels for machine decisions.
Good alert labels:
alertname, severity, team, service, cluster, environment, monitor_id
Use annotations for human context.
Good annotations:
summary, description, runbook_url, dashboard_url, logs_url
Labels should be stable. Annotations can be descriptive.
Bad label:
labels:
  error_message: "{{ $value }}"
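That creates unstable alert identity: the label changes whenever the value changes, so each evaluation can produce what looks like a new alert, and the for: timer starts over.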
Better annotation:
annotations:
  description: "Current value is {{ $value }}"
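To see why a changing label is a problem, treat the sorted label set as the alert's identity. The sketch below uses that simplification; alert_identity is a stand-in written for this example, not the fingerprint function Prometheus or Alertmanager actually use, and the sample values are invented.

```python
def alert_identity(labels: dict[str, str]) -> tuple:
    """Stand-in for a label-set fingerprint: equal label sets give an
    equal identity. Annotations are deliberately not part of it."""
    return tuple(sorted(labels.items()))


base = {"alertname": "HighErrorRate", "service": "checkout"}

# Value templated into a label: the identity changes between evaluations.
eval_1 = alert_identity({**base, "error_message": "0.031"})
eval_2 = alert_identity({**base, "error_message": "0.034"})
print(eval_1 == eval_2)  # False

# Value kept in an annotation: the labels, and so the identity, stay the same.
stable_1 = alert_identity(base)
stable_2 = alert_identity(base)
print(stable_1 == stable_2)  # True
```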
Recording Rules
Recording rules run PromQL and save the result as a new time series.
Use recording rules when:
- a query is expensive
- many dashboards repeat the same query
- many alerts use the same base expression
- you want a stable, named metric for a derived signal
- you need to aggregate raw series before querying them often
Example:
groups:
  - name: service-recording-rules
    interval: 30s
    rules:
      - record: service:http_requests:rate5m
        expr: |
          sum by (service, status, team, environment) (
            rate(http_requests_total[5m])
          )
This creates a new metric:
service:http_requests:rate5m
Now dashboards and alerts can use:
sum by (service) (
  service:http_requests:rate5m{status=~"5.."}
)
instead of recalculating the raw rate() every time.
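What the recorded expression does to the label set can be illustrated outside Prometheus. The sketch below mimics sum by (...) over a handful of per-instance rates; the function name sum_by and the sample values are invented for this example, and the rates are assumed to be already computed.

```python
from collections import defaultdict


def sum_by(samples, by):
    """Sum sample values, keeping only the labels named in `by`.

    samples is a list of (labels, value) pairs; the result maps a tuple
    of the kept label values to the summed value.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        key = tuple(labels.get(name, "") for name in by)
        totals[key] += value
    return dict(totals)


# Hypothetical per-instance request rates (requests per second).
rates = [
    ({"service": "checkout", "status": "200", "instance": "a"}, 1.5),
    ({"service": "checkout", "status": "200", "instance": "b"}, 2.5),
    ({"service": "checkout", "status": "500", "instance": "a"}, 0.25),
]

recorded = sum_by(rates, ["service", "status"])
print(recorded[("checkout", "200")])  # 4.0
print(len(recorded))                  # 2
```

Three raw series become two recorded series because the instance label is aggregated away, which is the point of recording at the service level.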
Alerting Rules
Alerting rules run PromQL and create alerts when the expression is true.
Use alerting rules when:
- humans or automation need to be notified
- the condition has operational meaning
- the signal should route by team/severity
- you need for: pending behavior
- Alertmanager should group, silence, inhibit, or notify
Example:
groups:
  - name: service-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum by (service, team, environment) (
            service:http_requests:rate5m{status=~"5.."}
          )
          /
          sum by (service, team, environment) (
            service:http_requests:rate5m
          )
          > 0.02
        for: 10m
        labels:
          severity: page
          team: "{{ $labels.team }}"
          service: "{{ $labels.service }}"
          environment: "{{ $labels.environment }}"
        annotations:
          summary: "High error rate for {{ $labels.service }}"
          description: "5xx rate has been above 2% for 10 minutes."
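The for: clause is easiest to understand as a small state machine: an alert is pending while the condition has held for less than the configured duration, and firing once it has held long enough. The sketch below is a simplified model, not Prometheus code; the function name alert_states is made up, evaluations are assumed to be evenly spaced, and other settings that affect alert lifetime are ignored.

```python
def alert_states(condition_by_eval, for_seconds, interval_seconds):
    """Return the alert state after each evaluation.

    condition_by_eval holds one boolean per evaluation: True when the
    rule expression returned this series.
    """
    states = []
    active_since = None
    for i, condition in enumerate(condition_by_eval):
        now = i * interval_seconds
        if not condition:
            # The condition cleared: forget when it started.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        held_long_enough = now - active_since >= for_seconds
        states.append("firing" if held_long_enough else "pending")
    return states


# for: 3m with one evaluation per minute.
print(alert_states([True, True, True, True, False, True], 180, 60))
# ['pending', 'pending', 'pending', 'firing', 'inactive', 'pending']
```

A single evaluation where the condition is false resets the timer, so a flapping signal may never reach firing.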
Recording Rules vs Alerting Rules
| Topic | Recording rule | Alerting rule |
|---|---|---|
| Output | New time series | Alert object |
| Stored in TSDB | Yes | Not as a result series; alert state is tracked separately and exposed as the synthetic ALERTS series |
| Sent to Alertmanager | No | Yes, when firing |
| Main purpose | Precompute reusable PromQL | Notify on conditions |
| Uses labels for | Result series dimensions | Routing, grouping, identity |
| Typical examples | request rate, p95 latency, SLO burn base metrics | high error rate, URL down, disk full |
Use both together:
raw metrics
-> recording rule creates clean derived metric
-> alerting rule evaluates derived metric
-> Alertmanager routes alert
Use Case: URL Monitor
Metric labels:
probe_success{
  job="url-monitors",
  monitor_id="mon_1001",
  monitor_name="checkout_health",
  team="checkout",
  environment="prod",
  instance="https://checkout.example.com/health"
}
Recording rule:
groups:
  - name: url-monitor-recording-rules
    rules:
      - record: monitor:probe_success:avg5m
        expr: |
          avg_over_time(probe_success{job="url-monitors"}[5m])
Alerting rule:
groups:
  - name: url-monitor-alerts
    rules:
      - alert: UrlMonitorDown
        # True when at least one probe in the last 5 minutes failed.
        expr: monitor:probe_success:avg5m < 1
        for: 3m
        labels:
          severity: page
          team: "{{ $labels.team }}"
          monitor_id: "{{ $labels.monitor_id }}"
        annotations:
          summary: "URL monitor failed: {{ $labels.monitor_name }}"
Alertmanager route:
route:
  receiver: default-receiver  # hypothetical fallback; the root route needs a receiver
  group_by: ["team", "alertname", "environment"]
  routes:
    - matchers:
        - team="checkout"
        - severity="page"
      receiver: checkout-pagerduty
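How this route treats the UrlMonitorDown alert can be sketched as two steps: pick a receiver by matching labels, then compute the group key from group_by. The code below is a simplified model with invented function names; it handles only equality matchers and a single level of child routes, and the default-receiver name is hypothetical.

```python
def route_matches(labels, matchers):
    """True when every equality matcher is satisfied by the alert labels."""
    return all(labels.get(name) == value for name, value in matchers.items())


def pick_receiver(labels, child_routes, default_receiver):
    """Return the receiver of the first matching child route, else the default."""
    for route in child_routes:
        if route_matches(labels, route["matchers"]):
            return route["receiver"]
    return default_receiver


def group_key(labels, group_by):
    """Alerts that share this key are grouped together."""
    return tuple(labels.get(name, "") for name in group_by)


child_routes = [
    {
        "matchers": {"team": "checkout", "severity": "page"},
        "receiver": "checkout-pagerduty",
    }
]
alert = {
    "alertname": "UrlMonitorDown",
    "team": "checkout",
    "severity": "page",
    "environment": "prod",
    "monitor_id": "mon_1001",
}

print(pick_receiver(alert, child_routes, "default-receiver"))  # checkout-pagerduty
print(group_key(alert, ["team", "alertname", "environment"]))
# ('checkout', 'UrlMonitorDown', 'prod')
```

Because monitor_id is not in group_by, alerts for several checkout monitors in prod share one group under this model.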
Practical Rules
- Put query dimensions in PromQL labels.
- Put routing dimensions in alerting labels.
- Put human explanations in annotations.
- Use recording rules for repeated or expensive PromQL.
- Use alerting rules for notification-worthy conditions.
- Do not put high-cardinality values in labels unless they are truly part of the series identity.
- Be careful when overriding labels in alerting rules.