Prometheus Rules vs Mimir Ruler: Where Should Alerting Logic Live?
Prometheus rules and Mimir Ruler both evaluate PromQL. That is why they are easy to confuse.
The important difference is not the syntax. The important difference is where the rule runs, what data it can see, where recording rule output is written, and how close the alert is to the failure domain.
The Short Version
Use Prometheus rules when the alert should live close to the scraper:
- scrape health
- local cluster safety alerts
- Blackbox URL, TCP port, and ICMP ping checks scraped by that Prometheus
- alerts that must still work if the central metrics backend is unavailable
- recording rules used by local dashboards or local alerts
Use Mimir Ruler when the rule needs central metrics:
- global service SLOs across clusters
- long-retention PromQL
- tenant-wide dashboards and alerts
- rules managed centrally instead of copied into every Prometheus
- recording rules that should be written back into Mimir for shared use
Use Alertmanager for notification routing in both cases.
Prometheus alerting rule
-> evaluates local PromQL
-> creates firing alert objects
-> sends them to Alertmanager
-> Alertmanager groups, deduplicates, silences, inhibits, and routes
Mimir Ruler alerting rule
-> evaluates PromQL against Mimir tenant data
-> creates firing alert objects
-> sends them to Alertmanager
-> Alertmanager groups, deduplicates, silences, inhibits, and routes
The Core Mental Model
Prometheus is both a scraper and a rule engine.
Mimir is a central metrics backend. The Mimir Ruler is a rule engine that runs near that backend.
That means:
Prometheus can scrape targets.
Mimir Ruler does not scrape targets.
Prometheus can evaluate rules from its local TSDB.
Mimir Ruler evaluates rules from metrics already stored in Mimir.
Prometheus recording rules write new series into Prometheus.
Mimir Ruler recording rules write new series back into Mimir.
This matters most for synthetics.
If you want to monitor a URL, a TCP port, or ping reachability, something still needs to run the probe. Usually that is Prometheus scraping Blackbox Exporter, or Alloy running a Prometheus-compatible scrape pipeline. After that, the resulting probe_* metrics can stay local in Prometheus, or be remote-written into Mimir.
Prometheus Rules
Prometheus supports two rule types:
- Recording rules, which precompute PromQL and store the result as new time series.
- Alerting rules, which evaluate PromQL and create alert objects when a condition is true.
Rules are loaded through rule_files.
global:
  scrape_interval: 30s
  evaluation_interval: 30s
rule_files:
  - /etc/prometheus/rules/*.yml
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager.monitoring.svc:9093
The key point: Prometheus rules see the data in that Prometheus server.
If the Prometheus scrapes Blackbox Exporter, then this local rule works:
groups:
  - name: synthetics
    rules:
      - alert: SyntheticTargetDown
        expr: probe_success{job="synthetics"} == 0
        for: 2m
        labels:
          severity: page
          team: platform
        annotations:
          summary: "Synthetic target is down"
          description: "{{ $labels.instance }} failed {{ $labels.module }}"
Prometheus evaluates the expression every evaluation_interval. While the expression is true but has not yet stayed true for the full for duration (2m here), the alert is pending. Once it has stayed true that long, it becomes firing and Prometheus sends it to Alertmanager.
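One way to check the pending-to-firing behavior before deploying is a rule unit test run with promtool test rules. The sketch below assumes the rule group above is saved as synthetics-rules.yml next to the test file (both file names are hypothetical). The input series stays at 0, so the alert should still be pending at 1m and firing at 3m.

```yaml
# synthetics-rules.test.yml (hypothetical name); run with:
#   promtool test rules synthetics-rules.test.yml
rule_files:
  - synthetics-rules.yml   # assumed to contain the "synthetics" group above
evaluation_interval: 30s
tests:
  - interval: 30s
    input_series:
      # 11 samples of 0 at 30s steps (0s through 5m): the probe keeps failing.
      - series: 'probe_success{job="synthetics", instance="https://bigyandahal.com", module="http_2xx"}'
        values: '0x10'
    alert_rule_test:
      # At 1m the condition has held for less than for: 2m, so the alert is
      # only pending. promtool compares firing alerts, so none are expected.
      - eval_time: 1m
        alertname: SyntheticTargetDown
        exp_alerts: []
      # At 3m the condition has held longer than 2m, so the alert fires.
      - eval_time: 3m
        alertname: SyntheticTargetDown
        exp_alerts:
          - exp_labels:
              severity: page
              team: platform
              job: synthetics
              instance: "https://bigyandahal.com"
              module: http_2xx
            exp_annotations:
              summary: "Synthetic target is down"
              description: "https://bigyandahal.com failed http_2xx"
```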
Mimir Ruler
Mimir Ruler also evaluates recording and alerting rules, but it evaluates them against Mimir data.
The Mimir path usually looks like this:
Prometheus or Alloy scrapes targets
-> remote_write to Mimir
-> Mimir stores tenant metrics
-> Mimir Ruler queries Mimir
-> Mimir Ruler sends alerts to Alertmanager
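On the Prometheus side, the remote_write step is a config block like the one below. This is a minimal sketch that assumes a Mimir endpoint at the hypothetical address mimir-gateway.monitoring.svc and a hypothetical tenant named prod. The cluster external label is what lets a Mimir Ruler rule tell clusters apart, for example through $labels.cluster.

```yaml
# prometheus.yml fragment (hostname, tenant ID, and cluster value are hypothetical)
global:
  external_labels:
    # Attached to every series this Prometheus remote-writes, so series from
    # different clusters stay distinguishable inside one Mimir tenant.
    cluster: prod-eu-1
remote_write:
  - url: http://mimir-gateway.monitoring.svc/api/v1/push
    headers:
      # Mimir selects the tenant from this header when multi-tenancy is enabled.
      X-Scope-OrgID: prod
```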
Example Mimir Ruler alert:
groups:
  - name: global-synthetics
    interval: 30s
    rules:
      - alert: GlobalSyntheticTargetDown
        expr: probe_success{job="synthetics", environment="prod"} == 0
        for: 2m
        labels:
          severity: page
          team: platform
          source: mimir-ruler
        annotations:
          summary: "Global synthetic target is down"
          description: "{{ $labels.instance }} failed from {{ $labels.cluster }}"
This is useful when probes from many clusters are remote-written into the same tenant.
Example:
min by (monitor_id, instance, team) (
  probe_success{job="synthetics", environment="prod"}
) == 0
That expression returns one result per monitor when at least one vantage point for that monitor is failing.
Or:
avg by (monitor_id, instance, team) (
  probe_success{job="synthetics", environment="prod"}
) < 0.8
That expression can alert when fewer than 80 percent of probe locations are succeeding.
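Wrapped into a Mimir Ruler group, the two expressions can drive different severities. This is a sketch that assumes the same labels as above. The warning alert name is hypothetical, and GlobalSyntheticAvailabilityLow reuses the name from the naming section later in this post.

```yaml
groups:
  - name: global-synthetics-aggregate
    interval: 30s
    rules:
      # Hypothetical name: at least one vantage point fails this monitor.
      - alert: SyntheticFailingFromSomeLocations
        expr: |
          min by (monitor_id, instance, team) (
            probe_success{job="synthetics", environment="prod"}
          ) == 0
        for: 5m
        labels:
          severity: warning
      # Fewer than 80 percent of vantage points succeed for this monitor.
      - alert: GlobalSyntheticAvailabilityLow
        expr: |
          avg by (monitor_id, instance, team) (
            probe_success{job="synthetics", environment="prod"}
          ) < 0.8
        for: 2m
        labels:
          severity: page
        annotations:
          summary: "Global synthetic availability is low"
          description: "{{ $labels.instance }}: {{ $value | humanizePercentage }} of probe locations succeeding"
```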
The Decision Boundary
Use Prometheus Rules For Local Truth
Prometheus rules are the right default when the rule depends on local scrape state.
Examples:
- Is this Prometheus failing to scrape a target?
- Is this cluster's node exporter down?
- Is this cluster's Blackbox Exporter returning probe_success == 0?
- Is local Alertmanager reachable?
- Is the local remote_write queue falling behind?
Prometheus rules are also good for resilience. If Mimir is down, local Prometheus alerting can still fire for local problems.
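Two of those local questions can be written as Prometheus rules using the self-monitoring metrics Prometheus exposes about its own remote_write queues and Alertmanager connections. This is a sketch: the alert names and thresholds are illustrative, and the metric names are worth checking against your Prometheus version.

```yaml
groups:
  - name: local-prometheus-health
    rules:
      # remote_write is falling behind: the newest sample ingested locally vs.
      # the newest sample successfully sent, per remote_write queue.
      - alert: RemoteWriteBehind
        expr: |
          (
            max_over_time(prometheus_remote_storage_highest_timestamp_in_seconds[5m])
          - ignoring(remote_name, url) group_right
            max_over_time(prometheus_remote_storage_queue_highest_sent_timestamp_seconds[5m])
          ) > 120
        for: 15m
        labels:
          severity: warning
      # This Prometheus has not discovered any Alertmanager to send alerts to.
      - alert: NoAlertmanagerDiscovered
        expr: max_over_time(prometheus_notifications_alertmanagers_discovered[5m]) < 1
        for: 10m
        labels:
          severity: warning
```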
Use Mimir Ruler For Global Truth
Mimir Ruler is the right default when the question crosses Prometheus boundaries.
Examples:
- Is checkout availability below the SLO across all clusters?
- Did every region start failing the same URL monitor?
- Is the global error budget burn rate too high?
- Do we need one centrally managed rule instead of copying the same YAML into many Prometheus servers?
- Do dashboards need shared recording rule output from Mimir?
Mimir Ruler is especially useful for multi-cluster, multi-region, and multi-tenant metrics.
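The burn-rate question is a typical Mimir Ruler rule because the error ratio has to be computed over every cluster's requests. Below is a sketch of a fast-burn alert for a 99.9 percent availability SLO. It uses the 14.4x burn rate over 1h and 5m windows described in the Google SRE Workbook. The http_requests_total metric, its code label, and service="checkout" are hypothetical stand-ins for your own request metrics.

```yaml
groups:
  - name: checkout-slo
    rules:
      # Error ratio above 14.4 x the 0.1% error budget in both windows.
      # At that rate, a 30-day budget would be gone in about two days.
      - alert: CheckoutErrorBudgetFastBurn
        expr: |
          (
            sum(rate(http_requests_total{service="checkout", code=~"5.."}[1h]))
            /
            sum(rate(http_requests_total{service="checkout"}[1h]))
          ) > (14.4 * 0.001)
          and
          (
            sum(rate(http_requests_total{service="checkout", code=~"5.."}[5m]))
            /
            sum(rate(http_requests_total{service="checkout"}[5m]))
          ) > (14.4 * 0.001)
        labels:
          severity: page
          service: checkout
```

The short 5m window makes the alert stop soon after the error rate recovers, while the 1h window keeps a brief spike from paging.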
Synthetic Monitoring With Blackbox Exporter
Blackbox Exporter runs a probe each time Prometheus scrapes its /probe endpoint. Relabeling turns each discovered target into the target and module URL parameters of that scrape.
The common metrics:
probe_success
probe_duration_seconds
probe_http_status_code
probe_ssl_earliest_cert_expiry
probe_dns_lookup_time_seconds
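probe_ssl_earliest_cert_expiry is a Unix timestamp, so subtracting time() gives the seconds until the earliest-expiring certificate in the chain expires. Below is a sketch of a certificate-expiry alert built on it. The 14-day threshold and the alert name are assumptions.

```yaml
- alert: SyntheticCertExpiringSoon
  # Seconds until expiry, compared against 14 days.
  expr: probe_ssl_earliest_cert_expiry{job="synthetics"} - time() < 14 * 86400
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "TLS certificate expires soon"
    description: "{{ $labels.instance }} certificate expires in {{ $value | humanizeDuration }}"
```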
The common modules:
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      method: GET
      # 301/302 only matter if redirects are not followed (the prober follows them by default).
      valid_status_codes: [200, 204, 301, 302]
  tcp_connect:
    prober: tcp
    timeout: 5s
  icmp:
    prober: icmp
    timeout: 5s
URL Monitor Example
File service discovery target:
[
  {
    "targets": ["https://bigyandahal.com"],
    "labels": {
      "monitor_id": "url_001",
      "monitor_name": "homepage",
      "module": "http_2xx",
      "team": "platform",
      "environment": "prod"
    }
  }
]
Prometheus scrape config:
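scrape_configs:
  - job_name: synthetics
    metrics_path: /probe
    file_sd_configs:
      - files:
          - /etc/prometheus/file_sd/synthetics.json
        refresh_interval: 30s
    relabel_configs:
      # Pass the discovered URL or host to Blackbox Exporter as ?target=...
      - source_labels: [__address__]
        target_label: __param_target
      # Pass the per-target module label as ?module=...
      - source_labels: [module]
        target_label: __param_module
      # Keep the probed target, not the exporter, as the instance label.
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape to Blackbox Exporter.
      - target_label: __address__
        replacement: blackbox-exporter.monitoring.svc:9115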
PromQL:
probe_success{job="synthetics", module="http_2xx"} == 0
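A failing probe is not the only useful signal from this job. Below is a sketch of a slow-response alert built on probe_duration_seconds. The alert name and the 2-second threshold are assumptions to tune per monitor.

```yaml
- alert: SyntheticTargetSlow
  # Total probe time above 2s, counted only for probes that succeeded,
  # so a timed-out probe raises SyntheticTargetDown instead of this alert.
  expr: |
    probe_duration_seconds{job="synthetics", module="http_2xx"} > 2
    and
    probe_success{job="synthetics", module="http_2xx"} == 1
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Synthetic target is slow"
    description: "{{ $labels.instance }} took {{ $value | humanizeDuration }} to answer"
```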
TCP Port Monitor Example
Target:
[
  {
    "targets": ["postgres.prod.internal:5432"],
    "labels": {
      "monitor_id": "tcp_001",
      "monitor_name": "postgres_port",
      "module": "tcp_connect",
      "team": "database",
      "environment": "prod"
    }
  }
]
PromQL:
probe_success{job="synthetics", module="tcp_connect"} == 0
Port monitor alert:
- alert: TcpPortDown
  expr: probe_success{job="synthetics", module="tcp_connect"} == 0
  for: 1m
  labels:
    severity: page
    team: database
  annotations:
    summary: "TCP port check failed"
    description: "{{ $labels.instance }} is not accepting TCP connections"
ICMP Ping Monitor Example
Target:
[
  {
    "targets": ["10.10.20.12"],
    "labels": {
      "monitor_id": "ping_001",
      "monitor_name": "branch_gateway",
      "module": "icmp",
      "team": "network",
      "environment": "prod"
    }
  }
]
PromQL:
probe_success{job="synthetics", module="icmp"} == 0
Ping alert:
- alert: PingTargetDown
  expr: probe_success{job="synthetics", module="icmp"} == 0
  for: 3m
  labels:
    severity: warning
    team: network
  annotations:
    summary: "Ping target unreachable"
    description: "{{ $labels.instance }} is not reachable by ICMP"
Note: ICMP probing may require extra container permissions depending on how Blackbox Exporter is deployed.
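On Kubernetes, a common way to grant that permission is to add the NET_RAW capability to the Blackbox Exporter container, because ICMP echo requests need raw sockets. Below is a fragment of a container spec. The container name is illustrative. On Linux, allowing unprivileged ICMP through the net.ipv4.ping_group_range sysctl is another option.

```yaml
containers:
  - name: blackbox-exporter
    image: prom/blackbox-exporter   # pin a specific version in production
    securityContext:
      capabilities:
        # Raw sockets are needed to send ICMP echo requests.
        add: ["NET_RAW"]
```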
How Alertmanager Is Invoked
Alertmanager does not run PromQL. It receives alert objects from rule engines.
Prometheus invokes Alertmanager through this config:
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager.monitoring.svc:9093
Mimir Ruler invokes Alertmanager through its ruler configuration:
ruler:
  alertmanager_url: http://alertmanager.monitoring.svc:9093
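If Alertmanager runs as a cluster, the usual guidance is to point the rule engine at every replica rather than at one load-balanced address, and let the Alertmanager cluster deduplicate. Below is a Prometheus sketch that assumes three StatefulSet replicas behind a headless service with hypothetical pod DNS names. Mimir Ruler's alertmanager_url can also take multiple or DNS-discovered addresses, so check its configuration reference for the exact format.

```yaml
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            # Send to each replica directly; the replicas gossip with each
            # other and deduplicate notifications.
            - alertmanager-0.alertmanager.monitoring.svc:9093
            - alertmanager-1.alertmanager.monitoring.svc:9093
            - alertmanager-2.alertmanager.monitoring.svc:9093
```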
When a rule fires, the rule engine sends an alert object like this:
{
  "labels": {
    "alertname": "SyntheticTargetDown",
    "severity": "page",
    "team": "platform",
    "monitor_id": "url_001",
    "module": "http_2xx",
    "instance": "https://bigyandahal.com",
    "source": "prometheus"
  },
  "annotations": {
    "summary": "Synthetic target is down",
    "description": "https://bigyandahal.com failed http_2xx"
  },
  "startsAt": "2026-06-01T10:00:00Z",
  "generatorURL": "http://prometheus:9090/graph?g0.expr=probe_success..."
}
Alertmanager then:
- Groups related alerts.
- Deduplicates repeated alerts.
- Applies silences.
- Applies inhibition rules.
- Chooses a receiver.
- Sends notifications.
Example Alertmanager route:
route:
  group_by: ["team", "alertname", "environment"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: default-slack
  routes:
    - matchers:
        - team="database"
        - severity="page"
      receiver: database-pagerduty
    - matchers:
        - team="network"
      receiver: network-slack
    - matchers:
        - team="platform"
        - severity="page"
      receiver: platform-pagerduty
# The Slack receivers assume global.slack_api_url (or slack_api_url_file) is set.
receivers:
  - name: default-slack
    slack_configs:
      - channel: "#alerts"
  - name: database-pagerduty
    pagerduty_configs:
      - routing_key_file: /etc/alertmanager/pagerduty-database
  - name: network-slack
    slack_configs:
      - channel: "#network-alerts"
  - name: platform-pagerduty
    pagerduty_configs:
      - routing_key_file: /etc/alertmanager/pagerduty-platform
Recording Rules: Prometheus vs Mimir
Recording rules create new time series.
Prometheus recording rule:
groups:
  - name: synthetic-recordings
    interval: 30s
    rules:
      - record: monitor:probe_success:ratio5m
        expr: avg_over_time(probe_success{job="synthetics"}[5m])
The result is stored in Prometheus. Each output series keeps the labels of its probe series (only two are shown here):
monitor:probe_success:ratio5m{monitor_id="url_001", team="platform"} 0.98
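Local alerts and dashboards can then read the precomputed series instead of re-running avg_over_time. Below is a sketch of an alert on the recorded ratio. The alert name and the 0.9 threshold are assumptions.

```yaml
- alert: SyntheticSuccessRatioLow
  # Fewer than 90 percent of probes succeeded over the last 5 minutes.
  expr: monitor:probe_success:ratio5m < 0.9
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Synthetic success ratio is low"
    description: "{{ $labels.instance }} succeeded in {{ $value | humanizePercentage }} of probes over 5m"
```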
Mimir Ruler recording rule:
groups:
  - name: global-synthetic-recordings
    interval: 30s
    rules:
      - record: tenant:monitor:probe_success:ratio5m
        expr: |
          avg by (monitor_id, team, environment) (
            avg_over_time(probe_success{job="synthetics"}[5m])
          )
The result is written back into Mimir. That makes it available to Grafana dashboards and other Mimir Ruler rules across the tenant.
Recommended Architecture
For production, do not choose only one rule engine. Split responsibilities.
Use Prometheus rules for:
- Blackbox Exporter scrape failure (up == 0)
- local node and pod health
- local cluster saturation
- remote_write queue health
- "central metrics backend is unreachable" alerts
Use Mimir Ruler for:
- global SLO burn alerts
- tenant-wide synthetic success ratios
- cross-cluster service availability
- long-retention comparisons
- central recording rules used by shared dashboards
Use Alertmanager for:
- one notification policy
- one set of silences
- deduplication across Prometheus and Mimir Ruler
- routing by team, severity, environment, and service
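On the Prometheus side of that split, the Blackbox Exporter scrape failure can be written as a local rule. In this setup, up{job="synthetics"} == 0 means Prometheus could not get a result from Blackbox Exporter for that target, which is different from a probe that ran and reported probe_success == 0. The alert name and labels are illustrative.

```yaml
groups:
  - name: local-pipeline-health
    rules:
      # Prometheus cannot reach Blackbox Exporter for this target, so there is
      # no probe result at all (not the same as probe_success == 0).
      - alert: BlackboxScrapeFailing
        expr: up{job="synthetics"} == 0
        for: 5m
        labels:
          severity: warning
          team: platform
        annotations:
          summary: "Blackbox Exporter scrape failing"
          description: "No probe result for {{ $labels.instance }}"
```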
Common Mistakes
Mistake 1: Expecting Mimir Ruler to Scrape
Mimir Ruler evaluates rules. It does not run Blackbox probes.
You still need Prometheus, Alloy, or another Prometheus-compatible scraper to produce probe_success.
Mistake 2: Sending Every Local Alert to Mimir Ruler
If the alert is about a local scrape target, keep it close to Prometheus.
Example:
up{job="kubelet"} == 0
That should usually be a Prometheus alert. If your Mimir path is broken, you still want to know the kubelet scrape is failing.
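As a local Prometheus rule, that check is short. This is a sketch: the alert name, for duration, and labels are illustrative, and the job="kubelet" label depends on how your scrape config names the kubelet job.

```yaml
- alert: KubeletScrapeFailing
  # Evaluated by the in-cluster Prometheus, so it still fires if remote_write
  # to Mimir or the Mimir Ruler is down.
  expr: up{job="kubelet"} == 0
  for: 5m
  labels:
    severity: page
    team: platform
  annotations:
    summary: "Kubelet scrape failing"
    description: "{{ $labels.instance }} kubelet has not been scraped successfully for 5m"
```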
Mistake 3: Routing on Unstable Labels
Do not route Alertmanager notifications by volatile labels like URL path, pod UID, container ID, request ID, or raw instance names that churn constantly.
Route by stable ownership labels:
labels:
  team: platform
  service: checkout
  severity: page
  environment: prod
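When volatile labels come from the metrics themselves, Prometheus can strip them from alerts before they reach Alertmanager with alert_relabel_configs. This is a sketch: pod_uid, container_id, and request_id are hypothetical label names. Dropping labels can also make alerts that differed only by those labels collapse into a single alert in Alertmanager.

```yaml
alerting:
  alert_relabel_configs:
    # Remove churn-prone labels from outgoing alerts; routing then relies only
    # on ownership labels such as team, service, severity, and environment.
    - action: labeldrop
      regex: pod_uid|container_id|request_id
```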
Mistake 4: Duplicating the Same Alert From Both Engines
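If Prometheus and Mimir Ruler both fire the same alert, Alertmanager treats them as one alert only when their label sets are identical. A different alertname, or an extra label such as source or cluster, turns them into two separate alerts. Usually, the cleaner design is: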
- Prometheus fires local target alerts.
- Mimir Ruler fires global aggregate alerts.
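If both engines do alert on related conditions, an Alertmanager inhibit rule can mute the per-target alert while the global one is firing. This sketch uses alert names from this post and assumes both alerts carry the same monitor_id label. The source_matchers and target_matchers syntax needs Alertmanager 0.22 or later.

```yaml
inhibit_rules:
  # While the global aggregate fires for a monitor, suppress the per-location
  # SyntheticTargetDown alerts for that same monitor.
  - source_matchers:
      - alertname="GlobalSyntheticAvailabilityLow"
    target_matchers:
      - alertname="SyntheticTargetDown"
    equal: ["monitor_id"]
```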
Practical Rule Naming
For recording rules, encode aggregation and window in the name:
- record: job:probe_success:ratio5m
  expr: avg by (job) (avg_over_time(probe_success[5m]))
- record: team:probe_success:ratio30m
  expr: avg by (team) (avg_over_time(probe_success[30m]))
For alerting rules, make the alert name describe the symptom:
- alert: SyntheticTargetDown
- alert: TcpPortDown
- alert: PingTargetDown
- alert: GlobalSyntheticAvailabilityLow
- alert: MimirRulerEvaluationFailing
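MimirRulerEvaluationFailing is an alert that belongs outside Mimir: it should run in a Prometheus that scrapes Mimir's own metrics. This sketch assumes the ruler exposes the cortex_prometheus_rule_evaluation_failures_total counter with user and rule_group labels, as recent Mimir versions do. Check the metric against your version. The severity and team labels are illustrative.

```yaml
- alert: MimirRulerEvaluationFailing
  # Any rule group in any tenant failing to evaluate over the last 5 minutes.
  expr: |
    sum by (user, rule_group) (
      rate(cortex_prometheus_rule_evaluation_failures_total[5m])
    ) > 0
  for: 5m
  labels:
    severity: warning
    team: platform
  annotations:
    summary: "Mimir Ruler rule evaluations are failing"
    description: "Tenant {{ $labels.user }}, group {{ $labels.rule_group }}"
```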
Final Rule of Thumb
Ask four questions:
1. Where is the metric produced?
2. Where is the metric stored?
3. Is the rule local or global?
4. Should the alert survive central backend failure?
Then choose:
| Question | Better fit |
|---|---|
| Local scrape failure? | Prometheus rules |
| Blackbox probe from one cluster? | Prometheus rules |
| Synthetic SLO across regions? | Mimir Ruler |
| Shared tenant recording rule? | Mimir Ruler |
| Central long-retention PromQL? | Mimir Ruler |
| Cluster safety alert? | Prometheus rules |
| Notification routing? | Alertmanager |
The clean architecture is not "Prometheus rules or Mimir Ruler." It is:
Prometheus rules for local truth.
Mimir Ruler for global truth.
Alertmanager for notification truth.