Jan 01, 2026

Prometheus Rules vs Mimir Ruler: Where Should Alerting Logic Live?

Prometheus rules and Mimir Ruler both evaluate PromQL. That is why they are easy to confuse.

The important difference is not the syntax. It is where the rule runs, what data it can see, where recording rule output is written, and how close the alert is to the failure domain.


The Short Version

Use Prometheus rules when the alert should live close to the scraper:

  • scrape health
  • local cluster safety alerts
  • Blackbox URL, TCP port, and ICMP ping checks scraped by that Prometheus
  • alerts that must still work if the central metrics backend is unavailable
  • recording rules used by local dashboards or local alerts

Use Mimir Ruler when the rule needs central metrics:

  • global service SLOs across clusters
  • long-retention PromQL
  • tenant-wide dashboards and alerts
  • rules managed centrally instead of copied into every Prometheus
  • recording rules that should be written back into Mimir for shared use

Use Alertmanager for notification routing in both cases.

Prometheus alerting rule
  -> evaluates local PromQL
  -> creates firing alert objects
  -> sends them to Alertmanager
  -> Alertmanager groups, deduplicates, silences, inhibits, and routes

Mimir Ruler alerting rule
  -> evaluates PromQL against Mimir tenant data
  -> creates firing alert objects
  -> sends them to Alertmanager
  -> Alertmanager groups, deduplicates, silences, inhibits, and routes

The Core Mental Model

Prometheus is both a scraper and a rule engine.

Mimir is a central metrics backend. The Mimir Ruler is a rule engine that runs near that backend.

That means:

Prometheus can scrape targets.
Mimir Ruler does not scrape targets.

Prometheus can evaluate rules from its local TSDB.
Mimir Ruler evaluates rules from metrics already stored in Mimir.

Prometheus recording rules write new series into Prometheus.
Mimir Ruler recording rules write new series back into Mimir.

This matters most for synthetics.

If you want to monitor a URL, a TCP port, or ping reachability, something still needs to run the probe. Usually that is Prometheus scraping Blackbox Exporter, or Alloy running a Prometheus-compatible scrape pipeline. After that, the resulting probe_* metrics can stay local in Prometheus, or be remote-written into Mimir.
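As a rough sketch of the remote-write hop, a Prometheus config might add something like the following. The Mimir address, the tenant ID, and the cluster name are all hypothetical placeholders, not values from this setup.

```yaml
global:
  external_labels:
    cluster: prod-eu-1          # hypothetical cluster name, attached to every remote-written series

remote_write:
  - url: http://mimir.example.internal/api/v1/push   # hypothetical Mimir push endpoint
    headers:
      X-Scope-OrgID: platform   # hypothetical tenant ID; Mimir uses this header to pick the tenant
```

The external label matters later: it is what lets a central rule tell one cluster's probe_success series from another's.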

Prometheus Rules

Prometheus supports two rule types:

  • Recording rules, which precompute PromQL and store the result as new time series.
  • Alerting rules, which evaluate PromQL and create alert objects when a condition is true.

Rules are loaded through rule_files.

global:
  scrape_interval: 30s
  evaluation_interval: 30s

rule_files:
  - /etc/prometheus/rules/*.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager.monitoring.svc:9093

The key point: Prometheus rules see only the data held by that Prometheus server.

If that Prometheus scrapes Blackbox Exporter, then this local rule works:

groups:
  - name: synthetics
    rules:
      - alert: SyntheticTargetDown
        expr: probe_success{job="synthetics"} == 0
        for: 2m
        labels:
          severity: page
          team: platform
        annotations:
          summary: "Synthetic target is down"
          description: "{{ $labels.instance }} failed {{ $labels.module }}"

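Prometheus evaluates the expression every evaluation_interval, unless the rule group sets its own interval. While the expression is true but the for duration has not yet elapsed, the alert is pending. Once the expression has stayed true for the whole for duration, the alert becomes firing and Prometheus sends it to Alertmanager.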
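One way to check the for behaviour without waiting for a real outage is a promtool unit test. The sketch below assumes the SyntheticTargetDown group above is saved as synthetics.rules.yml (a hypothetical file name) and uses a made-up target, https://example.com. The probe starts failing at 1m, so the alert should still be pending at 2m and firing by 4m.

```yaml
# synthetics_test.yml (hypothetical name); run with: promtool test rules synthetics_test.yml
rule_files:
  - synthetics.rules.yml        # assumed to hold the SyntheticTargetDown rule

evaluation_interval: 30s

tests:
  - interval: 30s               # spacing of the input samples below
    input_series:
      - series: 'probe_success{job="synthetics", instance="https://example.com", module="http_2xx"}'
        values: '1 1 0 0 0 0 0 0 0 0'   # healthy at 0s and 30s, failing from 1m onward
    alert_rule_test:
      - eval_time: 2m           # failing for 1m only: pending, so nothing is firing yet
        alertname: SyntheticTargetDown
        exp_alerts: []
      - eval_time: 4m           # failing for 3m: past the 2m "for" window, so firing
        alertname: SyntheticTargetDown
        exp_alerts:
          - exp_labels:
              severity: page
              team: platform
              job: synthetics
              instance: https://example.com
              module: http_2xx
            exp_annotations:
              summary: "Synthetic target is down"
              description: "https://example.com failed http_2xx"
```

Because the rule syntax is shared, a test like this should be a reasonable first check for rules destined for Mimir Ruler too, as long as the input series carry the labels the central rule expects.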

Mimir Ruler

Mimir Ruler also evaluates recording and alerting rules, but it evaluates them against Mimir data.

The Mimir path usually looks like this:

Prometheus or Alloy scrapes targets
  -> remote_write to Mimir
  -> Mimir stores tenant metrics
  -> Mimir Ruler queries Mimir
  -> Mimir Ruler sends alerts to Alertmanager

Example Mimir Ruler alert:

groups:
  - name: global-synthetics
    interval: 30s
    rules:
      - alert: GlobalSyntheticTargetDown
        expr: probe_success{job="synthetics", environment="prod"} == 0
        for: 2m
        labels:
          severity: page
          team: platform
          source: mimir-ruler
        annotations:
          summary: "Global synthetic target is down"
          description: "{{ $labels.instance }} failed from {{ $labels.cluster }}"

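This is useful when probes from many clusters are remote-written into the same tenant. The cluster label used in the description exists only if each sender attaches one, typically through external_labels.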

Example:

min by (monitor_id, instance, team) (
  probe_success{job="synthetics", environment="prod"}
) == 0

That expression detects that at least one vantage point is failing, because min returns 0 as soon as any location reports probe_success as 0.

Or:

avg by (monitor_id, instance, team) (
  probe_success{job="synthetics", environment="prod"}
) < 0.8

That expression can alert when fewer than 80 percent of probe locations are succeeding.
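Put together as a Mimir Ruler group, the two ends of that spectrum could look like the sketch below. The alert names, severities, and for durations are illustrative assumptions. The second rule swaps min for max, which is 0 only when every reporting vantage point is failing.

```yaml
groups:
  - name: global-synthetic-aggregates       # hypothetical group name
    interval: 30s
    rules:
      # At least one vantage point reports a failure.
      - alert: SyntheticTargetDownSomewhere   # hypothetical alert name
        expr: |
          min by (monitor_id, instance, team) (
            probe_success{job="synthetics", environment="prod"}
          ) == 0
        for: 2m
        labels:
          severity: warning
      # Every vantage point that is still reporting sees a failure.
      - alert: SyntheticTargetDownEverywhere  # hypothetical alert name
        expr: |
          max by (monitor_id, instance, team) (
            probe_success{job="synthetics", environment="prod"}
          ) == 0
        for: 2m
        labels:
          severity: page
```

Note that these aggregates only count locations that are still sending samples. A vantage point whose remote_write has stopped simply drops out of the min, max, or avg.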

The Decision Boundary

Use Prometheus Rules For Local Truth

Prometheus rules are the right default when the rule depends on local scrape state.

Examples:

  • Is this Prometheus failing to scrape a target?
  • Is this cluster's node exporter down?
  • Is this cluster's Blackbox Exporter returning probe_success == 0?
  • Is local Alertmanager reachable?
  • Is the local remote_write queue falling behind?

Prometheus rules are also good for resilience. If Mimir is down, local Prometheus alerting can still fire for local problems.
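A minimal sketch of such a local safety group is shown below. The metric names are Prometheus's own self-monitoring metrics. The group name, alert names, thresholds, and for durations are assumptions to tune for your environment.

```yaml
groups:
  - name: local-safety                       # hypothetical group name
    rules:
      # A scrape target of this Prometheus is failing.
      - alert: TargetDown
        expr: up == 0
        for: 5m
        labels:
          severity: warning
      # remote_write is more than 2 minutes behind the newest ingested sample.
      - alert: RemoteWriteBehind
        expr: |
          (
            max_over_time(prometheus_remote_storage_highest_timestamp_in_seconds[5m])
            - ignoring(remote_name, url) group_right
            max_over_time(prometheus_remote_storage_queue_highest_sent_timestamp_seconds[5m])
          ) > 120
        for: 15m
        labels:
          severity: warning
      # This Prometheus currently has no Alertmanager to send alerts to.
      - alert: NoAlertmanagerDiscovered
        expr: prometheus_notifications_alertmanagers_discovered < 1
        for: 10m
        labels:
          severity: page
```

None of these expressions depend on Mimir, which is the point: they keep working when the central path is the thing that is broken.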

Use Mimir Ruler For Global Truth

Mimir Ruler is the right default when the question crosses Prometheus boundaries.

Examples:

  • Is checkout availability below the SLO across all clusters?
  • Did every region start failing the same URL monitor?
  • Is the global error budget burn rate too high?
  • Do we need one centrally managed rule instead of copying the same YAML into many Prometheus servers?
  • Do dashboards need shared recording rule output from Mimir?

Mimir Ruler is especially useful for multi-cluster, multi-region, and multi-tenant metrics.
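As one possible shape for a global burn-rate rule, the sketch below treats probe_success as the SLI and assumes a 99.9 percent availability SLO (an error budget of 0.001). The 14.4 factor and the 1h plus 5m window pair follow the commonly used multi-window burn-rate pattern. The SLO target, alert name, and group name are assumptions, not values from this article.

```yaml
groups:
  - name: global-synthetic-slo               # hypothetical group name
    interval: 1m
    rules:
      - alert: SyntheticErrorBudgetBurnFast  # hypothetical alert name
        # Fire only when both the long and the short window burn faster
        # than 14.4x the assumed 0.001 error budget.
        expr: |
          (
            1 - avg by (monitor_id, team) (
              avg_over_time(probe_success{job="synthetics", environment="prod"}[1h])
            )
          ) > (14.4 * 0.001)
          and
          (
            1 - avg by (monitor_id, team) (
              avg_over_time(probe_success{job="synthetics", environment="prod"}[5m])
            )
          ) > (14.4 * 0.001)
        labels:
          severity: page
```

The avg by collapses every cluster and region into one ratio per monitor, which is only possible where all of those series live in the same place.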

Synthetic Monitoring With Blackbox Exporter

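Blackbox Exporter runs a probe each time its /probe endpoint is scraped and exposes the result as probe metrics. Prometheus does the scrape. The probe target and module are passed as URL parameters, which relabeling sets through __param_target and __param_module.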

The common metrics:

probe_success
probe_duration_seconds
probe_http_status_code
probe_ssl_earliest_cert_expiry
probe_dns_lookup_time_seconds
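
Beyond probe_success, two of these metrics lend themselves to simple alerts. The sketch below is illustrative: the alert names, the 14-day certificate window, and the 2-second latency threshold are assumptions.

```yaml
groups:
  - name: synthetic-quality                  # hypothetical group name
    rules:
      # probe_ssl_earliest_cert_expiry is a Unix timestamp, so subtracting
      # time() gives the seconds remaining before the certificate expires.
      - alert: SyntheticCertExpiringSoon
        expr: probe_ssl_earliest_cert_expiry{job="synthetics"} - time() < 14 * 24 * 3600
        for: 1h
        labels:
          severity: warning
      # Average probe duration over 5 minutes is above 2 seconds.
      - alert: SyntheticProbeSlow
        expr: avg_over_time(probe_duration_seconds{job="synthetics"}[5m]) > 2
        for: 10m
        labels:
          severity: warning
```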

The common modules:

modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      method: GET
      valid_status_codes: [200, 204, 301, 302]

  tcp_connect:
    prober: tcp
    timeout: 5s

  icmp:
    prober: icmp
    timeout: 5s

URL Monitor Example

File service discovery target:

[
  {
    "targets": ["https://bigyandahal.com"],
    "labels": {
      "monitor_id": "url_001",
      "monitor_name": "homepage",
      "module": "http_2xx",
      "team": "platform",
      "environment": "prod"
    }
  }
]

Prometheus scrape config:

scrape_configs:
  - job_name: synthetics
    metrics_path: /probe
    file_sd_configs:
      - files:
          - /etc/prometheus/file_sd/synthetics.json
        refresh_interval: 30s
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [module]
        target_label: __param_module
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter.monitoring.svc:9115

PromQL:

probe_success{job="synthetics", module="http_2xx"} == 0

TCP Port Monitor Example

Target:

[
  {
    "targets": ["postgres.prod.internal:5432"],
    "labels": {
      "monitor_id": "tcp_001",
      "monitor_name": "postgres_port",
      "module": "tcp_connect",
      "team": "database",
      "environment": "prod"
    }
  }
]

PromQL:

probe_success{job="synthetics", module="tcp_connect"} == 0

Port monitor alert:

- alert: TcpPortDown
  expr: probe_success{job="synthetics", module="tcp_connect"} == 0
  for: 1m
  labels:
    severity: page
    team: database
  annotations:
    summary: "TCP port check failed"
    description: "{{ $labels.instance }} is not accepting TCP connections"

ICMP Ping Monitor Example

Target:

[
  {
    "targets": ["10.10.20.12"],
    "labels": {
      "monitor_id": "ping_001",
      "monitor_name": "branch_gateway",
      "module": "icmp",
      "team": "network",
      "environment": "prod"
    }
  }
]

PromQL:

probe_success{job="synthetics", module="icmp"} == 0

Ping alert:

- alert: PingTargetDown
  expr: probe_success{job="synthetics", module="icmp"} == 0
  for: 3m
  labels:
    severity: warning
    team: network
  annotations:
    summary: "Ping target unreachable"
    description: "{{ $labels.instance }} is not reachable by ICMP"

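Note: ICMP probing usually needs elevated privileges. On Linux, Blackbox Exporter typically needs the CAP_NET_RAW capability (or root), or a net.ipv4.ping_group_range setting that permits unprivileged pings, depending on how it is deployed.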
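
On Kubernetes, one common way to grant that permission is a container securityContext. The fragment below is a sketch of part of a hypothetical Blackbox Exporter Deployment. Whether it is sufficient depends on the image's user and your cluster's pod security policy.

```yaml
# Fragment of a hypothetical Deployment for Blackbox Exporter
spec:
  template:
    spec:
      containers:
        - name: blackbox-exporter
          securityContext:
            capabilities:
              add: ["NET_RAW"]   # lets the container open raw sockets for ICMP
```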

How Alertmanager Is Invoked

Alertmanager does not run PromQL. It receives alert objects from rule engines.

Prometheus invokes Alertmanager through this config:

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager.monitoring.svc:9093

Mimir Ruler invokes Alertmanager through its ruler configuration:

ruler:
  alertmanager_url: http://alertmanager.monitoring.svc:9093

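When a rule fires, the rule engine sends an alert object like this (abridged; the source label is assumed to come from a rule label or an external label, since the rule shown earlier does not set it):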

{
  "labels": {
    "alertname": "SyntheticTargetDown",
    "severity": "page",
    "team": "platform",
    "monitor_id": "url_001",
    "module": "http_2xx",
    "instance": "https://bigyandahal.com",
    "source": "prometheus"
  },
  "annotations": {
    "summary": "Synthetic target is down",
    "description": "https://bigyandahal.com failed http_2xx"
  },
  "startsAt": "2026-06-01T10:00:00Z",
  "generatorURL": "http://prometheus:9090/graph?g0.expr=probe_success..."
}

Alertmanager then, roughly in this order:

  1. Matches the alert against the route tree to choose a receiver.
  2. Groups related alerts.
  3. Applies inhibition rules.
  4. Applies silences.
  5. Deduplicates repeated notifications.
  6. Sends notifications.

Example Alertmanager route:

route:
  group_by: ["team", "alertname", "environment"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: default-slack
  routes:
    - matchers:
        - team="database"
        - severity="page"
      receiver: database-pagerduty
    - matchers:
        - team="network"
      receiver: network-slack
    - matchers:
        - team="platform"
        - severity="page"
      receiver: platform-pagerduty

receivers:
  - name: default-slack
    slack_configs:
      - channel: "#alerts"
  - name: database-pagerduty
    pagerduty_configs:
      - routing_key_file: /etc/alertmanager/pagerduty-database
  - name: network-slack
    slack_configs:
      - channel: "#network-alerts"
  - name: platform-pagerduty
    pagerduty_configs:
      - routing_key_file: /etc/alertmanager/pagerduty-platform
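
Inhibition is configured in the same file. As a sketch, the rule below mutes the local SyntheticTargetDown alert while the central GlobalSyntheticTargetDown alert is firing for the same monitor. Whether the global alert should win is a design assumption, and it only works if both alerts carry a monitor_id label.

```yaml
inhibit_rules:
  - source_matchers:
      - alertname="GlobalSyntheticTargetDown"   # the alert that does the muting
    target_matchers:
      - alertname="SyntheticTargetDown"         # the alert that gets muted
    equal: ["monitor_id"]                       # only when both refer to the same monitor
```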

Recording Rules: Prometheus vs Mimir

Recording rules create new time series.

Prometheus recording rule:

groups:
  - name: synthetic-recordings
    interval: 30s
    rules:
      - record: monitor:probe_success:ratio5m
        expr: avg by (monitor_id, team) (avg_over_time(probe_success{job="synthetics"}[5m]))

The result is stored in Prometheus:

monitor:probe_success:ratio5m{monitor_id="url_001", team="platform"} 0.98

Mimir Ruler recording rule:

groups:
  - name: global-synthetic-recordings
    interval: 30s
    rules:
      - record: tenant:monitor:probe_success:ratio5m
        expr: |
          avg by (monitor_id, team, environment) (
            avg_over_time(probe_success{job="synthetics"}[5m])
          )

The result is written back into Mimir. That makes it available to Grafana dashboards and other Mimir Ruler rules across the tenant.
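
For instance, a second Mimir Ruler group could alert on the recorded series instead of recomputing it. This is a sketch: the group name, the 0.8 threshold, and the for duration are assumptions.

```yaml
groups:
  - name: global-synthetic-alerts            # hypothetical group name
    interval: 30s
    rules:
      - alert: GlobalSyntheticAvailabilityLow
        # Reads the series written by the recording rule above.
        expr: tenant:monitor:probe_success:ratio5m{environment="prod"} < 0.8
        for: 5m
        labels:
          severity: page
          source: mimir-ruler
```

Groups are evaluated independently, so the alert may see a recorded value that is up to one evaluation interval old. For a 5m ratio that lag is usually acceptable.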

Recommended Architecture

For production, do not choose only one rule engine. Split responsibilities.

Use Prometheus rules for:

  • Blackbox exporter scrape failure
  • up == 0
  • local node and pod health
  • local cluster saturation
  • remote_write queue health
  • "central metrics backend is unreachable" alerts

Use Mimir Ruler for:

  • global SLO burn alerts
  • tenant-wide synthetic success ratios
  • cross-cluster service availability
  • long-retention comparisons
  • central recording rules used by shared dashboards

Use Alertmanager for:

  • one notification policy
  • one set of silences
  • deduplication across Prometheus and Mimir Ruler
  • routing by team, severity, environment, and service

Common Mistakes

Mistake 1: Expecting Mimir Ruler to Scrape

Mimir Ruler evaluates rules. It does not run Blackbox probes.

You still need Prometheus, Alloy, or another Prometheus-compatible scraper to produce probe_success.

Mistake 2: Sending Every Local Alert to Mimir Ruler

If the alert is about a local scrape target, keep it close to Prometheus.

Example:

up{job="kubelet"} == 0

That should usually be a Prometheus alert. If your Mimir path is broken, you still want to know the kubelet scrape is failing.

Mistake 3: Routing on Unstable Labels

Do not route Alertmanager notifications by volatile labels like URL path, pod UID, container ID, request ID, or raw instance names that churn constantly.

Route by stable ownership labels:

labels:
  team: platform
  service: checkout
  severity: page
  environment: prod

Mistake 4: Duplicating the Same Alert From Both Engines

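If Prometheus and Mimir Ruler both fire the same alert, Alertmanager treats them as one alert only when their full label sets are identical. A single differing label, such as source: mimir-ruler on one side, makes them two separate alerts and can produce two notifications. Usually, the cleaner design is: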

  • Prometheus fires local target alerts.
  • Mimir Ruler fires global aggregate alerts.

Practical Rule Naming

For recording rules, encode the aggregation level and the window in the name, following the level:metric:operations convention:

- record: job:probe_success:ratio5m
  expr: avg by (job) (avg_over_time(probe_success[5m]))

- record: team:probe_success:ratio30m
  expr: avg by (team) (avg_over_time(probe_success[30m]))

For alerting rules, make the alert name describe the symptom:

- alert: SyntheticTargetDown
- alert: TcpPortDown
- alert: PingTargetDown
- alert: GlobalSyntheticAvailabilityLow
- alert: MimirRulerEvaluationFailing
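
The last name in that list deserves a note: an alert about the ruler itself should not depend on the ruler. A sketch of it as a local Prometheus rule follows, assuming a Prometheus scrapes the Mimir ruler pods. The cortex_prometheus_rule_* metric names are the ones the ruler exposes as far as I know, so verify them against your Mimir version. The 1 percent threshold is an assumption.

```yaml
groups:
  - name: mimir-ruler-health                 # hypothetical group name
    rules:
      - alert: MimirRulerEvaluationFailing
        # Share of rule evaluations that failed, per rule group, over 5 minutes.
        expr: |
          sum by (rule_group) (rate(cortex_prometheus_rule_evaluation_failures_total[5m]))
          /
          sum by (rule_group) (rate(cortex_prometheus_rule_evaluations_total[5m]))
          > 0.01
        for: 5m
        labels:
          severity: page
```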

Final Rule of Thumb

Ask four questions:

1. Where is the metric produced?
2. Where is the metric stored?
3. Is the rule local or global?
4. Should the alert survive central backend failure?

Then choose:

Question                             Better fit
Local scrape failure?                Prometheus rules
Blackbox probe from one cluster?     Prometheus rules
Synthetic SLO across regions?        Mimir Ruler
Shared tenant recording rule?        Mimir Ruler
Central long-retention PromQL?       Mimir Ruler
Cluster safety alert?                Prometheus rules
Notification routing?                Alertmanager

The clean architecture is not "Prometheus rules or Mimir Ruler." It is:

Prometheus rules for local truth.
Mimir Ruler for global truth.
Alertmanager for notification truth.
