May 29, 2026

Mimir: Scalable Prometheus with Interactive Architecture

Grafana Mimir is a highly available, scalable, long-term storage for Prometheus. Think of it as "Prometheus on steroids" — distributed, fault-tolerant, and built for production scale.

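[Figure: interactive diagram. Prometheus metrics (cpu_usage, http_requests, latency_p95, memory_bytes, error_rate) flow into Grafana Mimir through the Distributor (validate + shard), Ingester (replicate + buffer), and Querier (merge + dedupe), then into long-term storage across 2h, 6h, 24h, 30d, and 1y time ranges.]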
Short-lived Prometheus samples become replicated Mimir writes, then compacted blocks that stay queryable for months or years.

Open the interactive Mimir architecture explorer

The Core Idea

Prometheus is great for short-term metrics, but Mimir solves the storage problem: how do you keep metrics for months/years without running out of disk space or memory?

Mimir = Prometheus metrics → distributed storage → queryable at scale

Three-Layer Architecture Model

Mimir's components cleanly separate into three concerns:

  1. Read Path — How queries happen
  2. Write Path — How metrics get in
  3. Backend Storage — Where everything lives

This separation is the key insight. Let me break it down.


Write Path Components (Ingestion)

These components handle incoming Prometheus metrics.

Distributor

What it does:

  • Receives metrics from Prometheus via remote write (or from compatible agents)
  • Validates metrics format
  • Enforces per-tenant rate limits (authentication itself usually sits in a gateway or proxy in front of Mimir)
  • Routes data to ingesters

Mental model: The bouncer at the club. Checks your ID, makes sure you're allowed in, then directs you to a table.

Key details:

  • Stateless (can scale horizontally)
  • Handles load balancing to ingesters
  • Enforces rate limits per tenant
  • Checksums data for integrity
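
The per-tenant rate limiting described above can be pictured as one token bucket per tenant. This is a minimal sketch, not Mimir's actual limiter: the class names, the `allow` method, and the rate/burst numbers are all assumptions made for illustration.

```python
class TokenBucket:
    """Allows `rate` samples per second, with bursts up to `burst` samples."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # start with a full bucket
        self.last = 0.0      # time of the previous call, in seconds

    def allow(self, n: int, now: float) -> bool:
        # Refill for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if n <= self.tokens:
            self.tokens -= n
            return True
        return False


class TenantLimiter:
    """Hypothetical distributor-side limiter: one bucket per tenant ID."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, tenant: str, n_samples: int, now: float) -> bool:
        if tenant not in self.buckets:
            self.buckets[tenant] = TokenBucket(self.rate, self.burst)
        return self.buckets[tenant].allow(n_samples, now)
```

Because each tenant gets its own bucket, a noisy tenant that exhausts its budget is rejected without affecting anyone else's writes.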

Ingester

What it does:

  • Receives time-series data from distributors
  • Keeps recent data in memory (write buffer)
  • Periodically flushes to long-term storage
  • Handles data replication

Mental model: A server at a restaurant. Takes your order, writes it down, then hands it to the kitchen.

Key details:

  • Stateful (persistent across requests)
  • Series are sharded across ingesters by hashing their labels onto a ring
  • Maintains multiple replicas for fault tolerance
  • Holds ~2-3 hours of metrics in memory

Read Path Components (Querying)

These components handle PromQL queries.

Query Frontend

What it does:

  • Receives PromQL queries from users/Grafana
  • Splits large queries into smaller chunks
  • Caches query results
  • Returns results to users

Mental model: A receptionist. Takes your request, figures out who needs to handle it, caches common questions.

Key details:

  • Stateless (scales horizontally)
  • Implements query-result caching (reduces load on queriers)
  • Skips caching the most recent window, since those samples can still change
  • Useful for dashboard queries (same query, repeatedly)
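
To make the caching behaviour concrete, here is a small sketch of a results cache that only stores ranges old enough to be stable. The `ResultCache` class, its key layout, and the `max_freshness_s` parameter are hypothetical names for illustration and do not mirror Mimir's internals.

```python
class ResultCache:
    """Hypothetical query-result cache keyed by tenant, query, and time range."""

    def __init__(self, max_freshness_s: int) -> None:
        self.max_freshness_s = max_freshness_s
        self.store = {}
        self.hits = 0

    def get_or_compute(self, tenant, query, start, end, step, now, compute):
        key = (tenant, query, start, end, step)
        if key in self.store:
            self.hits += 1
            return self.store[key]
        result = compute()
        # Only cache ranges that ended long enough ago to be stable;
        # very recent samples may still be arriving.
        if end <= now - self.max_freshness_s:
            self.store[key] = result
        return result
```

A dashboard that reloads the same panel hits the cache for the older part of its range and only recomputes the freshest slice.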

Querier

What it does:

  • Executes PromQL against metrics
  • Reads from ingesters (recent data)
  • Reads from long-term storage (old data)
  • Aggregates results

Mental model: A librarian searching for books. Checks recent arrivals (ingesters) and the archive (storage).

Key details:

  • Stateless (scales horizontally)
  • Fetches data from multiple sources
  • Deduplicates & merges time-series
  • Handles failed queries gracefully
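
The merge-and-deduplicate step can be sketched in a few lines. This assumes each source returns a list of `(timestamp, value)` pairs for the same series; the function name `merge_series` is made up for this example.

```python
def merge_series(*sources):
    """Merge sample lists from several sources, one sample per timestamp."""
    merged = {}
    for samples in sources:
        for ts, value in samples:
            # The first source to report a timestamp wins; replicas
            # normally carry identical values, so the choice is arbitrary.
            merged.setdefault(ts, value)
    return sorted(merged.items())


# Two ingester replicas overlap with each other and with a stored block.
ingester_1 = [(100, 1.0), (115, 1.0)]
ingester_2 = [(115, 1.0), (130, 1.0)]
block = [(85, 1.0), (100, 1.0)]
merged = merge_series(block, ingester_1, ingester_2)
```

Here `merged` holds four samples in timestamp order even though the three sources returned six in total.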

Backend Storage Components

These handle long-term persistence.

Blocks Storage

What it does:

  • Stores compressed metric blocks
  • One block per ~2-hour window (a directory holding chunk files, an index, and metadata)
  • Organized by tenant, then by block

Mental model: A filing cabinet. Organized, compressed, write-once.

Key details:

  • 2-hour blocks typical
  • TSDB format (same as Prometheus)
  • Immutable (write-once)
  • Highly compressible
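
As a rough illustration of the 2-hour block layout, the sketch below maps a sample timestamp to the window it would fall into, assuming windows are aligned to multiples of two hours since the Unix epoch. The constant and function names are illustrative only.

```python
BLOCK_RANGE_S = 2 * 60 * 60  # 2-hour block window, in seconds


def block_window(ts: int) -> tuple[int, int]:
    """Return the [start, end) window that contains timestamp `ts`."""
    start = ts - ts % BLOCK_RANGE_S
    return start, start + BLOCK_RANGE_S


# Samples one second apart can land in different blocks at a boundary.
before = block_window(7199)  # (0, 7200)
after = block_window(7200)   # (7200, 14400)
```

Fixed boundaries mean every ingester replica cuts blocks covering the same windows, which is what lets later compaction line them up and merge them.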

Object Storage (S3, GCS, Azure)

What it does:

  • Durably stores all blocks
  • Highly available & redundant
  • Multi-region capable

Mental model: Bank vault. Distributed, replicated, backed up.

Key details:

  • Cloud-native (S3, GCS, Azure Blob)
  • Or MinIO for on-prem
  • Survives machine failures
  • Can span regions

Compactor

What it does:

  • Merges small blocks into larger ones
  • Deduplicates time-series
  • Merges blocks into larger time ranges (2h → 12h → 24h by default); Mimir does not downsample
  • Optimizes storage

Mental model: An archivist. Takes loose papers, merges them into binders, throws away outdated copies.

Key details:

  • Batch job (runs periodically)
  • Reduces query latency
  • Saves storage space
  • No downsampling: samples keep full resolution for the whole retention period
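
The merge-and-deduplicate part of compaction can be sketched as below. Blocks are modelled as plain dicts with `min_t`, `max_t`, and a `series` map; that shape and the `compact` function are assumptions for illustration, not the TSDB block format.

```python
def compact(blocks):
    """Merge blocks into one, keeping a single sample per (series, timestamp)."""
    merged = {}
    for block in blocks:
        for series, samples in block["series"].items():
            # dict.update accepts (timestamp, value) pairs; duplicate
            # timestamps from replicated blocks collapse into one entry.
            merged.setdefault(series, {}).update(samples)
    return {
        "min_t": min(b["min_t"] for b in blocks),
        "max_t": max(b["max_t"] for b in blocks),
        "series": {s: sorted(points.items()) for s, points in merged.items()},
    }


# Two replicas uploaded the same 2-hour window; a third block covers the next one.
replica_a = {"min_t": 0, "max_t": 7200, "series": {"up": [(0, 1), (15, 1)]}}
replica_b = {"min_t": 0, "max_t": 7200, "series": {"up": [(0, 1), (15, 1)]}}
next_window = {"min_t": 7200, "max_t": 14400, "series": {"up": [(7200, 1)]}}
big_block = compact([replica_a, replica_b, next_window])
```

The output covers both windows in one block and stores each sample once, which is where the storage and query-latency savings come from.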

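Index (Block Index + Store-Gateway)

What it does:

  • Fast label lookups
  • Maps label matchers to the series inside each block
  • Quick series discovery

Mental model: A book's index. Find page numbers quickly without reading every page.

Key details:

  • Every TSDB block ships with its own index file; Mimir does not use Bloom filters or BoltDB for this
  • Store-gateways load a compact index-header per block, so queriers can find series without downloading whole blocks
  • Keeps lookups fast as cardinality grows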

Interactive Component Diagram

Here's how they connect:

Write Flow

Prometheus Scraper
        ↓
   Distributor (Stateless)
        ↓
   Ingester (Stateful, in-memory)
        ↓
   Object Storage (S3/GCS)

Read that as:

  1. Prometheus pushes metrics to distributor
  2. Distributor rate-limits, validates, routes
  3. Ingester buffers in memory
  4. Periodically flushed to cloud storage

Read Flow

     User / Grafana
            ↓
 Query Frontend (Caching)
            ↓
  Querier (Orchestrator)
            ↓
   ┌────────┼────────┐
Ingester  Blocks   Index
(recent)  (old)    (fast lookup)

Read that as:

  1. Query arrives at frontend
  2. Frontend checks cache
  3. Querier fetches from multiple sources
  4. Results merged & returned

Real Example: "Show CPU usage last 7 days"

Write Side (What happened):

  • Day 1: Prometheus sends metrics → Distributor → Ingester (in-memory)
  • Day 1-3: Ingester flushes blocks → S3 (after 3-4 hours)
  • Nightly: Compactor merges blocks (reduce from 100 → 20 blocks)

Read Side (Query happens now):

  1. User queries "CPU last 7 days"
  2. Query Frontend splits the range into smaller sub-queries by time interval (one per day by default) and checks the results cache for each
  3. Querier runs the sub-queries in parallel:
    • Recent samples → ingesters
    • Older samples → blocks in S3, at full resolution
  4. Block indexes locate the "cpu_usage" series without scanning every block
  5. Results are merged, deduplicated, and streamed back to the user

Total latency: ~500ms - 2s (depending on query complexity)


Component Interaction Matrix

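Component      | Read | Write | Stateless? | Scales?
Distributor    | No   | Yes   | Yes        | Yes
Ingester       | Yes  | Yes   | No         | ⚠️ (stateful)
Query Frontend | Yes  | No    | Yes        | Yes
Querier        | Yes  | No    | Yes        | Yes
Compactor      | -    | -     | -          | (batch job)
Object Storage | Yes  | Yes   | -          | N/A (infinite)
Index          | Yes  | -     | -          | -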

Write Path Deep Dive

Step 1: Metric arrives at Distributor

Metric:
  name: up
  labels:
    job: prometheus
    instance: localhost:9090
  timestamp: 1234567890
  value: 1

Distributor checks:

  • Format valid?
  • Tenant ID present on the request?
  • Rate limit OK?
  • Per-tenant limits OK (label names, label lengths, series count)?

Then: Hash metric labels → pick 3 ingesters (replication)
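
The "hash labels, pick 3 ingesters" step is a consistent-hash ring lookup. The sketch below is a simplified stand-in: it uses SHA-256 for the hash and invented names (`Ring`, `replicas`, `tokens_per_ingester`), so treat it as an illustration of the idea and not Mimir's ring implementation.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stand-in hash for the sketch; any stable 32-bit hash would do.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


class Ring:
    """Hypothetical hash ring: each ingester owns several tokens on a circle."""

    def __init__(self, ingesters, tokens_per_ingester=64):
        self.tokens = sorted(
            (_hash(f"{name}-{i}"), name)
            for name in ingesters
            for i in range(tokens_per_ingester)
        )
        self.keys = [token for token, _ in self.tokens]

    def replicas(self, series_key: str, rf: int = 3):
        """Walk clockwise from the series hash until `rf` distinct ingesters are found."""
        idx = bisect.bisect_right(self.keys, _hash(series_key))
        owners = []
        for step in range(len(self.tokens)):
            name = self.tokens[(idx + step) % len(self.tokens)][1]
            if name not in owners:
                owners.append(name)
                if len(owners) == rf:
                    break
        return owners


ring = Ring([f"ingester-{n}" for n in range(5)])
owners = ring.replicas('up{instance="localhost:9090",job="prometheus"}')
```

The same series key always maps to the same three ingesters, so every distributor makes the same routing decision without coordinating with the others.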

Step 2: Ingester buffers

Ingester memory state:
┌─────────────────────────────┐
│ WAL (Write-Ahead Log)       │
├─────────────────────────────┤
│ Time-series in-memory db    │
│ up{job=prometheus}...       │
│ http_requests_total{...}... │
│ ...                         │
└─────────────────────────────┘

Key: WAL persists to disk (survives restarts)
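
A write-ahead log boils down to "append to disk first, then update memory, and replay the file on startup". The toy ingester below shows that ordering with a JSON-lines file; the `TinyIngester` class and its record format are assumptions for illustration and far simpler than the real TSDB WAL.

```python
import json
import os
import tempfile


class TinyIngester:
    """Toy ingester: every sample is logged to disk before it is held in memory."""

    def __init__(self, wal_path: str) -> None:
        self.wal_path = wal_path
        self.series = {}
        self._replay()  # rebuild in-memory state from a previous run, if any
        self.wal = open(wal_path, "a", encoding="utf-8")

    def _replay(self) -> None:
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as f:
            for line in f:
                rec = json.loads(line)
                self._apply(rec["series"], rec["ts"], rec["value"])

    def _apply(self, series: str, ts: int, value: float) -> None:
        self.series.setdefault(series, []).append((ts, value))

    def append(self, series: str, ts: int, value: float) -> None:
        record = {"series": series, "ts": ts, "value": value}
        self.wal.write(json.dumps(record) + "\n")
        self.wal.flush()
        os.fsync(self.wal.fileno())  # make the record durable before acknowledging
        self._apply(series, ts, value)

    def close(self) -> None:
        self.wal.close()


# Simulate a restart: a second instance recovers what the first one wrote.
path = os.path.join(tempfile.mkdtemp(), "wal.log")
first = TinyIngester(path)
first.append('up{job="prometheus"}', 1234567890, 1)
first.close()
restarted = TinyIngester(path)
```

After the restart, `restarted.series` contains the sample written by `first`, which is the property that lets an ingester crash without losing acknowledged writes.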

Step 3: Periodic flush to storage

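Ingester → Compresses → Blocks → Object Storage
 (head in RAM)        (TSDB block)     (S3)

Compression: roughly 10:1 against raw samples. A raw sample is 16 bytes (8-byte timestamp + 8-byte value), while Prometheus documents an average of 1-2 bytes per sample on disk, because neighboring samples in a series are so similar.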
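
The compression ratio is easy to sanity-check with arithmetic. The sketch below assumes 16 bytes per raw sample and 1.5 bytes per stored sample (the middle of the 1-2 byte range the Prometheus documentation cites); the series count and scrape interval are made-up inputs.

```python
RAW_BYTES_PER_SAMPLE = 16       # 8-byte timestamp + 8-byte float value
STORED_BYTES_PER_SAMPLE = 1.5   # assumed average after TSDB compression


def window_size_bytes(active_series, scrape_interval_s, window_s, bytes_per_sample):
    """Bytes needed for every sample scraped during one block window."""
    samples = active_series * (window_s // scrape_interval_s)
    return samples * bytes_per_sample


# Hypothetical workload: 1M active series scraped every 15s, one 2-hour block.
raw = window_size_bytes(1_000_000, 15, 7200, RAW_BYTES_PER_SAMPLE)
stored = window_size_bytes(1_000_000, 15, 7200, STORED_BYTES_PER_SAMPLE)
ratio = raw / stored
```

With these inputs the raw data is about 7.68 GB, the stored block about 720 MB, and the ratio a little under 11:1. The real ratio depends on how regular your series are, so measure before sizing storage.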


Read Path Deep Dive

Step 1: Query arrives at Frontend

User writes: up{job="prometheus"}

Frontend:

  • Parses query
  • Checks label set (what metrics are needed?)
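  • Splits by time range into sub-queries (one per 24h by default); the querier then picks the source for each:
    • Recent data (last few hours) → ingesters
    • Older data → blocks in object storage, at full resolution (Mimir does not keep downsampled blocks)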
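
One way to picture the time-range split is a function that cuts a query range at fixed interval boundaries. The sketch assumes a 24-hour interval and half-open `[start, end)` ranges in seconds; `split_by_interval` is a hypothetical helper, not a Mimir API.

```python
DAY_S = 24 * 60 * 60


def split_by_interval(start: int, end: int, interval: int = DAY_S):
    """Split [start, end) into sub-ranges that break at multiples of `interval`."""
    ranges = []
    cur = start
    while cur < end:
        # Next boundary after `cur`, clipped to the end of the query.
        nxt = min(end, cur - cur % interval + interval)
        ranges.append((cur, nxt))
        cur = nxt
    return ranges


# A 7-day query that starts and ends on day boundaries yields 7 sub-queries.
week = split_by_interval(0, 7 * DAY_S)
```

Aligning sub-queries to fixed boundaries means two users asking about overlapping ranges produce identical sub-queries, which makes cached results reusable.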

Step 2: Querier fetches from multiple sources

Querier parallel fetch:
├─ Ingester 1 (recent data)
├─ Ingester 2 (replication)
├─ Blocks reader (historical)
└─ Index (label lookup)

Deduplication: If data exists in both ingester & blocks, keep one copy.

Step 3: Results stream back

Results:
| timestamp  | value |
|------------|-------|
| 1234567890 | 1     |
| 1234567891 | 1     |
| 1234567892 | 1     |

Scaling Mimir

Horizontal Scaling (add more machines)

Component      | Scale Strategy | Notes
Distributor    | Add replicas   | Stateless, easy
Ingester       | Add shards     | Hash-based routing
Query Frontend | Add replicas   | Stateless, cache-friendly
Querier        | Add replicas   | Stateless
Compactor      | Single job     | Or distributed compaction
Storage        | Infinite       | Cloud storage scales automatically

Vertical Scaling (bigger machines)

  • Ingesters: More RAM = more active series per ingester = fewer ingesters to run
  • Queriers: More CPU = faster PromQL evaluation
  • Distributors: More CPU = higher throughput

Tenancy (Multi-tenant Prometheus)

Mimir supports multiple independent tenants (for example, one per team or per Prometheus instance) in one cluster.

Tenant Isolation

Distributor receives metric
├─ Extract tenant ID from the X-Scope-OrgID request header
├─ Route to ingesters (a per-tenant subset when shuffle sharding is enabled)
├─ Store in tenant-specific blocks
└─ Query frontend filters by tenant

Example:

Team A: Prometheus → Distributor (tenant_id=team-a)
Team B: Prometheus → Distributor (tenant_id=team-b)

Same Mimir cluster, complete isolation.
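
Tenant isolation comes down to keying every read and write by the tenant ID taken from the request. The sketch below uses the `X-Scope-OrgID` header that Mimir reads the tenant from; the `MultiTenantStore` class itself is a toy stand-in, not how Mimir stores data.

```python
TENANT_HEADER = "X-Scope-OrgID"


class MultiTenantStore:
    """Toy store: samples are partitioned by the tenant ID on each request."""

    def __init__(self) -> None:
        self.data = {}

    @staticmethod
    def _tenant(headers: dict) -> str:
        tenant = headers.get(TENANT_HEADER)
        if not tenant:
            raise ValueError("request is missing a tenant ID")
        return tenant

    def write(self, headers: dict, series: str, sample: tuple) -> None:
        tenant_data = self.data.setdefault(self._tenant(headers), {})
        tenant_data.setdefault(series, []).append(sample)

    def query(self, headers: dict, series: str) -> list:
        # A tenant can only ever see its own partition.
        return list(self.data.get(self._tenant(headers), {}).get(series, []))


store = MultiTenantStore()
store.write({TENANT_HEADER: "team-a"}, "up", (1234567890, 1))
team_a_view = store.query({TENANT_HEADER: "team-a"}, "up")
team_b_view = store.query({TENANT_HEADER: "team-b"}, "up")
```

Team A sees its sample and Team B sees an empty result for the same series name, because the tenant ID is part of every lookup.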

Failure Scenarios & Recovery

Ingester Dies

  1. Distributor routes to healthy ingesters
  2. Lost data still in object storage (replicated)
  3. No data loss (because of replication factor)
  4. New ingester starts, re-syncs from storage
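
Why one dead ingester does not lose data comes from quorum arithmetic. The sketch below assumes a replication factor of 3 and majority quorum; the function names are illustrative.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def write_succeeds(acks: int, replication_factor: int = 3) -> bool:
    """A write is acknowledged once a majority of replicas accepted it."""
    return acks >= quorum(replication_factor)


# With 3 replicas, one ingester can be down and writes still succeed.
one_down = write_succeeds(acks=2)   # True
two_down = write_succeeds(acks=1)   # False
```

Every acknowledged sample exists on at least two ingesters, so losing any single one still leaves a copy that queriers can read.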

Distributor Dies

  1. Load balancer detects, routes to next distributor
  2. Completely stateless, no state recovery needed
  3. Ingestion continues; Prometheus retries any remote-write request that failed mid-flight

Storage Goes Down

  1. Queries hitting blocks stall
  2. Recent ingesters still serve ~2h of data
  3. Ingesters keep writing to WAL (disk)
  4. When storage returns, ingesters resume flushing

Tuning for Your Use Case

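High Cardinality (many unique series)

Tune:

  • Raise per-tenant series limits deliberately instead of removing them
  • Add ingesters so active series spread across more memory
  • Drop or relabel high-cardinality labels before they reach Mimir
  • Increase ingester memory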

Long Retention (years of data)

Tune:

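  • Set a per-tenant retention period and plan capacity for full-resolution data (Mimir has no downsampling)
  • Let the compactor merge blocks into larger ones, so each query scans fewer objects
  • Use a cheaper object storage tier that still allows immediate reads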

High Query Load

Tune:

  • Increase query frontend cache TTL
  • Add more queriers
  • Enable query caching layer (Memcached)
  • Reduce query complexity (pre-compute aggregations)

Cost-Conscious

Tune:

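  • Drop unused metrics and labels before ingestion
  • Shorter retention for low-value tenants
  • Longer scrape intervals where coarse resolution is acceptable
  • Use a cheaper storage class with immediate reads (e.g., S3 Standard → Standard-IA); archive tiers that need a restore step, such as Glacier Flexible Retrieval, will break queries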

One-Liner Recap

Mimir = Prometheus metrics + distributed storage + query cache + multi-tenancy

Components:

  • Write: Distributor → Ingester → Storage
  • Read: Frontend → Querier → (Ingesters + Blocks + Index)
  • Backend: Object Storage + Compactor + Index

Key insight: Separation of concerns. Stateless read/write paths scale independently. Stateful ingesters handle buffering. Cloud storage handles durability.


Quick Reference: Which Component When?

When ingestion is slow?

→ Check Distributor rate limits, add more distributors

When queries are slow?

→ Check Query Frontend cache hit rate, add more queriers

When storage bloats?

→ Tune compactor, shorten retention, drop unused series

When cardinality explodes?

→ Tighten per-tenant series limits, drop noisy labels, reduce churn

When disk fills up on ingesters?

→ Grow the ingester disk, check that block uploads to object storage are succeeding, shorten local block retention

