May 29, 2026

Mimir: Scalable Prometheus with Interactive Architecture

Grafana Mimir is a highly available, scalable, long-term storage for Prometheus. Think of it as "Prometheus on steroids" — distributed, fault-tolerant, and built for production scale.

[Diagram: Prometheus metrics (cpu_usage, http_requests, latency_p95, memory_bytes, error_rate) flow into Grafana Mimir: Distributor (validate + shard) → Ingester (replicate + buffer) → Querier (merge + dedupe) → Long-term storage, with a time axis running 2h, 6h, 24h, 30d, 1y.]

Short-lived Prometheus samples become replicated Mimir writes, then compacted blocks that stay queryable for months or years.

The Core Idea

Prometheus is great for short-term metrics, but Mimir solves the storage problem: how do you keep metrics for months/years without running out of disk space or memory?

Mimir = Prometheus metrics → distributed storage → queryable at scale

Three-Layer Architecture Model

Mimir's components cleanly separate into three concerns:

Read Path — How queries happen
Write Path — How metrics get in
Backend Storage — Where everything lives

This separation is the key insight. Let me break it down.

Write Path Components (Ingestion)

These components handle incoming Prometheus metrics.

Distributor

What it does:

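Receives metrics from Prometheus via remote write (or from other compatible clients)
Validates the format of incoming metrics
Enforces per-tenant limits (authentication itself is usually handled by a proxy in front of Mimir)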
Routes data to ingesters

Mental model: The bouncer at the club. Checks your ID, makes sure you're allowed in, then directs you to a table.

Key details:

Stateless (can scale horizontally)
Handles load balancing to ingesters
Enforces rate limits per tenant
Deduplicates samples sent by redundant (HA) Prometheus pairs
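
The bouncer role above can be sketched as two small checks: a format validator and a per-tenant rate limiter. This is a toy model under stated assumptions, not Mimir's code. `validate_sample`, `TokenBucket`, and the limits are hypothetical names and values, and the regular expressions follow the classic Prometheus naming rules.

```python
import re

# Classic Prometheus naming rules for metric and label names.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
LABEL_NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def validate_sample(name: str, labels: dict[str, str]) -> bool:
    """Accept a sample only if its metric name and label names are well formed."""
    if not METRIC_NAME_RE.fullmatch(name):
        return False
    return all(LABEL_NAME_RE.fullmatch(key) for key in labels)


class TokenBucket:
    """Hypothetical per-tenant limiter: `rate` samples/second, bursts up to `burst`."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # the bucket starts full
        self.last = 0.0

    def allow(self, n: int, now: float) -> bool:
        # Refill for the elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if n <= self.tokens:
            self.tokens -= n
            return True
        return False
```

Passing `now` explicitly keeps the limiter deterministic; a real service would read a monotonic clock and keep one bucket per tenant.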

Ingester

What it does:

Receives time-series data from distributors
Keeps recent data in memory (write buffer)
Periodically flushes to long-term storage
Handles data replication

Mental model: A server at a restaurant. Takes your order, writes it down, then hands it to the kitchen.

Key details:

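Stateful (holds recent samples in memory, backed by a write-ahead log on disk)
Series are spread across ingesters by a hash ring, not one ingester per shard
Each series is written to multiple ingesters for fault tolerance (replication factor 3 by default)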
Holds ~2-3 hours of metrics in memory

Read Path Components (Querying)

These components handle PromQL queries.

Query Frontend

What it does:

Receives PromQL queries from users/Grafana
Splits large queries into smaller chunks
Caches query results
Returns results to users

Mental model: A receptionist. Takes your request, figures out who needs to handle it, caches common questions.

Key details:

Stateless (scales horizontally)
Implements query caching (reduces load)
Query-result caching (10-30 second typical TTL)
Useful for dashboard queries (same query, repeatedly)
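
Splitting and caching work together: once a long range is cut into aligned pieces, each piece can be cached on its own and reused by the next dashboard refresh. The sketch below assumes a 24-hour split interval and an in-process dictionary as the cache. `split_by_interval` and `CachingFrontend` are hypothetical names, not Mimir APIs.

```python
from typing import Callable

DAY = 24 * 3600  # assumed split interval, in seconds


def split_by_interval(start: int, end: int, interval: int = DAY) -> list[tuple[int, int]]:
    """Split [start, end) into sub-ranges aligned to multiples of `interval`."""
    parts = []
    cur = start
    while cur < end:
        boundary = (cur // interval + 1) * interval
        nxt = min(boundary, end)
        parts.append((cur, nxt))
        cur = nxt
    return parts


class CachingFrontend:
    """Runs each sub-range through `execute` once and reuses the result afterwards."""

    def __init__(self, execute: Callable[[str, int, int], list]):
        self._execute = execute
        self._cache: dict[tuple[str, int, int], list] = {}
        self.misses = 0

    def query_range(self, query: str, start: int, end: int) -> list:
        result = []
        for lo, hi in split_by_interval(start, end):
            key = (query, lo, hi)
            if key not in self._cache:
                self.misses += 1
                self._cache[key] = self._execute(query, lo, hi)
            result.extend(self._cache[key])
        return result
```

Aligning the pieces to fixed boundaries is what makes the cache effective: two queries with slightly different start times still share their full-day pieces.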

Querier

What it does:

Executes PromQL against metrics
Reads from ingesters (recent data)
Reads from long-term storage (old data)
Aggregates results

Mental model: A librarian searching for books. Checks recent arrivals (ingesters) and the archive (storage).

Key details:

Stateless (scales horizontally)
Fetches data from multiple sources
Deduplicates & merges time-series
Handles failed queries gracefully
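
Because every series is replicated and recent data can also exist in a flushed block, the querier sees the same sample more than once. A minimal sketch of the merge step, assuming each source returns `(timestamp, value)` pairs for one series; `merge_series` is a hypothetical helper:

```python
def merge_series(*sources: list[tuple[int, float]]) -> list[tuple[int, float]]:
    """Merge samples of one series from several sources, one sample per timestamp."""
    merged: dict[int, float] = {}
    for samples in sources:
        for ts, value in samples:
            # The first source to report a timestamp wins; replicas hold the same value.
            merged.setdefault(ts, value)
    return sorted(merged.items())
```

For example, merging two overlapping ingester replicas with a block reader yields one sorted, gap-free series.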

Backend Storage Components

These handle long-term persistence.

Blocks Storage

What it does:

Stores compressed metric blocks
One block (a directory of chunk files plus an index) per ~2-hour window
Organized by tenant and time range

Mental model: A filing cabinet. Organized, compressed, tamper-proof.

Key details:

2-hour blocks typical
TSDB format (same as Prometheus)
Immutable (write-once)
Highly compressible

Object Storage (S3, GCS, Azure)

What it does:

Durably stores all blocks
Highly available & redundant
Multi-region capable

Mental model: Bank vault. Distributed, replicated, backed up.

Key details:

Cloud-native (S3, GCS, Azure Blob)
Or MinIO for on-prem
Survives machine failures
Can span regions

Compactor

What it does:

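Merges small blocks into larger ones
Deduplicates samples that were written by multiple ingester replicas
Deletes blocks that are past the configured retention period
Optimizes storage (Mimir's compactor does not downsample; resolution stays the same)

Mental model: An archivist. Takes loose papers, merges them into binders, throws away outdated copies.

Key details:

Background job (runs periodically)
Reduces query latency (fewer, larger blocks to open)
Saves storage space
To trade resolution for space, pre-aggregate with recording rules instead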
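
The merge-and-deduplicate step can be sketched with an in-memory stand-in for a block. The `Block` dataclass and `compact` function below are hypothetical and ignore the on-disk format entirely; they only show why several small, overlapping blocks collapse into one.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Toy block: a time range plus samples per series."""
    min_t: int
    max_t: int
    series: dict[str, list[tuple[int, float]]] = field(default_factory=dict)


def compact(blocks: list[Block]) -> Block:
    """Merge blocks into one, keeping a single sample per series and timestamp."""
    if not blocks:
        raise ValueError("nothing to compact")
    merged: dict[str, dict[int, float]] = {}
    for block in blocks:
        for name, samples in block.series.items():
            # dict.update accepts (timestamp, value) pairs; duplicates overwrite.
            merged.setdefault(name, {}).update(samples)
    return Block(
        min_t=min(b.min_t for b in blocks),
        max_t=max(b.max_t for b in blocks),
        series={name: sorted(s.items()) for name, s in merged.items()},
    )
```

Two blocks uploaded by different replicas for the same window merge into one copy of the data, which is where most of the space saving comes from.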

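Index (TSDB block index)

What it does:

Fast label lookups
Maps label name/value pairs to the series that carry them
Quick series discovery

Mental model: A book's index. Find page numbers quickly without reading every page.

Key details:

Speeds up metric discovery
Stored inside each block in the Prometheus TSDB format (Mimir does not use Bloom filters or BoltDB here)
Served on the read path by store-gateways, which load a compact index-header per block
Grows with cardinality (the number of unique series)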
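
The book-index idea can be shown with a small inverted index: each label pair points at the IDs of the series that carry it, and a query intersects those ID sets. `LabelIndex` is a hypothetical in-memory sketch and supports only equality matchers.

```python
from collections import defaultdict


class LabelIndex:
    """Toy inverted index from (label name, label value) to series IDs."""

    def __init__(self) -> None:
        self._postings: dict[tuple[str, str], set[int]] = defaultdict(set)
        self._series: dict[int, dict[str, str]] = {}

    def add(self, series_id: int, labels: dict[str, str]) -> None:
        self._series[series_id] = labels
        for pair in labels.items():
            self._postings[pair].add(series_id)

    def select(self, matchers: dict[str, str]) -> list[int]:
        """Return IDs of series matching every name=value pair in `matchers`."""
        if not matchers:
            return sorted(self._series)
        sets = [self._postings.get(pair, set()) for pair in matchers.items()]
        return sorted(set.intersection(*sets))
```

Each additional matcher narrows the result by intersecting one more set, so the lookup never touches sample data.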

Interactive Component Diagram

Here's how they connect:

Write Flow

Prometheus Scraper
        ↓
   Distributor (Stateless)
        ↓
   Ingester (Stateful, in-memory)
        ↓
   Object Storage (S3/GCS)

Read that as:

Prometheus pushes metrics to distributor
Distributor rate-limits, validates, routes
Ingester buffers in memory
Periodically flushed to cloud storage

Read Flow

User / Grafana
        ↓
Query Frontend (Caching)
        ↓
   Querier (Orchestrator)
        ↓
    ┌───────┼───────┐
Ingester  Blocks  Index
(recent)  (old)   (fast lookup)

Read that as:

Query arrives at frontend
Frontend checks cache
Querier fetches from multiple sources
Results merged & returned

Real Example: "Show CPU usage last 7 days"

Write Side (What happened):

Day 1: Prometheus sends metrics → Distributor → Ingester (in-memory)
Days 1-7: Ingesters cut a block roughly every 2 hours and upload it → S3
Periodically: Compactor merges blocks (for example, from 100 → 20 blocks)

Read Side (Query happens now):

User queries "CPU last 7 days"
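Query Frontend splits the 7-day range into smaller sub-queries by time
For each sub-query, the querier picks the source:
- "Last 2 hours" → check recent ingesters
- "2-24 hours ago" → check blocks from S3
- "1-7 days ago" → check compacted blocks from S3 (same resolution; Mimir does not downsample)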
Querier parallelizes reads
Index helps find "cpu_usage" metric instantly
Results stream back to user

Total latency: ~500ms - 2s (depending on query complexity)

Component Interaction Matrix

Component	Read	Write	Stateless?	Scales?
Distributor	❌	✅	✅	✅
Ingester	✅	✅	❌	⚠️ (stateful)
Query Frontend	✅	❌	✅	✅
Querier	✅	❌	✅	✅
Compactor	❌	❌	✅	✅ (batch job)
Object Storage	✅	✅	N/A	✅ (infinite)
Index	✅	❌	✅	✅

Write Path Deep Dive

Step 1: Metric arrives at Distributor

Metric:
  name: up
  labels:
    job: prometheus
    instance: localhost:9090
  timestamp: 1234567890
  value: 1

Distributor checks:

Format valid? ✅
User authenticated? ✅
Rate limit OK? ✅
Tenant exists? ✅

Then: Hash metric labels → pick 3 ingesters (replication)
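
The "hash, then pick 3 ingesters" step is a consistent-hash ring lookup. The sketch below is a simplified model: the `Ring` class, the token count, and the SHA-256-based hash are assumptions for illustration and do not match Mimir's actual ring implementation.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 32-bit hash; Python's built-in hash() is salted per process.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


class Ring:
    """Toy consistent-hash ring in which each ingester owns several tokens."""

    def __init__(self, ingesters: list[str], tokens_per_ingester: int = 64):
        self._tokens = sorted(
            (_hash(f"{name}-{i}"), name)
            for name in ingesters
            for i in range(tokens_per_ingester)
        )
        self._keys = [token for token, _ in self._tokens]

    def replicas(self, tenant: str, labels: dict[str, str], rf: int = 3) -> list[str]:
        """Walk clockwise from the series hash and collect `rf` distinct ingesters."""
        series = tenant + "|" + ",".join(f"{k}={v}" for k, v in sorted(labels.items()))
        start = bisect.bisect_right(self._keys, _hash(series))
        picked: list[str] = []
        for offset in range(len(self._tokens)):
            owner = self._tokens[(start + offset) % len(self._tokens)][1]
            if owner not in picked:
                picked.append(owner)
                if len(picked) == rf:
                    break
        return picked
```

Sorting the labels before hashing means the same series always lands on the same replicas, no matter what order the client sent the labels in.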

Step 2: Ingester buffers

Ingester memory state:
┌─────────────────────────────┐
│ WAL (Write-Ahead Log)       │
├─────────────────────────────┤
│ Time-series in-memory db    │
│ up{job=prometheus}...       │
│ http_requests_total{...}... │
│ ...                         │
└─────────────────────────────┘

Key: WAL persists to disk (survives restarts)
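
Why the WAL survives restarts is easiest to see in code: every append is written to disk before memory is updated, and startup replays the log. This is a minimal sketch that uses JSON lines as the log format, which is an assumption made for readability; the real WAL is a binary, segmented format. `ToyIngester` and `restart_demo` are hypothetical names.

```python
import json
import os
import tempfile
from collections import defaultdict


class ToyIngester:
    """Keeps samples in memory and mirrors every append to an on-disk log."""

    def __init__(self, wal_path: str):
        self.wal_path = wal_path
        self.series: dict[str, list[tuple[int, float]]] = defaultdict(list)
        self._replay()  # rebuild memory from whatever the log already holds
        self._wal = open(wal_path, "a", encoding="utf-8")

    def _replay(self) -> None:
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as f:
            for line in f:
                rec = json.loads(line)
                self.series[rec["series"]].append((rec["ts"], rec["value"]))

    def append(self, series: str, ts: int, value: float) -> None:
        # Write-ahead: the record reaches disk before memory is updated.
        self._wal.write(json.dumps({"series": series, "ts": ts, "value": value}) + "\n")
        self._wal.flush()
        os.fsync(self._wal.fileno())
        self.series[series].append((ts, value))

    def close(self) -> None:
        self._wal.close()


def restart_demo() -> list[tuple[int, float]]:
    """Append one sample, simulate a restart, and recover it from the log."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "wal.jsonl")
        first = ToyIngester(path)
        first.append('up{job="prometheus"}', 1234567890, 1.0)
        first.close()  # the in-memory state of `first` is now discarded
        second = ToyIngester(path)
        recovered = second.series['up{job="prometheus"}']
        second.close()
        return recovered
```

Calling `restart_demo()` returns the sample that was appended before the simulated restart, recovered purely from the log file.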

Step 3: Periodic flush to storage

Ingester → Compresses → Blocks → Object Storage
  (3GB RAM)      (50MB)     (S3)

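Compression: roughly 60:1 in this illustration (3 GB → 50 MB). Real ratios depend on the data, but time series compress well because consecutive samples are so similar.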
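
One reason time series are so repetitive: scrapes arrive at a nearly fixed interval, so storing the change in the interval (delta-of-delta) turns most timestamps into zeros. The sketch below shows only that idea on integers. It is not the actual chunk encoder, which also bit-packs the results and compresses values separately.

```python
def delta_of_delta(timestamps: list[int]) -> list[int]:
    """Encode as: first timestamp, first delta, then changes between deltas."""
    if not timestamps:
        return []
    out = [timestamps[0]]
    prev_delta = 0
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        out.append(delta - prev_delta)
        prev_delta = delta
    return out


def decode(encoded: list[int]) -> list[int]:
    """Invert delta_of_delta."""
    if not encoded:
        return []
    timestamps = [encoded[0]]
    delta = 0
    for dod in encoded[1:]:
        delta += dod
        timestamps.append(timestamps[-1] + delta)
    return timestamps


# A 15-second scrape interval with one late scrape at the end.
print(delta_of_delta([1000, 1015, 1030, 1045, 1061]))  # [1000, 15, 0, 0, 1]
```

Small numbers like 0 and 1 need only a bit or two each once packed, compared with 8 bytes for a raw timestamp.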

Read Path Deep Dive

Step 1: Query arrives at Frontend

User writes: avg_over_time(up[5m])

Frontend:

Parses query
Checks label set (what metrics are needed?)
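Splits long time ranges into smaller sub-queries
The source for each sub-range is then chosen downstream by the querier:
- 0-2h ago → ingesters
- 2h+ ago → blocks in object storage (all at full resolution; Mimir keeps no downsampled blocks)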

Step 2: Querier fetches from multiple sources

Querier parallel fetch:
├─ Ingester 1 (recent data)
├─ Ingester 2 (replica of the same recent data)
├─ Blocks reader (historical)
└─ Index (label lookup)

Deduplication: If data exists in both ingester & blocks, keep one copy.

Step 3: Results stream back

Results:
│ timestamp | value |
│-----------|-------|
│ 1234567890|  1    |
│ 1234567891|  1    |
│ 1234567892|  1    |

Scaling Mimir

Horizontal Scaling (add more machines)

Component	Scale Strategy	Notes
Distributor	Add replicas	Stateless, easy
Ingester	Add shards	Hash-based routing
Query Frontend	Add replicas	Stateless, cache-friendly
Querier	Add replicas	Stateless
Compactor	Add replicas	Compaction work is sharded across instances
Storage	Effectively unbounded	Cloud storage scales automatically

Vertical Scaling (bigger machines)

Ingesters: More RAM = more active series per ingester
Queriers: More CPU = faster PromQL evaluation
Distributors: More CPU = higher throughput

Tenancy (Multi-tenant Prometheus)

Mimir supports multiple independent tenants (for example, one per team's Prometheus) in one cluster.

Tenant Isolation

Distributor receives metric
├─ Extract tenant ID from request header (X-Scope-OrgID)
├─ Route to ingesters (optionally a per-tenant subset)
├─ Store in tenant-specific blocks
└─ Queries read only that tenant's data

Example:

Team A: Prometheus → Distributor (tenant_id=team-a)
Team B: Prometheus → Distributor (tenant_id=team-b)

Same Mimir cluster, complete isolation.
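
Isolation comes down to keying every read and write by the tenant ID taken from the request. A minimal sketch, assuming the tenant arrives in the `X-Scope-OrgID` header that Mimir uses; `MultiTenantStore` is a hypothetical in-memory stand-in for the whole cluster.

```python
TENANT_HEADER = "X-Scope-OrgID"

Sample = tuple[int, float]


class MultiTenantStore:
    """Toy store that partitions all data by tenant ID."""

    def __init__(self) -> None:
        self._data: dict[str, dict[str, list[Sample]]] = {}

    @staticmethod
    def _tenant(headers: dict[str, str]) -> str:
        # Simplification: real HTTP header names are case-insensitive.
        tenant = headers.get(TENANT_HEADER, "")
        if not tenant:
            raise PermissionError("missing tenant ID")
        return tenant

    def push(self, headers: dict[str, str], series: str, sample: Sample) -> None:
        tenant = self._tenant(headers)
        self._data.setdefault(tenant, {}).setdefault(series, []).append(sample)

    def query(self, headers: dict[str, str], series: str) -> list[Sample]:
        tenant = self._tenant(headers)
        return list(self._data.get(tenant, {}).get(series, []))
```

A query sent with `team-b`'s header cannot see samples pushed under `team-a`, even for an identically named series.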

Failure Scenarios & Recovery

Ingester Dies

Distributor routes writes to the remaining healthy ingesters
Recent data is still held by the other ingester replicas
No data loss as long as a quorum of replicas stays healthy
The restarted ingester replays its WAL from disk to rebuild its in-memory state
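
The replication-factor argument is quorum arithmetic: with three replicas, a write needs two acknowledgements, so one ingester can be down without failing writes. A small sketch with hypothetical helper names:

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def write_succeeds(replica_health: dict[str, bool]) -> bool:
    """`replica_health` maps each target ingester to whether it acknowledged."""
    acks = sum(1 for healthy in replica_health.values() if healthy)
    return acks >= quorum(len(replica_health))
```

With a replication factor of 3 the quorum is 2: losing one ingester is tolerated, while losing two of a series' three replicas fails the write.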

Distributor Dies

Load balancer detects, routes to next distributor
Completely stateless, no state recovery needed
Writes continue uninterrupted (distributors are not on the query path)

Storage Goes Down

Queries that need historical blocks fail or stall
Ingesters still serve the most recent ~2h of data
Ingesters keep writing to WAL (disk)
When storage returns, ingesters resume flushing

Tuning for Your Use Case

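High Cardinality (many unique series)

Tune:

Increase ingester memory or add ingesters (memory use tracks the number of active series)
Set per-tenant series limits so one tenant cannot exhaust the cluster
Drop or aggregate high-cardinality labels before they are written (for example, with relabeling in Prometheus)
Watch series churn, since short-lived series still take up index space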

Long Retention (years of data)

Tune:

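Pre-aggregate with recording rules where lower resolution is acceptable (Mimir has no built-in downsampling)
Set a retention period so the compactor deletes expired blocks
Use a cheaper object storage tier that still allows immediate reads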

High Query Load

Tune:

Increase query frontend cache TTL
Add more queriers
Enable an external query caching layer (Memcached or Redis)
Reduce query complexity (pre-compute aggregations)

Cost-Conscious

Tune:

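Pre-aggregate with recording rules and drop series nobody queries
Shorten retention for tenants that do not need long history
Let the compactor keep up, since merged and deduplicated blocks take less space
Use a cheaper storage class that keeps immediate read access (archive tiers that require a restore step, such as S3 Glacier Flexible Retrieval, break queries)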

One-Liner Recap

Mimir = Prometheus metrics + distributed storage + query cache + multi-tenancy

Components:

Write: Distributor → Ingester → Storage
Read: Frontend → Querier → (Ingesters + Blocks + Index)
Backend: Object Storage + Compactor + Index

Key insight: Separation of concerns. Stateless read/write paths scale independently. Stateful ingesters handle buffering. Cloud storage handles durability.

Quick Reference: Which Component When?

When ingestion is slow?

→ Check Distributor rate limits, add more distributors

When queries are slow?

→ Check Query Frontend cache hit rate, add more queriers

When storage bloats?

→ Check that the compactor is keeping up, set a retention period, pre-aggregate with recording rules

When cardinality explodes?

→ Set per-tenant series limits, drop high-cardinality labels, reduce churn

When disk fills up on ingesters?

→ Check that block uploads to object storage are succeeding, add disk, reduce how long uploaded blocks are kept locally
