When to Use Mimir, Cortex, or Thanos?
Prometheus is excellent at scraping, alerting, and local time-series storage. The hard part starts when you need months of retention, many Prometheus servers, global dashboards, high availability, or tenant isolation.
That is where Grafana Mimir, Cortex, and Thanos enter the picture. They all extend Prometheus, but they optimize for different operating models.
Short Answer
Use Mimir when you want a modern, horizontally scalable, multi-tenant metrics backend for Prometheus or OpenTelemetry metrics.
Use Thanos when you want to keep Prometheus as the source of truth in each cluster and add global query, object storage, deduplication, and long-term retention around it.
Use Cortex when you already run Cortex, need compatibility with an existing Cortex architecture, or have a specific reason to stay on the older project lineage. For most new greenfield deployments, Mimir is usually the cleaner Cortex-style choice.
The Decision in One Table
| Situation | Best fit | Why |
|---|---|---|
| New centralized metrics backend | Mimir | Built for horizontally scalable, highly available, multi-tenant Prometheus/OpenTelemetry metrics storage |
| Existing Prometheus servers in many clusters | Thanos | Adds sidecars, global query, object storage, and dedupe without replacing local Prometheus |
| Prometheus-as-a-service for many tenants | Mimir | Tenant limits, query fairness, ingestion scaling, and operational tooling are first-class |
| Already running Cortex successfully | Cortex | Avoid migration unless Mimir features or maintainability justify the change |
| Need the lowest disruption path | Thanos | Keep Prometheus local; attach Thanos components around it |
| Need strong central write-path control | Mimir or Cortex | Remote-write ingestion gives a single backend control plane |
| Want global querying across existing stores | Thanos | Querier aggregates multiple StoreAPI backends |
| Need OpenTelemetry metrics ingestion in the same backend | Mimir | Mimir explicitly targets Prometheus and OpenTelemetry metrics |
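The table can also be read as a small decision function. The sketch below is only an illustration: the `Situation` fields and the order in which the rows are checked are assumptions made for this example, not part of any of the three projects.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    """Hypothetical inputs; field names are illustrative, not from any tool."""
    already_runs_cortex: bool = False
    keep_local_prometheus: bool = False  # clusters keep their own Prometheus
    multi_tenant: bool = False           # many teams or tenants share one backend
    needs_otel_ingestion: bool = False   # OpenTelemetry metrics in the same backend


def best_fit(s: Situation) -> str:
    """Return the table's 'best fit', checking rows in an assumed priority order."""
    if s.already_runs_cortex:
        return "Cortex"  # avoid migration unless it is justified
    if s.multi_tenant or s.needs_otel_ingestion:
        return "Mimir"   # central, tenant-aware backend
    if s.keep_local_prometheus:
        return "Thanos"  # wrap existing Prometheus servers
    return "Mimir"       # "new centralized metrics backend" row


print(best_fit(Situation(keep_local_prometheus=True)))
```

Real decisions mix several rows at once, so treat the priority order as a starting point for discussion rather than a rule.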
Mental Model
Think of the three systems as three answers to the same scaling problem.
Mimir is a metrics warehouse. Prometheus, Grafana Agent, Alloy, or OpenTelemetry Collector remote-write into it. Mimir owns the central write path, tenant limits, replication, long-term blocks, query frontends, rulers, and alerting integrations.
Thanos is a Prometheus federation and durability layer. Prometheus still scrapes and stores data locally. Thanos sidecars expose that local data for querying and upload completed blocks to object storage, while Thanos Receive can accept remote-write instead. Thanos Querier gives you a global PromQL endpoint across sidecars, receivers, and store gateways, which in turn serve the blocks held in object storage.
Cortex is the older scalable Prometheus backend pattern. It introduced many ideas Mimir also uses: distributor, ingester, querier, query-frontend, compactor, store-gateway, and object storage blocks. Mimir began as a Grafana Labs fork of Cortex and is best understood as the newer Grafana-backed evolution of this architecture.
When to Use Mimir
Choose Mimir when your problem is centralized, multi-tenant metrics at scale.
Mimir is strongest when:
- You have many teams writing into one metrics backend.
- You need per-tenant limits, isolation, quotas, and operational ownership.
- Prometheus local disks are no longer a comfortable retention strategy.
- You want one Grafana datasource for long-range queries.
- You need ingestion and query paths that scale independently.
- You are already using the Grafana ecosystem heavily.
- You want a modern Cortex-style backend for new work.
The write path is usually:
Prometheus / Agent / OTel Collector
-> distributor
-> ingester
-> object storage
-> compactor
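The distributor-to-ingester step relies on a hash ring: each series is hashed to a position on the ring and written to the next few distinct ingesters. The sketch below shows the general technique only. The `Ring` class, the SHA-256 based `token` function, and the token count are assumptions for illustration and do not reproduce Mimir's or Cortex's actual hashing.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a 32-bit ring position (stand-in hash, not Mimir's)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


class Ring:
    def __init__(self, ingesters, tokens_per_ingester=64):
        # Each ingester owns several tokens so load spreads more evenly.
        self._ring = sorted(
            (token(f"{name}-{i}"), name)
            for name in ingesters
            for i in range(tokens_per_ingester)
        )
        self._tokens = [t for t, _ in self._ring]
        self._size = len(set(ingesters))

    def owners(self, tenant: str, series: str, replication_factor: int = 3):
        """Walk clockwise from the series' token, collecting distinct ingesters."""
        wanted = min(replication_factor, self._size)
        start = bisect.bisect_left(self._tokens, token(f"{tenant}/{series}"))
        found = []
        for offset in range(len(self._ring)):
            name = self._ring[(start + offset) % len(self._ring)][1]
            if name not in found:
                found.append(name)
                if len(found) == wanted:
                    break
        return found


ring = Ring(["ingester-0", "ingester-1", "ingester-2", "ingester-3"])
print(ring.owners("team-a", 'up{job="api"}'))
```

Because the tenant is part of the hashed key, the same series name from two tenants can land on different ingesters, which is one reason this design suits multi-tenant backends.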
The read path is usually:
Grafana
-> query-frontend
-> querier
-> ingesters for recent data, plus store-gateways for older blocks in object storage
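The last two hops are a fan-out, not a chain: the querier picks backends based on the query's time range. A minimal sketch of that routing decision follows. The function name `backends_for` and the single 12-hour cutoff are assumptions for illustration. Real deployments configure this boundary and may overlap the two windows.

```python
from datetime import datetime, timedelta, timezone


def backends_for(start, end, now, recent_window=timedelta(hours=12)):
    """Return which backends a querier would consult for [start, end].

    recent_window is an illustrative cutoff, not a documented default.
    """
    if end < start:
        raise ValueError("end must not be before start")
    cutoff = now - recent_window
    backends = []
    if start < cutoff:
        backends.append("store-gateway")  # older blocks in object storage
    if end >= cutoff:
        backends.append("ingester")       # recent data still held by ingesters
    return backends


now = datetime(2024, 1, 31, 12, 0, tzinfo=timezone.utc)
print(backends_for(now - timedelta(days=7), now, now))
```

A dashboard covering only the last hour touches ingesters alone, while a 7-day panel needs both paths. That is why the two tiers are sized separately.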
The key tradeoff: Mimir is a real distributed database system. It is powerful, but it expects you to operate rings, object storage, ingesters, compactors, query components, caches, limits, and capacity planning.
When to Use Thanos
Choose Thanos when you already trust Prometheus locally and want a global view without replacing every Prometheus server.
Thanos is strongest when:
- You have multiple Kubernetes clusters, regions, or environments.
- Each cluster already has Prometheus and should keep local autonomy.
- You need global dashboards across many Prometheus instances.
- You want long-term object storage without centralizing every scrape immediately.
- You need HA pair deduplication at query time.
- You want a lower-disruption adoption path.
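Query-time deduplication of an HA pair can be pictured as dropping the replica label and merging the samples that remain. The sketch below is a deliberately simplified stand-in: it assumes both replicas report identical timestamps, which real scrapes do not, and it is not the algorithm Thanos uses. The `dedupe` function and the `replica` label name are assumptions for this example.

```python
def dedupe(series_list, replica_label="replica"):
    """Merge series that differ only by the replica label.

    Each series is (labels: dict, samples: list of (timestamp, value)).
    Simplification: timestamps are assumed to line up across replicas.
    """
    merged = {}
    for labels, samples in series_list:
        # Identity of a series is its label set without the replica label.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        bucket = merged.setdefault(key, {})
        for ts, value in samples:
            bucket.setdefault(ts, value)  # first replica to report a timestamp wins
    return [(dict(key), sorted(b.items())) for key, b in merged.items()]


replica_a = ({"job": "api", "replica": "a"}, [(10, 1.0), (20, 2.0)])
replica_b = ({"job": "api", "replica": "b"}, [(20, 2.0), (30, 3.0)])
print(dedupe([replica_a, replica_b]))
```

Here replica `a` missed the sample at timestamp 30 and replica `b` missed the one at 10, yet the merged series has no gap. That gap-filling is the practical benefit of running Prometheus in pairs.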
The common sidecar path is:
Prometheus
-> Thanos sidecar
-> object storage
-> Thanos store-gateway
-> Thanos query
-> Grafana
Thanos also has Receive for remote-write based ingestion, but Thanos is still best known for adding a global query layer and object storage around Prometheus.
The key tradeoff: Thanos spreads its operational surface across many places. You manage Prometheus in every cluster plus Thanos sidecars, query, store-gateway, compactor, bucket health, and potentially receive and rule components. It preserves autonomy, but you still own the moving pieces.
When to Use Cortex
Choose Cortex when you have an existing Cortex deployment or a compatibility requirement.
Cortex is strongest when:
- You already run Cortex and it is stable.
- Your deployment tooling, dashboards, alerts, or tenant model are Cortex-specific.
- You depend on Cortex behavior that you have validated in production.
- Migration risk is higher than the benefit of moving.
For new deployments, Cortex is harder to recommend over Mimir unless your team has a strong Cortex-specific reason. Mimir follows the same broad backend pattern, but the center of gravity for new Grafana-backed Prometheus-compatible storage work has moved toward Mimir.
The Most Important Difference
The biggest difference is not the storage format. It is where control lives.
With Mimir, control moves into a central metrics backend. Prometheus becomes a scraper and remote-write agent. The backend owns ingestion, tenancy, retention, query acceleration, and operational limits.
With Thanos, control stays closer to each Prometheus server. Prometheus remains locally useful. Thanos layers global query, dedupe, and object storage over the top.
With Cortex, control also moves into a central backend, but you are choosing the older Cortex lineage rather than Mimir.
Practical Scenarios
One Kubernetes Cluster, Small Team
Use Prometheus alone first. Add long-term storage only when retention, HA, or query scope actually hurts.
If you need long retention with minimal disruption, use Thanos sidecar and object storage.
If you expect many teams, high cardinality, and central observability ownership, start evaluating Mimir.
Many Clusters, Local Ownership
Use Thanos.
Each cluster keeps its Prometheus. Thanos Query gives you a global PromQL endpoint. Sidecars upload blocks to object storage. Store-gateway serves old data. Compactor compacts and downsamples blocks and applies retention in object storage.
This works well when platform teams want a global view but individual clusters still need local metrics.
Many Teams, Central Platform
Use Mimir.
This is the classic managed metrics platform shape: teams remote-write into one backend, and the platform team enforces tenant limits, retention, dashboards, and reliability centrally.
Existing Cortex Platform
Stay on Cortex if it is stable and the migration cost is not justified.
Move toward Mimir when you need features, performance work, operational improvements, or ecosystem alignment that Cortex is not giving you.
Regulated or Tenant-Isolated Organization
Prefer Mimir when tenant isolation is a core product requirement.
Thanos can separate clusters and object storage paths, but Mimir’s model is more naturally built around tenants, limits, and centralized policy.
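Per-tenant limits are what keep one noisy tenant from starving the others. One common way to enforce an ingestion rate with a burst allowance is a token bucket per tenant, sketched below. The `TenantRateLimiter` class and its parameters are hypothetical and are not Mimir's implementation.

```python
class TenantRateLimiter:
    """Token bucket per tenant: `rate_per_s` samples/second, up to `burst` at once."""

    def __init__(self, rate_per_s: float, burst: float):
        self.rate = rate_per_s
        self.burst = burst
        self._state = {}  # tenant -> (tokens available, time of last update)

    def allow(self, tenant: str, samples: int, now: float) -> bool:
        tokens, last = self._state.get(tenant, (self.burst, now))
        # Refill for the elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if samples > tokens:
            self._state[tenant] = (tokens, now)
            return False  # reject: this tenant is over its limit
        self._state[tenant] = (tokens - samples, now)
        return True


limiter = TenantRateLimiter(rate_per_s=100, burst=200)
print(limiter.allow("team-a", 200, now=0.0))  # uses team-a's whole burst
print(limiter.allow("team-b", 200, now=0.0))  # team-b has its own bucket
```

Each tenant gets its own bucket, so exhausting one budget has no effect on another tenant's writes.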
Cost Model
The cost shape is different.
Mimir costs concentrate in central ingestion, ingester memory/WAL, object storage, query workers, caches, and compaction. You pay for a central backend, but you get central control.
Thanos costs spread across every Prometheus plus sidecars, object storage, store gateways, query nodes, and compaction. You keep local Prometheus costs, then add global query and storage costs.
Cortex costs look similar to Mimir because the architecture is similar: distributors, ingesters, queriers, compactors, store-gateways, object storage, and caches.
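For the object storage part of any of these cost models, a rough first estimate multiplies sample rate, retention, and bytes per sample, the same shape as the sizing formula in the Prometheus storage documentation. The sketch below is back-of-the-envelope only. The default of 2 bytes per sample is an assumption, and the result ignores index overhead, replication, and any downsampled copies.

```python
def block_storage_bytes(active_series, scrape_interval_s, retention_days,
                        bytes_per_sample=2.0):
    """Rough size of retained blocks; bytes_per_sample is an assumed average."""
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * 24 * 60 * 60
    return samples_per_second * retention_seconds * bytes_per_sample


estimate = block_storage_bytes(1_000_000, scrape_interval_s=15, retention_days=365)
print(f"{estimate / 1e12:.1f} TB")
```

With these example inputs the estimate comes out at roughly 4.2 TB for a year of data. Measure your own bytes per sample before relying on a number like this.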
Failure Model
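Mimir and Cortex protect recent writes by replicating each series to several ingesters and by having each ingester keep a write-ahead log (WAL). If an ingester dies, the remaining replicas keep serving its recent data, and the restarted ingester replays its WAL to recover what it held in memory. Object storage protects blocks that have already been flushed.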
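Replication on the write path is usually paired with a quorum rule: a write counts as successful once a majority of the target ingesters acknowledge it. The arithmetic is sketched below. The function names are hypothetical, and the replication factor of 3 is used as a typical example.

```python
def write_quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """How many replicas can be down while writes still reach quorum."""
    return replication_factor - write_quorum(replication_factor)


def write_succeeds(acks: int, replication_factor: int) -> bool:
    return acks >= write_quorum(replication_factor)


rf = 3
print(write_quorum(rf), tolerated_failures(rf), write_succeeds(2, rf))
```

With three replicas, one ingester can be lost without failing writes. Losing two at once means writes for the affected series start failing, so rollouts and zone placement need to respect that.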
Thanos protects global query by deduplicating HA Prometheus replicas and reading from multiple StoreAPI backends. Local Prometheus remains useful even if the global layer is degraded, depending on your deployment.
This matters operationally:
- If central ingestion must never become a bottleneck, design Mimir/Cortex carefully.
- If cluster-local monitoring must survive global outages, Thanos is attractive.
- If long-term queries must be centrally governed, Mimir is attractive.
Rules of Thumb
Pick Mimir if the sentence is: “We need a central scalable metrics platform.”
Pick Thanos if the sentence is: “We have Prometheus everywhere and need a global view.”
Pick Cortex if the sentence is: “We already run Cortex and migration is not worth it yet.”
Do not pick any of them just because Prometheus exists. Pick them when Prometheus’ single-node, local-storage model is the actual bottleneck.
Migration Paths
Prometheus to Thanos
- Add sidecar to Prometheus.
- Configure object storage.
- Add Thanos Query.
- Add Store Gateway for historical blocks.
- Add Compactor.
- Add Query Frontend if query load needs it.
This path keeps Prometheus mostly intact.
Prometheus to Mimir
- Deploy Mimir in monolithic or distributed mode.
- Configure object storage.
- Configure Prometheus remote-write.
- Add tenant headers (X-Scope-OrgID) and an authentication path in front of Mimir.
- Set per-tenant limits.
- Move Grafana datasource to Mimir.
- Tune query-frontend, ingesters, compactors, and store-gateways.
This path changes the write path more directly.
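The tenant header step applies to reads as well as writes: every request to a multi-tenant Mimir carries the tenant ID in the `X-Scope-OrgID` header. The sketch below builds the URL and headers for an instant query. The helper name `tenant_query`, the host, and the tenant ID are made up for illustration, and the `/prometheus` prefix is assumed to be left at its default.

```python
from urllib.parse import urlencode


def tenant_query(base_url: str, tenant_id: str, promql: str):
    """Build the URL and headers for a tenant-scoped instant query.

    Assumes the Prometheus-compatible API is served under /prometheus.
    """
    params = urlencode({"query": promql})
    url = f"{base_url.rstrip('/')}/prometheus/api/v1/query?{params}"
    headers = {"X-Scope-OrgID": tenant_id}  # tenant the request is scoped to
    return url, headers


url, headers = tenant_query("http://mimir.example:8080", "team-a", "up")
print(url)
print(headers)
```

In practice a gateway or reverse proxy normally sets this header after authenticating the caller, so that clients cannot pick an arbitrary tenant themselves.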
Cortex to Mimir
Treat it as a platform migration, not a package upgrade.
Inventory tenant limits, dashboards, alerts, object storage, retention, ruler behavior, query paths, and operational runbooks. Then test side-by-side before moving production remote-write traffic.
Final Recommendation
For new work, the default split is simple:
- Mimir for centralized, multi-tenant Prometheus-compatible metrics storage.
- Thanos for global querying and long-term retention around existing Prometheus fleets.
- Cortex for existing Cortex estates or compatibility-driven cases.
The better question is not “Which one is best?” The better question is “Where do I want the responsibility boundary: central backend or distributed Prometheus fleet?”