May 09, 2026

When to Use Mimir, Cortex, or Thanos?

Prometheus is excellent at scraping, alerting, and local time-series storage. The hard part starts when you need months of retention, many Prometheus servers, global dashboards, high availability, or tenant isolation.

That is where Grafana Mimir, Cortex, and Thanos enter the picture. They all extend Prometheus, but they optimize for different operating models.

Short Answer

Use Mimir when you want a modern, horizontally scalable, multi-tenant metrics backend for Prometheus or OpenTelemetry metrics.

Use Thanos when you want to keep Prometheus as the source of truth in each cluster and add global query, object storage, deduplication, and long-term retention around it.

Use Cortex when you already run Cortex, need compatibility with an existing Cortex architecture, or have a specific reason to stay on the older project lineage. For most greenfield deployments, Mimir is the cleaner Cortex-style choice.

The Decision in One Table

Situation	Best fit	Why
New centralized metrics backend	Mimir	Built for horizontally scalable, highly available, multi-tenant Prometheus/OpenTelemetry metrics storage
Existing Prometheus servers in many clusters	Thanos	Adds sidecars, global query, object storage, and dedupe without replacing local Prometheus
Prometheus-as-a-service for many tenants	Mimir	Tenant limits, query fairness, ingestion scaling, and operational tooling are first-class
Already running Cortex successfully	Cortex	Avoid migration unless Mimir features or maintainability justify the change
Need the lowest disruption path	Thanos	Keep Prometheus local; attach Thanos components around it
Need strong central write-path control	Mimir or Cortex	Remote-write ingestion gives a single backend control plane
Want global querying across existing stores	Thanos	Querier aggregates multiple StoreAPI backends
Need OpenTelemetry metrics ingestion in the same backend	Mimir	Mimir explicitly targets Prometheus and OpenTelemetry metrics

Mental Model

Think of the three systems as three answers to the same scaling problem.

Mimir is a metrics warehouse. Prometheus, Grafana Agent, Alloy, or OpenTelemetry Collector remote-write into it. Mimir owns the central write path, tenant limits, replication, long-term blocks, query frontends, rulers, and alerting integrations.

Thanos is a Prometheus federation and durability layer. Prometheus still scrapes and stores data locally. Thanos sidecars expose that data for querying and upload completed blocks to object storage, or Thanos Receive accepts it over remote-write. Thanos Querier gives you a global PromQL endpoint across sidecars, receivers, and store gateways, which serve the historical blocks kept in object storage.

Cortex is the older scalable Prometheus backend pattern. It introduced many of the components Mimir also uses: distributor, ingester, querier, query-frontend, compactor, store-gateway, and block storage in object storage. Mimir started as a Grafana Labs fork of Cortex and is best understood as the newer evolution of this architecture.

When to Use Mimir

Choose Mimir when your problem is centralized, multi-tenant metrics at scale.

Mimir is strongest when:

You have many teams writing into one metrics backend.
You need per-tenant limits, isolation, quotas, and operational ownership.
Prometheus local disks are no longer a comfortable retention strategy.
You want one Grafana datasource for long-range queries.
You need ingestion and query paths that scale independently.
You are already using the Grafana ecosystem heavily.
You want a modern Cortex-style backend for new work.

The write path is usually:

Prometheus / Agent / OTel Collector
  -> distributor
  -> ingester
  -> object storage
  -> compactor
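To make the distributor step concrete, here is a minimal Python sketch of how a hash ring can place each series on several ingesters. It assumes a replication factor of 3 and hashes the tenant ID plus the series labels with SHA-256; Mimir's real ring uses its own hashing and token management, so the `Ring` class and `token` function are hypothetical illustrations, not Mimir code.

```python
import bisect
import hashlib

# Assumption for this sketch: every series is stored on 3 distinct ingesters.
REPLICATION_FACTOR = 3


def token(key: str) -> int:
    """Map a string to a 32-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


class Ring:
    """Hypothetical, simplified hash ring of ingesters."""

    def __init__(self, ingesters, tokens_per_ingester=16):
        # Each ingester owns several tokens so series spread more evenly.
        self.entries = sorted(
            (token(f"{name}-{i}"), name)
            for name in ingesters
            for i in range(tokens_per_ingester)
        )
        self.positions = [pos for pos, _ in self.entries]

    def replicas(self, tenant: str, series: str, rf: int = REPLICATION_FACTOR):
        """Walk clockwise from the series' token, collecting rf distinct ingesters."""
        start = bisect.bisect_right(self.positions, token(f"{tenant}/{series}"))
        chosen = []
        for step in range(len(self.entries)):
            _, name = self.entries[(start + step) % len(self.entries)]
            if name not in chosen:
                chosen.append(name)
            if len(chosen) == rf:
                break
        return chosen


ring = Ring(["ingester-a", "ingester-b", "ingester-c", "ingester-d"])
print(ring.replicas("team-payments", 'http_requests_total{job="api"}'))
```

Adding or removing an ingester only moves the series whose tokens sit next to that ingester's tokens, which is the usual reason to prefer a ring over plain modulo hashing.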

The read path is usually:

Grafana
  -> query-frontend
  -> querier
  -> ingester for recent data
  -> store-gateway/object storage for older blocks
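The split between recent and older data can be sketched as a simple routing rule. This Python example assumes ingesters hold roughly the last 13 hours and that blocks older than 12 hours are queryable from object storage. Mimir exposes similar settings (`-querier.query-ingesters-within` and `-querier.query-store-after`), but check the names and defaults for your version; `backends_for` is a hypothetical helper, not a Mimir API.

```python
from datetime import datetime, timedelta, timezone

# Assumed windows for illustration only.
INGESTER_WINDOW = timedelta(hours=13)  # ingesters still hold this much recent data
STORE_AFTER = timedelta(hours=12)      # older data is expected in object storage


def backends_for(start: datetime, end: datetime, now: datetime) -> list[str]:
    """Return which read-path components a querier would consult for [start, end]."""
    targets = []
    if end >= now - INGESTER_WINDOW:
        targets.append("ingesters")      # range touches recent, not-yet-flushed data
    if start <= now - STORE_AFTER:
        targets.append("store-gateway")  # range reaches back into uploaded blocks
    return targets


now = datetime(2026, 5, 9, 12, 0, tzinfo=timezone.utc)
print(backends_for(now - timedelta(hours=1), now, now))  # ['ingesters']
print(backends_for(now - timedelta(days=7), now - timedelta(days=6), now))  # ['store-gateway']
print(backends_for(now - timedelta(hours=24), now, now))  # ['ingesters', 'store-gateway']
```

The one-hour overlap between the two windows is deliberate in this sketch: a range that falls inside it goes to both sides, so data that is still being uploaded is not missed.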

The key tradeoff: Mimir is a real distributed database system. It is powerful, but it expects you to operate rings, object storage, ingesters, compactors, query components, caches, limits, and capacity planning.

When to Use Thanos

Choose Thanos when you already trust Prometheus locally and want a global view without replacing every Prometheus server.

Thanos is strongest when:

You have multiple Kubernetes clusters, regions, or environments.
Each cluster already has Prometheus and should keep local autonomy.
You need global dashboards across many Prometheus instances.
You want long-term object storage without centralizing every scrape immediately.
You need HA pair deduplication at query time.
You want a lower-disruption adoption path.
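HA deduplication at query time can be illustrated with a short Python sketch. It assumes both Prometheus replicas carry identical external labels except one replica label (here `replica`, the kind of label you would pass to Thanos Query with `--query.replica-label`). The hypothetical `dedup` function drops that label, merges the replicas, and keeps the first value seen for each timestamp; Thanos's actual implementation is more careful about when it switches between replicas.

```python
from collections import defaultdict

# Assumption: this label distinguishes the two HA replicas and nothing else.
REPLICA_LABEL = "replica"


def dedup(series):
    """Merge HA replicas. `series` is a list of (labels, [(timestamp, value), ...])."""
    merged = defaultdict(dict)
    for labels, samples in series:
        # Identity of a series once the replica label is ignored.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != REPLICA_LABEL))
        for ts, value in samples:
            merged[key].setdefault(ts, value)  # first replica seen wins a shared timestamp
    return {key: sorted(points.items()) for key, points in merged.items()}


a = ({"job": "api", "replica": "0"}, [(0, 1.0), (15, 2.0)])
b = ({"job": "api", "replica": "1"}, [(15, 2.1), (30, 3.0)])  # replica 0 missed t=30
print(dedup([a, b]))
# {(('job', 'api'),): [(0, 1.0), (15, 2.0), (30, 3.0)]}
```

In practice the two replicas scrape at slightly different times, which is one reason a real deduplicator does not simply merge point by point as this sketch does.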

The common sidecar path is:

Prometheus
  -> Thanos sidecar
  -> object storage
  -> Thanos store-gateway
  -> Thanos query
  -> Grafana
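The global query step can be sketched as a pruning rule: each StoreAPI backend advertises the time range it serves and its external labels, and the querier only fans out to backends that could hold matching data. The `Store` class and `select_stores` function below are hypothetical Python stand-ins, not Thanos code, and real label matchers support more than simple equality.

```python
from dataclasses import dataclass


@dataclass
class Store:
    """Hypothetical StoreAPI backend as seen by a global querier."""
    name: str
    min_time: int  # milliseconds since epoch
    max_time: int
    labels: dict   # external labels advertised by this store


def select_stores(stores, start, end, matchers):
    """Return names of stores whose range overlaps [start, end] and whose labels match."""
    chosen = []
    for s in stores:
        overlaps = s.min_time <= end and s.max_time >= start
        # A store without the label might still hold matching series, so only a
        # conflicting external label rules it out.
        labels_ok = all(s.labels.get(k, v) == v for k, v in matchers.items())
        if overlaps and labels_ok:
            chosen.append(s.name)
    return chosen


HOUR = 3_600_000
stores = [
    Store("sidecar-eu", 10 * HOUR, 12 * HOUR, {"cluster": "eu"}),
    Store("sidecar-us", 10 * HOUR, 12 * HOUR, {"cluster": "us"}),
    Store("store-gateway", 0, 10 * HOUR, {}),
]
print(select_stores(stores, 11 * HOUR, 12 * HOUR, {"cluster": "eu"}))  # ['sidecar-eu']
```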

Thanos also has Receive for remote-write based ingestion, but Thanos is still best known for adding a global query layer and object storage around Prometheus.

The key tradeoff: Thanos's operational footprint can spread wide. You manage Prometheus in every cluster plus Thanos sidecars, Query, store-gateway, Compactor, bucket health, and possibly Receive and Ruler components. It preserves autonomy, but you still own the moving pieces.

When to Use Cortex

Choose Cortex when you have an existing Cortex deployment or a compatibility requirement.

Cortex is strongest when:

You already run Cortex and it is stable.
Your deployment tooling, dashboards, alerts, or tenant model are Cortex-specific.
You depend on Cortex behavior that you have validated in production.
Migration risk is higher than the benefit of moving.

For new deployments, Cortex is harder to recommend over Mimir unless your team has a strong Cortex-specific reason. Mimir follows the same broad backend pattern, and Grafana Labs' Prometheus-compatible storage work now centers on Mimir.

The Most Important Difference

The biggest difference is not the storage format. It is where control lives.

With Mimir, control moves into a central metrics backend. Prometheus becomes a scraper and remote-write agent. The backend owns ingestion, tenancy, retention, query acceleration, and operational limits.

With Thanos, control stays closer to each Prometheus server. Prometheus remains locally useful. Thanos layers global query, dedupe, and object storage over the top.

With Cortex, control also moves into a central backend, but you are choosing the older Cortex lineage rather than Mimir.

Practical Scenarios

One Kubernetes Cluster, Small Team

Use Prometheus alone first. Add long-term storage only when retention, HA, or query scope actually hurts.

If you need long retention with minimal disruption, use Thanos sidecar and object storage.

If you expect many teams, high cardinality, and central observability ownership, start evaluating Mimir.

Many Clusters, Local Ownership

Use Thanos.

Each cluster keeps its Prometheus. Thanos Query gives you a global PromQL endpoint. Sidecars upload blocks to object storage. Store-gateway serves old data. Compactor compacts and downsamples blocks and applies retention in the bucket.

This works well when platform teams want a global view but individual clusters still need local metrics.

Many Teams, Central Platform

Use Mimir.

This is the classic managed metrics platform shape: teams remote-write into one backend, and the platform team enforces tenant limits, retention, dashboards, and reliability centrally.

Existing Cortex Platform

Stay on Cortex if it is stable and the migration cost is not justified.

Move toward Mimir when you need features, performance work, operational improvements, or ecosystem alignment that Cortex is not giving you.

Regulated or Tenant-Isolated Organization

Prefer Mimir when tenant isolation is a core product requirement.

Thanos can separate clusters and object storage paths, but Mimir’s model is more naturally built around tenants, limits, and centralized policy.

Cost Model

The cost shape is different.

Mimir costs concentrate in central ingestion, ingester memory/WAL, object storage, query workers, caches, and compaction. You pay for a central backend, but you get central control.

Thanos costs spread across every Prometheus plus sidecars, object storage, store gateways, query nodes, and compaction. You keep local Prometheus costs, then add global query and storage costs.

Cortex costs look similar to Mimir because the architecture is similar: distributors, ingesters, queriers, compactors, store-gateways, object storage, and caches.

Failure Model

Mimir and Cortex protect recent writes through ingester replication and WAL behavior. If an ingester dies, replicas and WAL recovery matter. Object storage protects flushed blocks.
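Ingester replication relies on majority quorums: the distributor sends each write to every replica and treats it as successful once a majority acknowledges. The arithmetic is small enough to show directly. This Python sketch assumes a plain majority quorum, and the function names are hypothetical.

```python
def write_quorum(replication_factor: int) -> int:
    """Minimum successful ingester writes for a majority quorum."""
    return replication_factor // 2 + 1


def write_succeeds(acks: int, replication_factor: int = 3) -> bool:
    """A write is accepted once a majority of replicas acknowledge it."""
    return acks >= write_quorum(replication_factor)


for rf in (1, 3, 5):
    q = write_quorum(rf)
    print(f"RF={rf}: need {q} ack(s), tolerates {rf - q} failed ingester(s)")
# RF=1: need 1 ack(s), tolerates 0 failed ingester(s)
# RF=3: need 2 ack(s), tolerates 1 failed ingester(s)
# RF=5: need 3 ack(s), tolerates 2 failed ingester(s)
```

This is why losing one ingester out of three replicas is routine, while losing two at once puts recent writes at risk until WAL replay completes.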

Thanos keeps global queries resilient by deduplicating HA Prometheus replicas and reading from multiple StoreAPI backends. Depending on your deployment, local Prometheus remains useful even if the global layer is degraded.

This matters operationally:

If central ingestion must never become a bottleneck, design Mimir/Cortex carefully.
If cluster-local monitoring must survive global outages, Thanos is attractive.
If long-term queries must be centrally governed, Mimir is attractive.

Rules of Thumb

Pick Mimir if the sentence is: “We need a central scalable metrics platform.”

Pick Thanos if the sentence is: “We have Prometheus everywhere and need a global view.”

Pick Cortex if the sentence is: “We already run Cortex and migration is not worth it yet.”

Do not pick any of them just because Prometheus exists. Pick them when Prometheus’ local-node model is the actual bottleneck.

Migration Paths

Prometheus to Thanos

Add a Thanos sidecar to each Prometheus.
Configure object storage.
Add Thanos Query.
Add Store Gateway for historical blocks.
Add Compactor.
Add Query Frontend if query load needs it.
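Before adding sidecars, it helps to verify that every Prometheus has external labels and that no two servers share an identical set, since Thanos relies on those labels to tell data sources apart. This Python pre-flight check is a hypothetical helper that assumes you have already collected each server's external labels into a dict; HA replicas pass as long as their replica label differs.

```python
def check_external_labels(fleet: dict[str, dict[str, str]]) -> list[str]:
    """Return human-readable problems; an empty list means the fleet looks OK.

    `fleet` maps each Prometheus name to its external labels.
    """
    problems = []
    seen = {}  # label set -> first Prometheus that used it
    for name, labels in sorted(fleet.items()):
        if not labels:
            problems.append(f"{name}: no external labels set")
            continue
        key = tuple(sorted(labels.items()))
        if key in seen:
            problems.append(f"{name}: same external labels as {seen[key]}")
        else:
            seen[key] = name
    return problems


fleet = {
    "prom-eu-0": {"cluster": "eu", "replica": "0"},
    "prom-eu-1": {"cluster": "eu", "replica": "1"},  # HA pair: only the replica differs
    "prom-us-0": {"cluster": "eu", "replica": "0"},  # copy-paste mistake
}
print(check_external_labels(fleet))
# ['prom-us-0: same external labels as prom-eu-0']
```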

This path keeps Prometheus mostly intact.

Prometheus to Mimir

Deploy Mimir in monolithic or microservices mode.
Configure object storage.
Configure Prometheus remote-write.
Add the tenant header (X-Scope-OrgID) and an authentication path.
Set per-tenant limits.
Move Grafana datasource to Mimir.
Tune query-frontend, ingesters, compactors, and store-gateways.
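Per-tenant limits are often described as token buckets. This Python sketch assumes a sustained rate in samples per second plus a burst size for each tenant, and reads the tenant from the `X-Scope-OrgID` header that Mimir and Cortex use for tenant IDs. `TenantLimiter` is hypothetical: in Mimir you set these limits in configuration rather than writing code, and its exact enforcement differs.

```python
import time


class TenantLimiter:
    """Hypothetical per-tenant token bucket keyed by the X-Scope-OrgID header."""

    def __init__(self, limits, clock=time.monotonic):
        self.limits = limits  # {tenant: (samples_per_second, burst_samples)}
        self.clock = clock    # injectable so the sketch is easy to test
        self.state = {}       # tenant -> (available_tokens, last_refill_time)

    def allow(self, headers, samples):
        """Return True if this write of `samples` fits the tenant's budget."""
        # Real HTTP headers are case-insensitive; a plain dict is not.
        tenant = headers.get("X-Scope-OrgID")
        if tenant not in self.limits:
            return False  # missing or unknown tenant: reject in this sketch
        rate, burst = self.limits[tenant]
        now = self.clock()
        tokens, last = self.state.get(tenant, (burst, now))
        # Refill for the elapsed time, never above the burst size.
        tokens = min(burst, tokens + (now - last) * rate)
        allowed = samples <= tokens
        if allowed:
            tokens -= samples
        self.state[tenant] = (tokens, now)
        return allowed


fake_now = [0.0]
limiter = TenantLimiter({"team-a": (100, 200)}, clock=lambda: fake_now[0])
hdr = {"X-Scope-OrgID": "team-a"}
print(limiter.allow(hdr, 150))  # True: bucket starts full at 200
print(limiter.allow(hdr, 100))  # False: only 50 tokens left
fake_now[0] = 1.0
print(limiter.allow(hdr, 100))  # True: refilled to 150 after one second
```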

This path changes the write path more directly.

Cortex to Mimir

Treat it as a platform migration, not a package upgrade.

Inventory tenant limits, dashboards, alerts, object storage, retention, ruler behavior, query paths, and operational runbooks. Then test side-by-side before moving production remote-write traffic.
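For side-by-side testing, one low-tech approach is to run the same instant queries against Cortex and Mimir through their Prometheus-compatible HTTP API, save the JSON responses, and diff them. This Python sketch assumes vector results in the standard Prometheus API response shape and compares values with a relative tolerance; `vector_by_labels` and `diff_vectors` are hypothetical helpers.

```python
import json
import math


def vector_by_labels(body: str) -> dict:
    """Parse a Prometheus API instant-query response into {labels: value}."""
    data = json.loads(body)["data"]
    if data["resultType"] != "vector":
        raise ValueError("this sketch only handles vector results")
    return {
        tuple(sorted(r["metric"].items())): float(r["value"][1])  # values arrive as strings
        for r in data["result"]
    }


def diff_vectors(old_body: str, new_body: str, rel_tol: float = 1e-6) -> list:
    """List series that are missing, extra, or numerically different."""
    old, new = vector_by_labels(old_body), vector_by_labels(new_body)
    report = []
    for key in sorted(old.keys() | new.keys()):
        if key not in new:
            report.append(("missing in new", key))
        elif key not in old:
            report.append(("only in new", key))
        elif not math.isclose(old[key], new[key], rel_tol=rel_tol):
            report.append(("value differs", key, old[key], new[key]))
    return report


cortex = '{"status":"success","data":{"resultType":"vector","result":[{"metric":{"job":"api"},"value":[1715256000,"42"]}]}}'
mimir = '{"status":"success","data":{"resultType":"vector","result":[{"metric":{"job":"api"},"value":[1715256000,"42"]},{"metric":{"job":"db"},"value":[1715256000,"7"]}]}}'
print(diff_vectors(cortex, mimir))
# [('only in new', (('job', 'db'),))]
```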

Final Recommendation

For new work, the default split is simple:

Mimir for centralized, multi-tenant Prometheus-compatible metrics storage.
Thanos for global querying and long-term retention around existing Prometheus fleets.
Cortex for existing Cortex estates or compatibility-driven cases.

The better question is not “Which one is best?” The better question is “Where do I want the responsibility boundary: central backend or distributed Prometheus fleet?”
