Jan 31, 2026

Prometheus: Practical Guide & Mental Model

Prometheus is a pull-based monitoring system and time-series database designed for reliable metrics collection, alerting, and exploration. This guide gives you a compact mental model and the practical pieces you need to operate it at scale.

1) What Prometheus Is (and isn’t)

Prometheus is great for infrastructure metrics, application telemetry, and alerting. It is not a long-term log archive or a general-purpose OLAP system.

Key properties:

Single static binary (cross-platform)
Pulls metrics over HTTP
Stores time-series locally
Labels as first-class dimensions
PromQL for queries

2) Architecture in One Diagram

[ Linux / Windows / Apps ]
          |
          |  expose /metrics (HTTP)
          v
[ Exporters / Instrumented Apps ]
          |
          |  scrape (HTTP GET)
          v
[ Prometheus Server ]
          |
          |  PromQL queries
          v
[ Grafana / Alerts ]

3) “Scrape” Means Prometheus Pulls

In Prometheus, a scrape is when Prometheus initiates an HTTP request to a target and pulls metrics.

Concretely:

Prometheus  ──HTTP GET──▶  http://target:port/metrics

What that implies:

The target does nothing proactively
The target only exposes /metrics
Prometheus controls when, how often, and how long it waits

Scrape loop (per target):

every scrape_interval:
  start timer
  GET /metrics
  parse text format
  store time-series
  stop timer

That’s why you see:

scrape_interval: how often Prometheus scrapes a target
scrape_timeout: the max time Prometheus waits for a scrape to finish
scrape_duration_seconds: how long the last scrape actually took (a metric Prometheus records per target, not a config setting)
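The loop and the three names above can be sketched in Python. This is only an illustration of the idea, not how Prometheus itself is implemented; scrape_once, http_fetch, and the injected fetch callable are hypothetical names made up for this example.

```python
import time
from typing import Callable, Dict


def scrape_once(fetch: Callable[[float], str], scrape_timeout: float) -> Dict[str, object]:
    """Run one scrape attempt and report the outcome.

    `fetch` is a hypothetical callable that performs the HTTP GET against
    /metrics with the given timeout (seconds) and returns the response body,
    raising an exception on failure or timeout.
    """
    start = time.monotonic()
    try:
        body = fetch(scrape_timeout)
        up = 1  # scrape succeeded
    except Exception:
        body = ""
        up = 0  # scrape failed or timed out
    duration = time.monotonic() - start
    return {"up": up, "scrape_duration_seconds": duration, "body": body}


def http_fetch(url: str) -> Callable[[float], str]:
    """Build a fetch callable for a real target using only the standard library."""
    from urllib.request import urlopen

    def fetch(timeout: float) -> str:
        with urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8")

    return fetch
```

A scheduler would call scrape_once(http_fetch("http://target:port/metrics"), scrape_timeout) once per scrape_interval for every target.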

Pull vs Push (contrast):

Prometheus (Pull / Scrape): Prometheus calls you, centralized control, easier debugging, safer at scale
Push systems: Apps push metrics out, harder governance, more network + retry complexity

Prometheus can accept pushed metrics via Pushgateway, but that’s the exception, not the norm.

Why pull matters operationally:

Centralized scrape schedules
Uniform auth and TLS
Easier service discovery
Built-in health signal (up)

If a scrape fails, Prometheus records up == 0 for that target on that very scrape. In push systems, silence can look like “everything is fine.”

One sentence to remember:

In Prometheus, a “scrape” is Prometheus pulling metrics over HTTP from a target.

4) Exporters: Turning State Into Metrics

Exporters translate system or app state into Prometheus metrics.

Node Exporter (Linux)

Common metrics:

CPU: node_cpu_seconds_total
Memory: node_memory_MemFree_bytes
Disk: node_disk_io_time_seconds_total, node_disk_read_bytes_total
Network: node_network_receive_bytes_total

Example:

node_disk_io_time_seconds_total{device="sda"} 104296

Windows Exporter

Common metrics:

windows_cpu_time_total
windows_memory_available_bytes
windows_logical_disk_free_bytes

Application & Batch Exporters

Libraries exist for Java, Python, Go, and Node.js. Example batch metrics:

process_cpu_seconds_total 5.73
worker_jobs_total{status="processed"} 1570222
worker_jobs_total{status="failed"} 155665

Blackbox Exporter (active probing)

For endpoints that don’t expose /metrics, the Blackbox Exporter performs HTTP/TCP/ICMP/TLS probes and exposes results as scrapeable metrics.

Prometheus Targets: Quick Mapping

Key point: Prometheus does NOT auto-detect processes. A running process alone gives Prometheus nothing (Linux or Windows). You must expose metrics or use exporters.

Need	What Prometheus needs	How to get it	Notes
Node.js app metrics	/metrics endpoint	Instrument app (prom-client)	Prometheus scrapes the app directly
Linux host metrics	Host metrics endpoint	Install node-exporter	CPU, RAM, disk, etc.
Windows host metrics	Host metrics endpoint	Install windows_exporter	Prometheus can’t see Windows processes by itself
Availability / ping / port check	Blackbox probe	blackbox_exporter (ICMP, HTTP, TCP)	Use for URL, ping, port-open checks
URL uptime	HTTP probe	blackbox_exporter (http_2xx)	Returns success/failure, latency
Ping / ICMP	ICMP probe	blackbox_exporter (icmp)	Requires ICMP permissions
Port open (e.g., 443, 27017)	TCP probe	blackbox_exporter (tcp_connect)	Validates port reachability
MongoDB metrics	MongoDB metrics endpoint	mongodb_exporter	Not automatic; needs exporter
Custom app metrics	/metrics endpoint	Add a custom exporter or instrument the app	Prometheus only scrapes exposed metrics

— thijs

5) Metric Types (What to Use and When)

Counter

Monotonically increasing; resets to zero when the process restarts. Query it with rate() or increase() rather than reading the raw value.

process_cpu_seconds_total
http_requests_total

Gauge

Can go up and down.

node_memory_MemFree_bytes
queue_depth

Histogram (preferred for latency)

Bucketed distribution; aggregatable across instances.

http_request_duration_seconds_bucket
http_request_duration_seconds_sum
http_request_duration_seconds_count

Summary (use carefully)

Client-side quantiles; not aggregatable across instances.
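Why histograms aggregate and summaries don't is easier to see with numbers. The sketch below sums cumulative bucket counts from two instances and then estimates a quantile by linear interpolation inside the matching bucket. It is a simplified take on the idea behind PromQL's histogram_quantile(), which handles more edge cases; the bucket bounds and counts are invented for the example.

```python
import math
from typing import Dict, List

Buckets = Dict[float, float]  # upper bound (le) -> cumulative count


def merge_buckets(instances: List[Buckets]) -> Buckets:
    """Sum cumulative bucket counts per upper bound across instances."""
    merged: Buckets = {}
    for buckets in instances:
        for le, count in buckets.items():
            merged[le] = merged.get(le, 0.0) + count
    return merged


def estimate_quantile(q: float, buckets: Buckets) -> float:
    """Estimate quantile q (0 < q <= 1) by interpolating within a bucket."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    bounds = sorted(buckets)
    if not bounds or bounds[-1] != math.inf:
        raise ValueError("buckets must include the +Inf bucket")
    total = buckets[math.inf]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for le in bounds:
        count = buckets[le]
        if count >= rank:
            if le == math.inf:
                # Beyond the last finite bound: report that bound.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (le - prev_bound) * fraction
        prev_bound, prev_count = le, count
    return prev_bound


# Hypothetical request-duration buckets (seconds) from two instances.
instance_a = {0.1: 50, 0.5: 90, 1.0: 100, math.inf: 100}
instance_b = {0.1: 10, 0.5: 60, 1.0: 95, math.inf: 100}
merged = merge_buckets([instance_a, instance_b])
p50 = estimate_quantile(0.5, merged)
```

Precomputed quantiles from a summary cannot be combined this way: averaging two p99 values does not give the p99 of the combined traffic.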

6) Labels: Prometheus’ Core Data Model

Every time series is identified by a metric name plus a set of labels:

metric_name{label1="value1", label2="value2"} value @ timestamp

Examples:

node_disk_io_time_seconds_total{device="sda", instance="linux-1"} 104296
http_requests_total{method="GET", status="200", service="api"} 982734

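Labels give you slicing, aggregation, and multi-dimensional queries. Every distinct combination of label values is its own time series, so labels with many or unbounded values (user IDs, request IDs) create high cardinality, which costs memory, CPU, and storage.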
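A quick back-of-the-envelope calculation shows how fast this grows. The label names and value counts below are made up for illustration; the point is only that the worst case is the product of distinct values per label.

```python
import math
from typing import Dict


def series_upper_bound(distinct_values: Dict[str, int]) -> int:
    """Worst-case series count for one metric: product of distinct values per label."""
    return math.prod(distinct_values.values())


base = {"method": 5, "status": 10, "service": 20}  # hypothetical counts
with_user_id = dict(base, user_id=10_000)          # one unbounded-ish label added

print(series_upper_bound(base))          # 1000
print(series_upper_bound(with_user_id))  # 10000000
```

One extra label turned a thousand series into ten million, for a single metric name.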

7) Prometheus Configuration (prometheus.yml)

Global config

global:
  scrape_interval: 10s

Static scrape configs

scrape_configs:
  - job_name: "linux"
    static_configs:
      - targets: ["ip-linux:9100"]

  - job_name: "batch"
    static_configs:
      - targets: ["web-app:8080"]

  - job_name: "windows"
    static_configs:
      - targets: ["win-2019:9182"]

job_name becomes the job label on every series scraped by that job, which is useful for grouping.

Custom scrape example (Node.js app)

Definition: a custom scrape is any job you define in scrape_configs for a target that is specific to your environment (a service, exporter, device, or endpoint you decide to monitor), beyond the default examples.

What a custom scrape could represent (examples):

A Node.js API exposing /metrics
A Python worker exposing /metrics
A Go service exposing /metrics
A Linux node exporter (:9100)
A database exporter (Postgres, Redis, MySQL)
A load balancer or proxy exporter (Nginx, HAProxy)
A message queue exporter (Kafka, RabbitMQ)
A Kubernetes component (kube-state-metrics, cAdvisor)
A blackbox probe (HTTP/TCP/ICMP checks)
A third-party SaaS metrics endpoint

Practical example: your Node.js service exposes http://localhost:3000/metrics (via prom-client or similar). Prometheus will hit that URL every 30s, wait up to 30s for a response, and attach the label env="dev" to all ingested series from that target.

Quick sanity checks:

Sample /metrics output (what Prometheus sees):

# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",status="200"} 1243
process_cpu_seconds_total 5.73

Verify scrape success in PromQL:

up{job="nodejs-app"}

The scrape config for this job:

scrape_configs:
  - job_name: "nodejs-app"
    scrape_interval: 30s
    scrape_timeout: 30s
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets:
          - localhost:3000
        labels:
          env: dev

    # auth (choose one)
    # no auth (default): nothing to set
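    # bearer_token and bearer_token_file are deprecated in recent Prometheus
    # releases; prefer the authorization block further down.
    # bearer_token: "YOUR_TOKEN"
    # bearer_token_file: /etc/prometheus/token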
    # basic_auth:
    #   username: "user"
    #   password: "pass"
    # authorization:
    #   type: Bearer
    #   credentials: "YOUR_TOKEN"

    # tls / https (if needed)
    # scheme: https
    # tls_config:
    #   ca_file: /etc/prometheus/ca.pem
    #   cert_file: /etc/prometheus/client.pem
    #   key_file: /etc/prometheus/client.key
    #   insecure_skip_verify: false
    #   (if true, Prometheus skips TLS certificate verification;
    #    useful for self-signed certs in dev, but unsafe for prod)
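The up check from this section can also be scripted against Prometheus's HTTP query API (/api/v1/query). The sketch below assumes Prometheus is reachable at http://localhost:9090, which is an assumption and not something the config above states; build_query_url, down_instances, and check_job are hypothetical helper names.

```python
import json
from typing import Any, Dict, List
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def down_instances(response: Dict[str, Any]) -> List[str]:
    """Return the instance labels whose `up` sample is 0 in an instant-query response."""
    if response.get("status") != "success":
        raise RuntimeError("query failed: %r" % response.get("error"))
    down = []
    for series in response["data"]["result"]:
        _timestamp, value = series["value"]  # value arrives as a string
        if float(value) == 0:
            down.append(series["metric"].get("instance", "<unknown>"))
    return down


def check_job(base_url: str, job: str) -> List[str]:
    """Query up{job="..."} and list targets that failed their last scrape."""
    url = build_query_url(base_url, 'up{job="%s"}' % job)
    with urlopen(url, timeout=10) as resp:
        return down_instances(json.load(resp))
```

Calling check_job("http://localhost:9090", "nodejs-app") returns an empty list when every target of the job was scraped successfully. An empty list also comes back if the job has no targets at all, so it is worth checking the Targets page once by hand.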

8) Blackbox Exporter (Active Probing)

The Blackbox Exporter lets Prometheus monitor things that don’t expose /metrics themselves (URLs, ports, ICMP ping, TLS checks). Prometheus still scrapes the exporter; the exporter actively probes targets and returns results as metrics.

What it’s used for:

URL uptime and latency (HTTP/HTTPS)
TCP port availability
ICMP ping reachability
TLS handshake and certificate checks

Minimal wiring model:

Prometheus ──scrape──▶ Blackbox Exporter ──probe──▶ Target

9) Ping Monitoring Architecture (Blackbox / ICMP)

Ping-style monitoring in Prometheus is typically done via the Blackbox Exporter, which probes targets and exposes results for Prometheus to scrape.

High-level flow:

Prometheus ──scrape──▶ Blackbox Exporter
                 │
                 └──probe (ICMP/HTTP/TCP)──▶ Target

Key idea: Prometheus still pulls from the exporter; the exporter sends probes to the target and reports success, latency, and errors as metrics.

Typical setup:

Run Blackbox Exporter (in the same network as Prometheus or near targets)
Configure scrape_configs with metrics_path: /probe
Pass module and target as query params
Query probe_success, probe_duration_seconds, and probe_icmp_*

Example config snippet:

scrape_configs:
  - job_name: "ping"
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets:
          - 1.1.1.1
          - 8.8.8.8
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115

PromQL checks:

probe_success{job="ping"}
probe_duration_seconds{job="ping"}

Use this for reachability and latency checks across networks; it complements service-level metrics rather than replacing them.
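The three relabel rules in the snippet are the part people find confusing, so here is a small Python simulation of what they do to one target's labels. It models only the default replace action with default settings and is not a general relabeling engine; relabel_for_blackbox and scrape_url are hypothetical helpers.

```python
from typing import Dict
from urllib.parse import urlencode


def relabel_for_blackbox(address: str, module: str, exporter: str) -> Dict[str, str]:
    """Apply the three blackbox relabel rules to a single target."""
    # Before relabeling: the static target plus the `module` param from the job.
    labels = {"__address__": address, "__param_module": module}
    # Rule 1: copy the original target into the `target` query parameter.
    labels["__param_target"] = labels["__address__"]
    # Rule 2: keep the probed target visible as the `instance` label.
    labels["instance"] = labels["__param_target"]
    # Rule 3: point the actual scrape at the exporter instead of the target.
    labels["__address__"] = exporter
    return labels


def scrape_url(labels: Dict[str, str], metrics_path: str = "/probe", scheme: str = "http") -> str:
    """Build the URL Prometheus would request after relabeling."""
    prefix = "__param_"
    params = {k[len(prefix):]: v for k, v in labels.items() if k.startswith(prefix)}
    return scheme + "://" + labels["__address__"] + metrics_path + "?" + urlencode(params)


labels = relabel_for_blackbox("1.1.1.1", "icmp", "blackbox-exporter:9115")
print(scrape_url(labels))
# http://blackbox-exporter:9115/probe?module=icmp&target=1.1.1.1
```

So the scrape goes to the exporter, the probed address travels as a query parameter, and the resulting series still carry instance="1.1.1.1".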

10) Port Monitoring Architecture (TCP Checks)

Port monitoring is also done via Blackbox Exporter, using a TCP prober module (tcp_connect in the example below) to test whether a port is reachable (and optionally perform a simple handshake).

High-level flow:

Prometheus ──scrape──▶ Blackbox Exporter
                 │
                 └──probe (TCP connect)──▶ Target:Port

Typical setup:

Run Blackbox Exporter
Configure scrape_configs with metrics_path: /probe
Set module: [tcp_connect]
Query probe_success and probe_duration_seconds

Example config snippet:

scrape_configs:
  - job_name: "ports"
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
          - 10.0.1.10:22
          - 10.0.1.20:5432
          - example.com:443
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115

PromQL checks:

probe_success{job="ports"}
probe_duration_seconds{job="ports"}

Use this for reachability and basic port availability; it complements service metrics and deeper health checks.
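Conceptually, a TCP connect probe is little more than "open a socket, time it, report the result". The sketch below shows that idea with the Python standard library. It is not the Blackbox Exporter's code; tcp_probe is a hypothetical function, and the dictionary keys simply mirror the metric names used above.

```python
import socket
import time
from typing import Dict


def tcp_probe(host: str, port: int, timeout: float = 5.0) -> Dict[str, float]:
    """Try to open a TCP connection and report success and elapsed time."""
    start = time.monotonic()
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            success = 1.0
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        success = 0.0
    return {
        "probe_success": success,
        "probe_duration_seconds": time.monotonic() - start,
    }
```

For example, tcp_probe("example.com", 443) reports whether the port accepted a connection from wherever the script runs, which is exactly why the exporter's network location matters.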

11) MongoDB Monitoring (Exporter)

MongoDB monitoring in Prometheus is typically done via the MongoDB Exporter, which exposes MongoDB stats at /metrics for Prometheus to scrape.

High-level flow:

MongoDB ──stats──▶ MongoDB Exporter ──/metrics──▶ Prometheus

Typical setup:

Run MongoDB Exporter near your database
Provide a MongoDB connection URI with a read-only user
Add a scrape_configs job for the exporter
Query key metrics like connections, ops, and replication lag

Example config snippet:

scrape_configs:
  - job_name: "mongodb"
    static_configs:
      - targets: ["mongodb-exporter:9216"]

Common metrics to watch (exact names vary by exporter and version, so check your exporter's /metrics output):

mongodb_connections{state="current"}
mongodb_op_counters_total
mongodb_mongod_replset_member_state
mongodb_replset_lag

Use this for database health, throughput, and replication visibility; pair it with application-level metrics for end-to-end views.

12) URL Monitoring (HTTP Checks)

URL monitoring in Prometheus is usually done via the Blackbox Exporter using an HTTP prober module (http_2xx in the example below) to check availability, status codes, and latency.

High-level flow:

Prometheus ──scrape──▶ Blackbox Exporter
                 │
                 └──probe (HTTP GET/HEAD)──▶ URL

Typical setup:

Run Blackbox Exporter
Configure scrape_configs with metrics_path: /probe
Set module: [http_2xx] (or your custom module)
Query probe_success, probe_http_status_code, probe_duration_seconds

Example config snippet:

scrape_configs:
  - job_name: "urls"
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://example.com
          - https://status.example.com/health
          - http://internal-api:8080/healthz
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115

PromQL checks:

probe_success{job="urls"}
probe_http_status_code{job="urls"}
probe_duration_seconds{job="urls"}

Use this for uptime, HTTP status, and latency checks; pair it with application metrics for deeper diagnostics.

13) Service Discovery (When Static Targets Don’t Scale)

File-based discovery

file_sd_configs:
  - files:
      - /etc/prometheus/targets/*.json

Example JSON:

[
  {
    "targets": ["10.0.1.5:9100"],
    "labels": {
      "env": "prod",
      "team": "infra"
    }
  }
]
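Target files like this are usually generated, not hand-edited. Here is a minimal sketch of a generator, assuming a hypothetical inventory given as a list of dicts with ip, env, and team keys. It writes via a temporary file and a rename so Prometheus never reads a half-written file, and the .tmp suffix keeps the temporary file out of the *.json glob.

```python
import json
import os
import tempfile
from typing import Dict, List, Tuple


def build_target_groups(hosts: List[Dict[str, str]], port: int) -> List[dict]:
    """Group hosts by (env, team) into file_sd target groups."""
    groups: Dict[Tuple[str, str], List[str]] = {}
    for host in hosts:
        key = (host["env"], host["team"])
        groups.setdefault(key, []).append(f"{host['ip']}:{port}")
    return [
        {"targets": sorted(targets), "labels": {"env": env, "team": team}}
        for (env, team), targets in sorted(groups.items())
    ]


def write_atomically(path: str, groups: List[dict]) -> None:
    """Write the JSON next to the destination, then rename it into place."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(groups, fh, indent=2)
    os.replace(tmp_path, path)  # atomic when source and target share a filesystem


# Hypothetical inventory; in practice this might come from a CMDB or cloud API.
inventory = [{"ip": "10.0.1.5", "env": "prod", "team": "infra"}]
groups = build_target_groups(inventory, 9100)
```

Calling write_atomically("/etc/prometheus/targets/nodes.json", groups) would then produce a file equivalent to the JSON above (the nodes.json name is an assumption). Prometheus picks up changes to these files without a restart.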

Other discovery options:

DNS / SRV records
Kubernetes (pod, service, node)
Cloud providers (AWS, GCP, Azure)

14) Relabeling (Critical to Cost and Scale)

Relabeling happens in two stages:

Target relabeling (before scrape)

relabel_configs:
  - source_labels: [__address__]
    regex: ".*:9100"
    action: keep

Typical actions: keep, drop, replace, labelmap.

Metric relabeling (after scrape)

metric_relabel_configs:
  - source_labels: [__name__]
    regex: "node_disk_io_time_seconds_total"
    action: drop

This is the last chance to reduce cardinality before storage.
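A detail that trips people up: relabel regexes are fully anchored, so ".*:9100" must match the whole value, not just part of it. The sketch below simulates keep and drop for the two examples in this section. It uses Python's re module as a stand-in for Prometheus's RE2 engine (the syntax differs in places, though not for these patterns), and apply_rule is a hypothetical helper.

```python
import re
from typing import Dict, List, Optional


def apply_rule(
    labels: Dict[str, str],
    source_labels: List[str],
    regex: str,
    action: str,
    separator: str = ";",
) -> Optional[Dict[str, str]]:
    """Return the labels if the target/series survives the rule, else None."""
    # Source label values are joined with the separator, then matched as a whole.
    value = separator.join(labels.get(name, "") for name in source_labels)
    matched = re.fullmatch(regex, value) is not None
    if action == "keep":
        return labels if matched else None
    if action == "drop":
        return None if matched else labels
    raise ValueError("this sketch only models keep and drop")


# Target relabeling: keep only node-exporter style addresses.
kept = apply_rule({"__address__": "10.0.1.5:9100"}, ["__address__"], ".*:9100", "keep")
skipped = apply_rule({"__address__": "win-2019:9182"}, ["__address__"], ".*:9100", "keep")

# Metric relabeling: drop one series by metric name.
dropped = apply_rule(
    {"__name__": "node_disk_io_time_seconds_total"},
    ["__name__"],
    "node_disk_io_time_seconds_total",
    "drop",
)
```

A target dropped in the first stage is never scraped at all; a series dropped in the second stage is still scraped and parsed, just not stored.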

15) Common Relabeling Patterns

Drop noisy metrics:

metric_relabel_configs:
  - source_labels: [__name__]
    regex: "node_network_.*"
    action: drop

Remove high-cardinality labels:

metric_relabel_configs:
  - regex: "pod_uid"
    action: labeldrop

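Rename labels (replace copies the value to a new label and leaves the source in place; instance is not populated until after target relabeling, so read __address__ here):

relabel_configs:
  - source_labels: [__address__]
    target_label: host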

16) PromQL: Practical Queries

CPU usage:

rate(process_cpu_seconds_total[5m])

Failed jobs per second (wrap counters in rate() or increase() so restarts don't distort the result):

sum(rate(worker_jobs_total{status="failed"}[5m]))

Disk I/O by device:

rate(node_disk_io_time_seconds_total[5m])
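All three queries lean on rate(), so it helps to see roughly what it computes. The sketch below takes (timestamp, value) samples of a counter, adds up the increases while treating any drop as a restart from zero, and divides by the elapsed time. The real rate() also extrapolates to the edges of the range window, which this simplified version skips; the sample data is invented.

```python
from typing import List, Tuple


def simple_rate(samples: List[Tuple[float, float]]) -> float:
    """Per-second increase of a counter, tolerant of counter resets."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    previous = samples[0][1]
    for _timestamp, value in samples[1:]:
        if value < previous:
            # Counter reset: the process restarted and counted up from zero.
            increase += value
        else:
            increase += value - previous
        previous = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical samples, one per minute, with a restart before the third one.
samples = [(0.0, 100.0), (60.0, 160.0), (120.0, 20.0), (180.0, 80.0)]
per_second = simple_rate(samples)  # (60 + 20 + 60) / 180
```

This is also why raw counter values are rarely graphed directly: the restart would show up as a cliff, while the rate stays meaningful.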

17) Storage Model (TSDB)

Prometheus stores data locally in time-based blocks. Retention is configurable:

--storage.tsdb.retention.time=15d
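Retention translates into disk space fairly directly. The Prometheus storage documentation gives a rough sizing formula (retention time in seconds × ingested samples per second × bytes per sample, with 1 to 2 bytes per sample as a typical average). The sketch below applies it; the series count and scrape interval are assumptions picked for the example.

```python
def estimate_disk_bytes(
    retention_days: float,
    active_series: int,
    scrape_interval_seconds: float,
    bytes_per_sample: float = 2.0,
) -> float:
    """Rough TSDB size: retention seconds * samples per second * bytes per sample."""
    retention_seconds = retention_days * 86_400
    samples_per_second = active_series / scrape_interval_seconds
    return retention_seconds * samples_per_second * bytes_per_sample


# Hypothetical load: 1,000,000 active series scraped every 10s, kept for 15 days.
needed = estimate_disk_bytes(15, 1_000_000, 10)
print(f"{needed / 1e9:.1f} GB")  # 259.2 GB
```

Treat the result as an order-of-magnitude estimate; churn in label values and the write-ahead log add overhead on top.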

18) One-line Mental Model

Prometheus scrapes /metrics, relabels targets and metrics, stores time-series locally, and lets you query everything by labels.

19) Key Takeaways

Exporters expose metrics over HTTP
Prometheus pulls metrics on a schedule
Labels power aggregation and filtering
Relabeling controls cost and scale
Histograms are the preferred choice for latency
Service discovery is essential at scale
