Jan 31, 2026

Prometheus: Practical Guide & Mental Model

Prometheus is a pull-based monitoring system and time-series database designed for reliable metrics collection, alerting, and exploration. This guide gives you a compact mental model and the practical pieces you need to operate it at scale.

1) What Prometheus Is (and isn’t)

Prometheus is great for infrastructure metrics, application telemetry, and alerting. It is not a long-term log archive or a general-purpose OLAP system.

Key properties:

  • Single static binary (cross-platform)
  • Pulls metrics over HTTP
  • Stores time-series locally
  • Labels as first-class dimensions
  • PromQL for queries

2) Architecture in One Diagram

[ Linux / Windows / Apps ]
          |
          |  expose /metrics (HTTP)
          v
[ Exporters / Instrumented Apps ]
          |
          |  scrape (HTTP GET)
          v
[ Prometheus Server ]
          |
          |  PromQL queries
          v
[ Grafana / Alerts ]

3) “Scrape” Means Prometheus Pulls

In Prometheus, a scrape is when Prometheus initiates an HTTP request to a target and pulls metrics.

Concretely:

Prometheus  ──HTTP GET──▶  http://target:port/metrics

What that implies:

  • The target does nothing proactively
  • The target only exposes /metrics
  • Prometheus controls when, how often, and how long it waits

Scrape loop (per target):

every scrape_interval:
  start timer
  GET /metrics
  parse text format
  store time-series
  stop timer

That’s why you see:

  • scrape_interval: how often Prometheus scrapes a target
  • scrape_timeout: the max time Prometheus waits for a scrape to finish
  • scrape_duration_seconds: how long the last scrape actually took (a measured series Prometheus records per target, not a config setting)
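
Putting those together, a minimal config sketch (job name and target are illustrative) plus the synthetic series Prometheus records for the measured duration:

scrape_configs:
  - job_name: "example"
    scrape_interval: 15s   # how often this target is scraped
    scrape_timeout: 10s    # max wait per scrape; must not exceed scrape_interval
    static_configs:
      - targets: ["example-host:9100"]

# PromQL: how long the last scrape of each target actually took
scrape_duration_seconds{job="example"}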

Pull vs Push (contrast):

  • Prometheus (Pull / Scrape): Prometheus calls you, centralized control, easier debugging, safer at scale
  • Push systems: Apps push metrics out, harder governance, more network + retry complexity

Prometheus can accept pushed metrics via Pushgateway, but that’s the exception, not the norm.

Why pull matters operationally:

  • Centralized scrape schedules
  • Uniform auth and TLS
  • Easier service discovery
  • Built-in health signal (up)

If a scrape fails, Prometheus knows immediately. In push systems, silence can look like “everything is fine.”
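
For example, a couple of PromQL expressions built on that health signal (the job label is illustrative):

# targets that failed their last scrape
up == 0

# scrape success ratio over the last 5 minutes, per target
avg_over_time(up{job="linux"}[5m])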

One sentence to remember:

In Prometheus, a “scrape” is Prometheus pulling metrics over HTTP from a target.

4) Exporters: Turning State Into Metrics

Exporters translate system or app state into Prometheus metrics.

Node Exporter (Linux)

Common metrics:

  • CPU: node_cpu_seconds_total
  • Memory: node_memory_MemFree_bytes
  • Disk: node_disk_io_time_seconds_total, node_disk_read_bytes_total
  • Network: node_network_receive_bytes_total

Example:

node_disk_io_time_seconds_total{device="sda"} 104296
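
Counters like these are usually turned into something readable with rate() plus aggregation; for example, approximate CPU utilization per host from node_cpu_seconds_total (one common pattern, not the only one):

100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))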

Windows Exporter

Common metrics:

  • windows_cpu_time_total
  • windows_memory_available_bytes
  • windows_logical_disk_free_bytes
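
The same pattern applies here; for instance, free disk space as a ratio (assuming your windows_exporter version exposes the logical_disk collector under these names; verify against your own /metrics output):

windows_logical_disk_free_bytes / windows_logical_disk_size_bytes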

Application & Batch Exporters

Libraries exist for Java, Python, Go, and Node.js. Example batch metrics:

process_cpu_seconds_total 5.73
worker_jobs_total{status="processed"} 1570222
worker_jobs_total{status="failed"} 155665

Blackbox Exporter (active probing)

For endpoints that don’t expose /metrics, the Blackbox Exporter performs HTTP/TCP/ICMP/TLS probes and exposes results as scrapeable metrics.

Prometheus Targets: Quick Mapping

Key point: Prometheus does NOT auto-detect processes. A running process alone gives Prometheus nothing (Linux or Windows). You must expose metrics or use exporters.

Need | What Prometheus needs | How to get it | Notes
Node.js app metrics | /metrics endpoint | Instrument the app (prom-client) | Prometheus scrapes the app directly
Linux host metrics | Host metrics endpoint | Install node_exporter | CPU, RAM, disk, etc.
Windows host metrics | Host metrics endpoint | Install windows_exporter | Prometheus can’t see Windows processes by itself
Availability / ping / port check | Blackbox probe | blackbox_exporter (ICMP, HTTP, TCP) | Use for URL, ping, port-open checks
URL uptime | HTTP probe | blackbox_exporter (http_2xx) | Returns success/failure, latency
Ping / ICMP | ICMP probe | blackbox_exporter (icmp) | Requires ICMP permissions
Port open (e.g., 443, 27017) | TCP probe | blackbox_exporter (tcp_connect) | Validates port reachability
MongoDB metrics | MongoDB metrics endpoint | mongodb_exporter | Not automatic; needs exporter
Custom app metrics | /metrics endpoint | Add a custom exporter or instrument the app | Prometheus only scrapes exposed metrics


5) Metric Types (What to Use and When)

Counter

Monotonically increasing; resets on restart.

  • process_cpu_seconds_total
  • http_requests_total
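
Counters are almost always wrapped in rate() or increase() so counter resets and absolute totals don’t mislead you:

rate(http_requests_total[5m])      # per-second request rate
increase(http_requests_total[1h])  # requests added over the last hour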

Gauge

Can go up and down.

  • node_memory_MemFree_bytes
  • queue_depth

Histogram (preferred for latency)

Bucketed distribution; aggregatable across instances.

  • http_request_duration_seconds_bucket
  • http_request_duration_seconds_sum
  • http_request_duration_seconds_count
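
Those three series are what histogram_quantile() operates on; a typical p95 latency query aggregated across instances (window and quantile are illustrative):

histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))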

Summary (use carefully)

Client-side quantiles; not aggregatable across instances.

6) Labels: Prometheus’ Core Data Model

Every metric is a name + labels:

metric_name{label1="value1", label2="value2"} value @ timestamp

Examples:

node_disk_io_time_seconds_total{device="sda", instance="linux-1"} 104296
http_requests_total{method="GET", status="200", service="api"} 982734

Labels give you slicing, aggregation, and multi-dimensional queries. Too many labels = high cardinality, which costs CPU and storage.
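
For example, a single metric can be sliced along any of its labels (label names taken from the examples above):

# request rate per service and status code
sum by (service, status) (rate(http_requests_total[5m]))

# only 5xx responses for one service
rate(http_requests_total{service="api", status=~"5.."}[5m])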

7) Prometheus Configuration (prometheus.yml)

Global config

global:
  scrape_interval: 10s

Static scrape configs

scrape_configs:
  - job_name: "linux"
    static_configs:
      - targets: ["ip-linux:9100"]

  - job_name: "batch"
    static_configs:
      - targets: ["web-app:8080"]

  - job_name: "windows"
    static_configs:
      - targets: ["win-2019:9182"]

job_name becomes a label automatically, which is useful for grouping.
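
That means any series can be filtered or grouped by job, for example:

up{job="windows"}
rate(node_cpu_seconds_total{job="linux", mode="idle"}[5m])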

Custom scrape example (Node.js app)

Definition: a custom scrape is any job you define in scrape_configs for a target that is specific to your environment (a service, exporter, device, or endpoint you decide to monitor), beyond the default examples.

What a custom scrape could represent (examples):

  • A Node.js API exposing /metrics
  • A Python worker exposing /metrics
  • A Go service exposing /metrics
  • A Linux node exporter (:9100)
  • A database exporter (Postgres, Redis, MySQL)
  • A load balancer or proxy exporter (Nginx, HAProxy)
  • A message queue exporter (Kafka, RabbitMQ)
  • A Kubernetes component (kube-state-metrics, cAdvisor)
  • A blackbox probe (HTTP/TCP/ICMP checks)
  • A third-party SaaS metrics endpoint

Practical example: your Node.js service exposes http://localhost:3000/metrics (via prom-client or similar). Prometheus will hit that URL every 30s, wait up to 30s for a response, and attach the label env="dev" to all ingested series from that target.
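
A minimal sketch of what that service-side instrumentation could look like with prom-client (port, route, and metric names here are illustrative, not prescribed by Prometheus):

// sketch: Node.js service exposing /metrics via prom-client
import express from "express";
import client from "prom-client";

const app = express();
client.collectDefaultMetrics(); // process_cpu_seconds_total, nodejs_* metrics, etc.

// illustrative counter with labels
const httpRequests = new client.Counter({
  name: "http_requests_total",
  help: "Total number of HTTP requests",
  labelNames: ["method", "status"],
});

app.get("/", (_req, res) => {
  httpRequests.inc({ method: "GET", status: "200" });
  res.send("ok");
});

// the endpoint Prometheus scrapes
app.get("/metrics", async (_req, res) => {
  res.set("Content-Type", client.register.contentType);
  res.end(await client.register.metrics());
});

app.listen(3000);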

The scrape config for that target:

scrape_configs:
  - job_name: "nodejs-app"
    scrape_interval: 30s
    scrape_timeout: 30s
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets:
          - localhost:3000
        labels:
          env: dev

    # auth (choose one)
    # no auth (default): nothing to set
    # bearer_token: "YOUR_TOKEN"
    # bearer_token_file: /etc/prometheus/token
    # basic_auth:
    #   username: "user"
    #   password: "pass"
    # authorization:
    #   type: Bearer
    #   credentials: "YOUR_TOKEN"

    # tls / https (if needed)
    # scheme: https
    # tls_config:
    #   ca_file: /etc/prometheus/ca.pem
    #   cert_file: /etc/prometheus/client.pem
    #   key_file: /etc/prometheus/client.key
    #   insecure_skip_verify: false
    #   (if true, Prometheus skips TLS certificate verification;
    #    useful for self-signed certs in dev, but unsafe for prod)

Quick sanity checks:

Sample /metrics output (what Prometheus sees):

# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",status="200"} 1243
process_cpu_seconds_total 5.73

Verify scrape success in PromQL:

up{job="nodejs-app"}

8) Blackbox Exporter (Active Probing)

The Blackbox Exporter lets Prometheus monitor things that don’t expose /metrics themselves (URLs, ports, ICMP ping, TLS checks). Prometheus still scrapes the exporter; the exporter actively probes targets and returns results as metrics.

What it’s used for:

  • URL uptime and latency (HTTP/HTTPS)
  • TCP port availability
  • ICMP ping reachability
  • TLS handshake and certificate checks

Minimal wiring model:

Prometheus ──scrape──▶ Blackbox Exporter ──probe──▶ Target
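
On the exporter side, the probe types are defined as modules in the exporter’s own config file; a minimal blackbox.yml sketch (these module names match commonly used defaults, but the file is yours to define):

modules:
  http_2xx:
    prober: http
    timeout: 5s
  tcp_connect:
    prober: tcp
  icmp:
    prober: icmp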

9) Ping Monitoring Architecture (Blackbox / ICMP)

Ping-style monitoring in Prometheus is typically done via the Blackbox Exporter, which probes targets and exposes results for Prometheus to scrape.

High-level flow:

Prometheus ──scrape──▶ Blackbox Exporter
                 │
                 └──probe (ICMP/HTTP/TCP)──▶ Target

Key idea: Prometheus still pulls from the exporter; the exporter sends probes to the target and reports success, latency, and errors as metrics.

Typical setup:

  1. Run Blackbox Exporter (in the same network as Prometheus or near targets)
  2. Configure scrape_configs with metrics_path: /probe
  3. Pass module and target as query params
  4. Query probe_success, probe_duration_seconds, and probe_icmp_*

Example config snippet:

scrape_configs:
  - job_name: "ping"
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets:
          - 1.1.1.1
          - 8.8.8.8
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115

PromQL checks:

probe_success{job="ping"}
probe_duration_seconds{job="ping"}

Use this for reachability and latency checks across networks; it complements service-level metrics rather than replacing them.
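
If you want to alert on reachability, a sketch of a rule built on probe_success (alert name, threshold, and duration are illustrative):

groups:
  - name: ping-alerts
    rules:
      - alert: HostUnreachable
        expr: probe_success{job="ping"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "ICMP probe failing for {{ $labels.instance }}"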

10) Port Monitoring Architecture (TCP Checks)

Port monitoring is also done via Blackbox Exporter, using the tcp module to test if a port is reachable (and optionally perform a simple handshake).

High-level flow:

Prometheus ──scrape──▶ Blackbox Exporter
                 │
                 └──probe (TCP connect)──▶ Target:Port

Typical setup:

  1. Run Blackbox Exporter
  2. Configure scrape_configs with metrics_path: /probe
  3. Set module: [tcp_connect]
  4. Query probe_success and probe_duration_seconds

Example config snippet:

scrape_configs:
  - job_name: "ports"
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
          - 10.0.1.10:22
          - 10.0.1.20:5432
          - example.com:443
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115

PromQL checks:

probe_success{job="ports"}
probe_duration_seconds{job="ports"}

Use this for reachability and basic port availability; it complements service metrics and deeper health checks.

11) MongoDB Monitoring (Exporter)

MongoDB monitoring in Prometheus is typically done via the MongoDB Exporter, which exposes MongoDB stats at /metrics for Prometheus to scrape.

High-level flow:

Prometheus ──scrape /metrics──▶ MongoDB Exporter ──query stats──▶ MongoDB

Typical setup:

  1. Run MongoDB Exporter near your database
  2. Provide a MongoDB connection URI with a read-only user
  3. Add a scrape_configs job for the exporter
  4. Query key metrics like connections, ops, and replication lag

Example config snippet:

scrape_configs:
  - job_name: "mongodb"
    static_configs:
      - targets: ["mongodb-exporter:9216"]

Common metrics to watch (exact names vary by exporter version; confirm against your exporter’s /metrics output):

  • mongodb_connections{state="current"}
  • mongodb_op_counters_total
  • mongodb_mongod_replset_member_state
  • mongodb_replset_lag

Use this for database health, throughput, and replication visibility; pair it with application-level metrics for end-to-end views.

12) URL Monitoring (HTTP Checks)

URL monitoring in Prometheus is usually done via the Blackbox Exporter using the http module to check availability, status codes, and latency.

High-level flow:

Prometheus ──scrape──▶ Blackbox Exporter
                 │
                 └──probe (HTTP GET/HEAD)──▶ URL

Typical setup:

  1. Run Blackbox Exporter
  2. Configure scrape_configs with metrics_path: /probe
  3. Set module: [http_2xx] (or your custom module)
  4. Query probe_success, probe_http_status_code, probe_duration_seconds

Example config snippet:

scrape_configs:
  - job_name: "urls"
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://example.com
          - https://status.example.com/health
          - http://internal-api:8080/healthz
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115

PromQL checks:

probe_success{job="urls"}
probe_http_status_code{job="urls"}
probe_duration_seconds{job="urls"}

Use this for uptime, HTTP status, and latency checks; pair it with application metrics for deeper diagnostics.

13) Service Discovery (When Static Targets Don’t Scale)

File-based discovery

file_sd_configs lives inside a normal scrape job (the job name below is illustrative):

scrape_configs:
  - job_name: "file-sd"
    file_sd_configs:
      - files:
          - /etc/prometheus/targets/*.json

Example JSON:

[
  {
    "targets": ["10.0.1.5:9100"],
    "labels": {
      "env": "prod",
      "team": "infra"
    }
  }
]

Other discovery options:

  • DNS / SRV records
  • Kubernetes (pod, service, node)
  • Cloud providers (AWS, GCP, Azure)
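
As a sketch of the Kubernetes case, pod discovery combined with the widely used prometheus.io/scrape annotation convention (the annotation is something you set on your pods, not built into Prometheus):

scrape_configs:
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"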

14) Relabeling (Critical to Cost and Scale)

Relabeling happens in two stages:

Target relabeling (before scrape)

relabel_configs:
  - source_labels: [__address__]
    regex: ".*:9100"
    action: keep

Typical actions: keep, drop, replace, labelmap.

Metric relabeling (after scrape)

metric_relabel_configs:
  - source_labels: [__name__]
    regex: "node_disk_io_time_seconds_total"
    action: drop

This is the last chance to reduce cardinality before storage.

15) Common Relabeling Patterns

Drop noisy metrics:

metric_relabel_configs:
  - source_labels: [__name__]
    regex: "node_network_.*"
    action: drop

Remove high-cardinality labels:

metric_relabel_configs:
  - regex: "pod_uid"
    action: labeldrop

Copy a label to a new name (note: the original instance label is kept unless you drop it separately):

relabel_configs:
  - source_labels: [instance]
    target_label: host

16) PromQL: Practical Queries

CPU usage:

rate(process_cpu_seconds_total[5m])

Failed jobs:

sum(worker_jobs_total{status="failed"})

Disk I/O utilization by device (fraction of time the device was busy):

rate(node_disk_io_time_seconds_total[5m])
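
Targets currently reachable, per job:

sum by (job) (up)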

17) Storage Model (TSDB)

Prometheus stores data locally in time-based blocks. Retention is configurable:

--storage.tsdb.retention.time=15d
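
Size-based retention is also available and can be combined with time-based retention (data is removed when either limit is reached):

--storage.tsdb.retention.time=15d
--storage.tsdb.retention.size=50GB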

18) One-line Mental Model

Prometheus scrapes /metrics, relabels targets and metrics, stores time-series locally, and lets you query everything by labels.

19) Key Takeaways

  • Exporters expose metrics over HTTP
  • Prometheus pulls metrics on a schedule
  • Labels power aggregation and filtering
  • Relabeling controls cost and scale
  • Histograms are the default for latency
  • Service discovery is essential at scale

Thanks for reading! If you want to see future content, you can follow me on Twitter or connect with me on LinkedIn.


Support My Content

If you find my content helpful, consider supporting a humanitarian cause I am planning: building homes for elderly people in the rural Terai region of Nepal. You can donate at the address below:

Ethereum (ETH)

0xB62409A5B227D2aE7D8C66fdaA5EEf4eB4E37959

Thank you for your support!