Prometheus: Practical Guide & Mental Model
Prometheus is a pull-based monitoring system and time-series database designed for reliable metrics collection, alerting, and exploration. This guide gives you a compact mental model and the practical pieces you need to operate it at scale.
1) What Prometheus Is (and isn’t)
Prometheus is great for infrastructure metrics, application telemetry, and alerting. It is not a long-term log archive or a general-purpose OLAP system.
Key properties:
- Single static binary (cross-platform)
- Pulls metrics over HTTP
- Stores time-series locally
- Labels as first-class dimensions
- PromQL for queries
2) Architecture in One Diagram
[ Linux / Windows / Apps ]
              |
              |  expose /metrics (HTTP)
              v
[ Exporters / Instrumented Apps ]
              |
              |  scrape (HTTP GET)
              v
[ Prometheus Server ]
              |
              |  PromQL queries
              v
[ Grafana / Alerts ]
3) “Scrape” Means Prometheus Pulls
In Prometheus, a scrape is when Prometheus initiates an HTTP request to a target and pulls metrics.
Concretely:
Prometheus ──HTTP GET──▶ http://target:port/metrics
What that implies:
- The target does nothing proactively
- The target only exposes /metrics
- Prometheus controls when, how often, and how long it waits
Scrape loop (per target):
every scrape_interval:
    start timer
    GET /metrics
    parse text format
    store time-series
    stop timer
That’s why you see:
- scrape_interval: how often Prometheus scrapes a target
- scrape_timeout: the max time Prometheus waits for a scrape to finish
- scrape_duration_seconds: how long the last scrape actually took (a metric Prometheus records, not a config setting)
Pull vs Push (contrast):
- Prometheus (Pull / Scrape): Prometheus calls you, centralized control, easier debugging, safer at scale
- Push systems: Apps push metrics out, harder governance, more network + retry complexity
Prometheus can accept pushed metrics via Pushgateway, but that’s the exception, not the norm.
Why pull matters operationally:
- Centralized scrape schedules
- Uniform auth and TLS
- Easier service discovery
- Built-in health signal (up)
If a scrape fails, Prometheus knows immediately. In push systems, silence can look like “everything is fine.”
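A quick check in PromQL, using the up series Prometheus records for every target:

# 1 if the last scrape of a target succeeded, 0 if it failed
up

# show only the targets whose last scrape failed
up == 0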
One sentence to remember:
In Prometheus, a “scrape” is Prometheus pulling metrics over HTTP from a target.
4) Exporters: Turning State Into Metrics
Exporters translate system or app state into Prometheus metrics.
Node Exporter (Linux)
Common metrics:
- CPU: node_cpu_seconds_total
- Memory: node_memory_MemFree_bytes
- Disk: node_disk_io_time_seconds_total, node_disk_read_bytes_total
- Network: node_network_receive_bytes_total
Example:
node_disk_io_time_seconds_total{device="sda"} 104296
Windows Exporter
Common metrics:
- windows_cpu_time_total
- windows_memory_available_bytes
- windows_logical_disk_free_bytes
Application & Batch Exporters
Libraries exist for Java, Python, Go, and Node.js. Example batch metrics:
process_cpu_seconds_total 5.73
worker_jobs_total{status="processed"} 1570222
worker_jobs_total{status="failed"} 155665
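As a sketch, the counters above can be combined into a failure ratio in PromQL (metric names as shown; the 5m window is arbitrary):

# fraction of worker jobs that failed over the last 5 minutes
sum(rate(worker_jobs_total{status="failed"}[5m]))
/
sum(rate(worker_jobs_total[5m]))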
Blackbox Exporter (active probing)
For endpoints that don’t expose /metrics, the Blackbox Exporter performs HTTP/TCP/ICMP/TLS probes and exposes results as scrapeable metrics.
Prometheus Targets: Quick Mapping
Key point: Prometheus does NOT auto-detect processes. A running process alone gives Prometheus nothing (Linux or Windows). You must expose metrics or use exporters.
| Need | What Prometheus needs | How to get it | Notes |
|---|---|---|---|
| Node.js app metrics | /metrics endpoint | Instrument app (prom-client) | Prometheus scrapes the app directly |
| Linux host metrics | Host metrics endpoint | Install node-exporter | CPU, RAM, disk, etc. |
| Windows host metrics | Host metrics endpoint | Install windows_exporter | Prometheus can’t see Windows processes by itself |
| Availability / ping / port check | Blackbox probe | blackbox_exporter (ICMP, HTTP, TCP) | Use for URL, ping, port-open checks |
| URL uptime | HTTP probe | blackbox exporter (http_2xx) | Returns success/failure, latency |
| Ping / ICMP | ICMP probe | blackbox exporter (icmp) | Requires ICMP permissions |
| Port open (e.g., 443, 27017) | TCP probe | blackbox exporter (tcp_connect) | Validates port reachability |
| MongoDB metrics | MongoDB metrics endpoint | mongodb_exporter | Not automatic; needs exporter |
| Custom app metrics | /metrics endpoint | Add a custom exporter or instrument the app | Prometheus only scrapes exposed metrics |
5) Metric Types (What to Use and When)
Counter
Monotonically increasing; resets on restart.
- process_cpu_seconds_total
- http_requests_total
Gauge
Can go up and down.
- node_memory_MemFree_bytes
- queue_depth
Histogram (preferred for latency)
Bucketed distribution; aggregatable across instances.
- http_request_duration_seconds_bucket
- http_request_duration_seconds_sum
- http_request_duration_seconds_count
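Because buckets aggregate cleanly, a cross-instance p95 is a single query (metric name as above; window is arbitrary):

# 95th-percentile request latency across all instances
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))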
Summary (use carefully)
Client-side quantiles; not aggregatable across instances.
6) Labels: Prometheus’ Core Data Model
Every metric is a name + labels:
metric_name{label1="value1", label2="value2"} value @ timestamp
Examples:
node_disk_io_time_seconds_total{device="sda", instance="linux-1"} 104296
http_requests_total{method="GET", status="200", service="api"} 982734
Labels give you slicing, aggregation, and multi-dimensional queries. Too many labels = high cardinality, which costs CPU and storage.
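For example, with the http_requests_total series above, labels let you aggregate along any dimension:

# request rate per status code, across all services
sum by (status) (rate(http_requests_total[5m]))

# request rate for one service, split by method
sum by (method) (rate(http_requests_total{service="api"}[5m]))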
7) Prometheus Configuration (prometheus.yml)
Global config
global:
  scrape_interval: 10s
Static scrape configs
scrape_configs:
  - job_name: "linux"
    static_configs:
      - targets: ["ip-linux:9100"]
  - job_name: "batch"
    static_configs:
      - targets: ["web-app:8080"]
  - job_name: "windows"
    static_configs:
      - targets: ["win-2019:9182"]
The job_name is automatically attached to every scraped series as the job label, which is useful for grouping.
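For example, a quick way to count healthy targets per job:

# number of targets per job whose last scrape succeeded
count by (job) (up == 1)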
Custom scrape example (Node.js app)
Definition: a custom scrape is any job you define in scrape_configs for a target that is specific to your environment (a service, exporter, device, or endpoint you decide to monitor), beyond the default examples.
What a custom scrape could represent (examples):
- A Node.js API exposing /metrics
- A Python worker exposing /metrics
- A Go service exposing /metrics
- A Linux node exporter (:9100)
- A database exporter (Postgres, Redis, MySQL)
- A load balancer or proxy exporter (Nginx, HAProxy)
- A message queue exporter (Kafka, RabbitMQ)
- A Kubernetes component (kube-state-metrics, cAdvisor)
- A blackbox probe (HTTP/TCP/ICMP checks)
- A third-party SaaS metrics endpoint
Practical example: your Node.js service exposes http://localhost:3000/metrics (via prom-client or similar). With the scrape config shown below, Prometheus will hit that URL every 30s, wait up to 30s for a response, and attach the label env="dev" to every series ingested from that target.
Quick sanity checks:
Sample /metrics output (what Prometheus sees):
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",status="200"} 1243
process_cpu_seconds_total 5.73
Verify scrape success in PromQL:
up{job="nodejs-app"}
scrape_configs:
  - job_name: "nodejs-app"
    scrape_interval: 30s
    scrape_timeout: 30s
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets:
          - localhost:3000
        labels:
          env: dev

    # auth (choose one)
    # no auth (default): nothing to set
    # bearer_token: "YOUR_TOKEN"
    # bearer_token_file: /etc/prometheus/token
    # basic_auth:
    #   username: "user"
    #   password: "pass"
    # authorization:
    #   type: Bearer
    #   credentials: "YOUR_TOKEN"

    # tls / https (if needed)
    # scheme: https
    # tls_config:
    #   ca_file: /etc/prometheus/ca.pem
    #   cert_file: /etc/prometheus/client.pem
    #   key_file: /etc/prometheus/client.key
    #   insecure_skip_verify: false
    #   # if true, Prometheus skips TLS certificate verification;
    #   # useful for self-signed certs in dev, but unsafe for prod
8) Blackbox Exporter (Active Probing)
The Blackbox Exporter lets Prometheus monitor things that don’t expose /metrics themselves (URLs, ports, ICMP ping, TLS checks). Prometheus still scrapes the exporter; the exporter actively probes targets and returns results as metrics.
What it’s used for:
- URL uptime and latency (HTTP/HTTPS)
- TCP port availability
- ICMP ping reachability
- TLS handshake and certificate checks
Minimal wiring model:
Prometheus ──scrape──▶ Blackbox Exporter ──probe──▶ Target
9) Ping Monitoring Architecture (Blackbox / ICMP)
Ping-style monitoring in Prometheus is typically done via the Blackbox Exporter, which probes targets and exposes results for Prometheus to scrape.
High-level flow:
Prometheus ──scrape──▶ Blackbox Exporter
                              │
                              └──probe (ICMP/HTTP/TCP)──▶ Target
Key idea: Prometheus still pulls from the exporter; the exporter pushes probes to the target and reports success, latency, and errors as metrics.
Typical setup:
- Run Blackbox Exporter (in the same network as Prometheus or near targets)
- Configure scrape_configs with metrics_path: /probe
- Pass module and target as query params
- Query probe_success, probe_duration_seconds, and probe_icmp_*
Example config snippet:
scrape_configs:
  - job_name: "ping"
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets:
          - 1.1.1.1
          - 8.8.8.8
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115
PromQL checks:
probe_success{job="ping"}
probe_duration_seconds{job="ping"}
Use this for reachability and latency checks across networks; it complements service-level metrics rather than replacing them.
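If you want to alert on these probes, a minimal rule sketch might look like the following (alert name, for duration, and severity are placeholders):

groups:
  - name: ping-alerts
    rules:
      - alert: HostUnreachable
        expr: probe_success{job="ping"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "ICMP probe to {{ $labels.instance }} has failed for 2 minutes"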
10) Port Monitoring Architecture (TCP Checks)
Port monitoring is also done via Blackbox Exporter, using the tcp module to test if a port is reachable (and optionally perform a simple handshake).
High-level flow:
Prometheus ──scrape──▶ Blackbox Exporter
                              │
                              └──probe (TCP connect)──▶ Target:Port
Typical setup:
- Run Blackbox Exporter
- Configure scrape_configs with metrics_path: /probe
- Set module: [tcp_connect]
- Query probe_success and probe_duration_seconds
Example config snippet:
scrape_configs:
  - job_name: "ports"
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
          - 10.0.1.10:22
          - 10.0.1.20:5432
          - example.com:443
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115
PromQL checks:
probe_success{job="ports"}
probe_duration_seconds{job="ports"}
Use this for reachability and basic port availability; it complements service metrics and deeper health checks.
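A useful derived view is an availability ratio over a time window (window length is up to you):

# fraction of successful TCP probes per target over the last hour
avg_over_time(probe_success{job="ports"}[1h])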
11) MongoDB Monitoring (Exporter)
MongoDB monitoring in Prometheus is typically done via the MongoDB Exporter, which exposes MongoDB stats at /metrics for Prometheus to scrape.
High-level flow:
MongoDB ──stats──▶ MongoDB Exporter ──/metrics──▶ Prometheus
Typical setup:
- Run MongoDB Exporter near your database
- Provide a MongoDB connection URI with a read-only user
- Add a scrape_configs job for the exporter
- Query key metrics like connections, ops, and replication lag
Example config snippet:
scrape_configs:
  - job_name: "mongodb"
    static_configs:
      - targets: ["mongodb-exporter:9216"]
Common metrics to watch:
- mongodb_connections{state="current"}
- mongodb_op_counters_total
- mongodb_mongod_replset_member_state
- mongodb_replset_lag
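For example, a quick throughput view from the op counters listed above:

# MongoDB operations per second, one series per op-counter label
rate(mongodb_op_counters_total[5m])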
Use this for database health, throughput, and replication visibility; pair it with application-level metrics for end-to-end views.
12) URL Monitoring (HTTP Checks)
URL monitoring in Prometheus is usually done via the Blackbox Exporter using the http module to check availability, status codes, and latency.
High-level flow:
Prometheus ──scrape──▶ Blackbox Exporter
                              │
                              └──probe (HTTP GET/HEAD)──▶ URL
Typical setup:
- Run Blackbox Exporter
- Configure scrape_configs with metrics_path: /probe
- Set module: [http_2xx] (or your custom module)
- Query probe_success, probe_http_status_code, probe_duration_seconds
Example config snippet:
scrape_configs:
  - job_name: "urls"
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://example.com
          - https://status.example.com/health
          - http://internal-api:8080/healthz
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115
PromQL checks:
probe_success{job="urls"}
probe_http_status_code{job="urls"}
probe_duration_seconds{job="urls"}
Use this for uptime, HTTP status, and latency checks; pair it with application metrics for deeper diagnostics.
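For HTTPS targets, the same probe also reports certificate lifetime; a common derived query (assuming the http module probes over TLS):

# days until the TLS certificate expires, per probed URL
(probe_ssl_earliest_cert_expiry{job="urls"} - time()) / 86400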
13) Service Discovery (When Static Targets Don’t Scale)
File-based discovery
file_sd_configs:
  - files:
      - /etc/prometheus/targets/*.json
Example JSON:
[
  {
    "targets": ["10.0.1.5:9100"],
    "labels": {
      "env": "prod",
      "team": "infra"
    }
  }
]
Other discovery options:
- DNS / SRV records
- Kubernetes (pod, service, node)
- Cloud providers (AWS, GCP, Azure)
14) Relabeling (Critical to Cost and Scale)
Relabeling happens in two stages:
Target relabeling (before scrape)
relabel_configs:
  - source_labels: [__address__]
    regex: ".*:9100"
    action: keep
Typical actions: keep, drop, replace, labelmap.
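As a sketch, labelmap is often used to copy service-discovery metadata onto targets; the __meta_kubernetes_* name below assumes Kubernetes discovery:

relabel_configs:
  # turn each Kubernetes pod label into a regular target label
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)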
Metric relabeling (after scrape)
metric_relabel_configs:
  - source_labels: [__name__]
    regex: "node_disk_io_time_seconds_total"
    action: drop
This is the last chance to reduce cardinality before storage.
15) Common Relabeling Patterns
Drop noisy metrics:
metric_relabel_configs:
  - source_labels: [__name__]
    regex: "node_network_.*"
    action: drop
Remove high-cardinality labels:
metric_relabel_configs:
  - regex: "pod_uid"
    action: labeldrop
Rename labels:
relabel_configs:
  - source_labels: [instance]
    target_label: host
16) PromQL: Practical Queries
Process CPU usage:
rate(process_cpu_seconds_total[5m])
Failed jobs:
sum(worker_jobs_total{status="failed"})
Disk I/O by device:
rate(node_disk_io_time_seconds_total[5m])
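Combining these ideas, a common host-level CPU utilization query (node exporter metrics assumed):

# percent CPU busy per host over the last 5 minutes
100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))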
17) Storage Model (TSDB)
Prometheus stores data locally in time-based blocks. Retention is configurable:
--storage.tsdb.retention.time=15d
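As a minimal sketch, retention is usually set when starting the server; the paths here are placeholders:

prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus \
  --storage.tsdb.retention.time=15d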
18) One-line Mental Model
Prometheus scrapes /metrics, relabels targets and metrics, stores time-series locally, and lets you query everything by labels.
19) Key Takeaways
- Exporters expose metrics over HTTP
- Prometheus pulls metrics on a schedule
- Labels power aggregation and filtering
- Relabeling controls cost and scale
- Histograms are the default for latency
- Service discovery is essential at scale