Designing metrics that make sense
When an alert goes off at 3 AM and you are staring at it with blurry eyes and only partial consciousness, the most important question is not why the alert fired. It is:
where should my attention go first to understand what’s happening?
- Is the system down?
- Are user requests failing?
- Is latency spiking?
- Is a critical 3rd party integration unreachable?
- Is this a traffic surge or a cascading failure?
Nothing is worse than staring at a dashboard during an incident and wondering if the alert is real or just a side-effect of bad instrumentation.
Prometheus has become the de facto standard for cloud-native monitoring. It is powerful, but it also forces architectural choices: Counter or Gauge? Histogram or Summary? Push or pull? Those choices dictate whether your dashboards provide clarity or noise during chaos.
Prometheus is designed for:
- Time-series monitoring
- Event counting & system behavior tracking
- Real-time alerting
- Observability in distributed systems
It uses a pull model, stores time-series data, and provides a powerful query language called PromQL.
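To make the pull model concrete, here is a minimal sketch in plain Go (standard library only; in practice the client_golang library generates this for you) of the text exposition format a Prometheus server pulls each time it scrapes GET /metrics on a process. The metric name matches the examples later in this post; everything else is illustrative:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// A counter that request handlers would increment on every request.
var httpRequestsTotal atomic.Int64

// renderMetrics produces the Prometheus text exposition format — the payload
// a Prometheus server pulls when it scrapes this process.
func renderMetrics() string {
	return fmt.Sprintf("# TYPE http_requests_total counter\nhttp_requests_total %d\n",
		httpRequestsTotal.Load())
}

func main() {
	httpRequestsTotal.Add(1) // simulate one handled request
	// In a real service this string would be served at /metrics.
	fmt.Print(renderMetrics())
}
```

The key point of the pull model: the application only exposes its current state; Prometheus decides when to collect it.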
Why Metric Design Matters
Averages can be misleading because they hide outliers and spiky data. An average (mean) cannot distinguish between a system where all users experience 100ms latency and one where half experience 1ms and the other half experience 200ms.
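That claim is easy to verify with a quick sketch in plain Go (made-up latency numbers, no Prometheus involved): a bimodal workload produces a mean that matches no user's actual experience, while the p95 exposes the slow half.

```go
package main

import (
	"fmt"
	"sort"
)

// mean returns the arithmetic average of the samples.
func mean(samples []float64) float64 {
	sum := 0.0
	for _, s := range samples {
		sum += s
	}
	return sum / float64(len(samples))
}

// percentile returns the p-th percentile (0..1) using the nearest-rank method.
func percentile(samples []float64, p float64) float64 {
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	rank := int(p * float64(len(sorted)))
	if rank >= len(sorted) {
		rank = len(sorted) - 1
	}
	return sorted[rank]
}

func main() {
	// Bimodal latency: half the users see 1ms, half see 200ms.
	var latencies []float64
	for i := 0; i < 50; i++ {
		latencies = append(latencies, 1, 200)
	}
	fmt.Printf("mean=%.1fms p95=%.0fms\n", mean(latencies), percentile(latencies, 0.95))
	// → mean=100.5ms p95=200ms: the mean describes nobody's experience.
}
```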
Understanding rates, distributions, and system state is essential for accurate observability.
Prometheus Metric Types: When to Use Each
Prometheus provides four core metric types, and it is important to match each type to the question being asked.
| Type | Use Case | Example |
|---|---|---|
| Counter | Values that only increase | HTTP requests |
| Gauge | Values that go up & down | Memory usage |
| Histogram | Request duration & distributions | API latency |
| Summary | Client-side latency quantiles | Response time percentiles |
Counter
Counters are ideal for:
- HTTP request counts
- Jobs processed
- Error counts
- Retry counts
- Cache hits/misses
Code
httpRequestsTotal.Inc()
Query
rate(http_requests_total[5m])
Tip: Always use rate() or increase() with counters; raw values rarely provide insight.
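What rate() computes is, roughly, the per-second slope of the counter over the window. A sketch with illustrative numbers (the real rate() also corrects for counter resets and extrapolates to the window boundaries, which this does not):

```go
package main

import "fmt"

// counterRate approximates PromQL's rate(): the per-second increase of a
// counter between two scrapes taken windowSeconds apart.
func counterRate(earlier, later, windowSeconds float64) float64 {
	return (later - earlier) / windowSeconds
}

func main() {
	// http_requests_total was 1200 five minutes (300s) ago and is 1800 now.
	fmt.Printf("%.0f req/s\n", counterRate(1200, 1800, 300)) // → 2 req/s
}
```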
Gauge
Gauges are ideal for:
- CPU usage
- Memory consumption
- Active connections
- Queue depth
- Number of running pods
Code
queueDepth.Set(float64(noOfItems))
Query
myapp_queue_depth
Tip: If a metric only ever increases, it should be a counter; if it goes up and down, it should be a gauge.
Histogram
Histograms capture distributions across predefined buckets, enabling accurate percentile calculations on the server side.
Histograms are ideal for:
- Request latency
- Database query duration
- Payload sizes
- Job processing time
Code
requestDuration.Observe(duration)
Query
histogram_quantile(0.95,
rate(http_request_duration_seconds_bucket[5m])
)
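Under the hood, histogram_quantile() works on the cumulative le buckets the histogram exports: find the bucket the target rank falls into, then interpolate linearly inside it. A simplified sketch in Go with made-up bucket counts (it skips the rate() step, +Inf handling, and other edge cases the real function covers):

```go
package main

import "fmt"

// bucket is one cumulative ("le") histogram bucket, as Prometheus stores them.
type bucket struct {
	upperBound float64 // the "le" label
	count      float64 // cumulative observations <= upperBound
}

// quantile mimics the core of histogram_quantile(): locate the bucket that
// contains the target rank, then interpolate linearly inside that bucket.
func quantile(q float64, buckets []bucket) float64 {
	total := buckets[len(buckets)-1].count
	rank := q * total
	prevBound, prevCount := 0.0, 0.0
	for _, b := range buckets {
		if b.count >= rank {
			return prevBound + (b.upperBound-prevBound)*(rank-prevCount)/(b.count-prevCount)
		}
		prevBound, prevCount = b.upperBound, b.count
	}
	return buckets[len(buckets)-1].upperBound
}

func main() {
	// 50 requests finished under 100ms, 90 under 500ms, all 100 under 1s.
	b := []bucket{{0.1, 50}, {0.5, 90}, {1.0, 100}}
	fmt.Printf("p95 ≈ %.3fs\n", quantile(0.95, b))
}
```

This also shows why bucket boundaries matter: the p95 is an interpolation inside a bucket, so accuracy depends on how finely the buckets cover the latencies you care about.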
Summary
Summaries calculate percentiles on the client side and suit single-instance applications where pre-defining histogram buckets is impractical.
Summaries are ideal when:
- Running a single instance
- Aggregation across instances is not required
Because summary quantiles cannot be meaningfully aggregated across instances, histograms are typically preferred in microservices environments.
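The aggregation problem is easy to demonstrate: averaging per-instance percentiles is not the same as taking the percentile over all requests. A sketch with made-up latencies, using the nearest-rank method:

```go
package main

import (
	"fmt"
	"sort"
)

// p95 returns the 95th percentile via the nearest-rank method.
func p95(samples []float64) float64 {
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	i := int(0.95 * float64(len(sorted)))
	if i >= len(sorted) {
		i = len(sorted) - 1
	}
	return sorted[i]
}

func main() {
	// Two instances; only instance A sees slow requests.
	instanceA := []float64{10, 10, 10, 10, 10, 10, 10, 10, 10, 500} // ms
	instanceB := []float64{10, 10, 10, 10, 10, 10, 10, 10, 10, 10}  // ms

	// What you'd get by averaging each instance's pre-computed summary p95...
	avgOfQuantiles := (p95(instanceA) + p95(instanceB)) / 2
	// ...versus the p95 over the raw, combined observations.
	global := p95(append(append([]float64(nil), instanceA...), instanceB...))

	fmt.Printf("avg of per-instance p95s: %.0fms, true global p95: %.0fms\n",
		avgOfQuantiles, global)
}
```

Histograms sidestep this: their bucket counters are plain counters, so they can be summed across instances before the quantile is computed.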
Labels
Labels allow metrics to be sliced and analyzed.
http_requests_total{method="POST", status="500"}
Use labels for:
- Status codes
- Endpoints
- Regions
- Service names
Tip: Avoid using labels for high-cardinality data like user IDs or request IDs, which can explode into an unbounded number of time series.
High cardinality severely impacts performance: it bloats Prometheus memory and slows down queries.
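The series count is multiplicative: every combination of label values becomes its own time series. A back-of-the-envelope sketch with illustrative cardinalities:

```go
package main

import "fmt"

// seriesCount estimates how many time series one metric name can produce:
// the product of each label's number of distinct values.
func seriesCount(labelCardinalities ...int) int {
	n := 1
	for _, c := range labelCardinalities {
		n *= c
	}
	return n
}

func main() {
	// Bounded labels: 5 methods x 10 status codes x 50 endpoints.
	fmt.Println("bounded:", seriesCount(5, 10, 50)) // 2500 series — fine
	// Add a user_id label with a million users and it explodes.
	fmt.Println("with user_id:", seriesCount(5, 10, 50, 1_000_000))
}
```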
Finally, observability is only useful when metrics answer operational questions quickly, especially under pressure. At 3 AM, the goal is not to admire dashboards. The goal is to understand reality:
- What are users experiencing?
- What changed?
- Where is the bottleneck?
- Is this a spike or a trend?
If dashboards feel noisy or alerts lack value, the issue is often design rather than tooling. Choosing the right metric type transforms Prometheus from a data collector into a system that provides clarity when it matters most.