Instrumenting Cloudflare Workers With Counters & Histograms
Cloudflare Workers don’t ship with application-level counters or histograms out of the box, so you need to instrument them yourself. The best path for most teams is writing directly to Workers Analytics Engine (WAE), which accepts non-blocking writes and supports SQL-based histogram bucketing. If you already run Prometheus, an exporter that polls Cloudflare’s GraphQL API is the cleanest integration. OpenTelemetry metrics export from Workers is not yet supported as of April 2026, so plan around that gap.
What Counters and Histograms Actually Are
Before wiring anything into your Worker, get the definitions straight. These two metric types cover the vast majority of what you need to observe in a production service.
A counter is a number that only goes up. It resets when the process restarts. You use counters for things like total API requests, bytes transferred, or error counts. The useful signal comes from computing the rate of change over time, not from the raw value itself. Prometheus instrumentation best practices are the canonical reference here.
A histogram tracks the distribution of values by sorting observations into buckets. Each bucket counts how many observations fell at or below a threshold. A histogram also tracks the total count and sum of all observations. This lets you compute percentiles and answer questions like “what percentage of requests finished under 300ms?”
Why histograms over summaries? Summaries precompute quantiles on the client side, which means you cannot aggregate them across multiple replicas or data centers. Histograms can be aggregated freely, which matters a lot in distributed systems like Workers running across Cloudflare’s edge. Prometheus recommends histograms for exactly this reason.
A gauge (for completeness) can go up or down and represents a current value, like active connections or queue depth. Workers’ short-lived nature makes gauges less useful here; counters and histograms are the workhorses.
What Cloudflare Gives You vs. What You Must Build
Cloudflare’s dashboard already tracks several Worker-level metrics: total requests, subrequests, CPU time, wall time, invocation statuses, and (with Smart Placement) a request duration histogram. Retention goes up to three months. Wall time includes time spent in waitUntil handlers, which is worth knowing when you’re debugging unexpectedly high numbers. Cloudflare’s metrics documentation covers all of this.
These built-in metrics tell you about runtime health. They do not tell you about your application’s business logic. If you want to count checkout completions, measure API latency per route, or track error rates by customer tier, you need custom instrumentation.
The primary path for custom metrics is Workers Analytics Engine (WAE). You bind a dataset, call writeDataPoint, and query results with SQL. Writes are non-blocking and don’t add latency to your response. The WAE documentation describes it as designed for high-cardinality time-series data.
What about OpenTelemetry? Cloudflare supports exporting logs and traces via OTel, but metrics export is explicitly not supported yet. The OTel export documentation still lists this limitation as of April 2026. If your plan depends on OTLP metrics from Workers, you’ll need a different approach today.
Three Patterns That Actually Work for Instrumenting Workers
Here’s a decision tree. Pick the pattern that matches your situation.
Pattern A: Write Directly to Workers Analytics Engine
This is the recommended approach for most teams instrumenting Cloudflare Workers with counters and histograms. WAE is serverless-native, handles high-cardinality data well, and its writes are non-blocking, meaning you don’t even need ctx.waitUntil().
How it works: Bind an Analytics Engine dataset in your wrangler.toml, then call writeDataPoint in your fetch handler. You pass blobs (string dimensions), doubles (numeric values), and an index (a sampling key).
For a counter, you write a 1 as the double value on every event. For a histogram, you write the raw measurement (like response time in milliseconds) as the double. The bucketing happens at query time in SQL, not at write time. This is actually more flexible than traditional histogram instrumentation because you can change bucket boundaries without redeploying.
Counter example (counting API hits by route and status):
export default {
  async fetch(request, env, ctx) {
    const url = new URL(request.url);
    const route = `${request.method} ${url.pathname}`;
    const response = await handleRequest(request);

    // Counter: one data point per request with a value of 1, labeled by route and status.
    env.ANALYTICS.writeDataPoint({
      blobs: ["api_hits", route, String(response.status)],
      doubles: [1],
      indexes: [url.hostname],
    });

    return response;
  },
};
This pattern follows the official WAE write example.
Histogram example (recording latency):
export default {
  async fetch(request, env, ctx) {
    const start = Date.now();
    const url = new URL(request.url);
    const route = `${request.method} ${url.pathname}`;
    const response = await handleRequest(request);
    const durationMs = Date.now() - start;

    // Histogram: write the raw duration; bucketing happens later, at query time in SQL.
    env.ANALYTICS.writeDataPoint({
      blobs: ["request_duration", route, String(response.status)],
      doubles: [durationMs],
      indexes: [url.hostname],
    });

    return response;
  },
};
No waitUntil, no background tasks, no risk of cancellation. The write fires and the response goes out immediately.
For teams that want a code-to-dashboard path for counters and histograms without wiring up SQL queries or building visualization layers, a hosted metrics service can fill that gap.
Pattern B: Tail Worker Aggregation
If you want to keep your main Worker as thin as possible, use a Tail Worker. Tail Workers run after each invocation of your primary Worker and receive the logs it emitted. You can aggregate events (count errors by endpoint, compute latency distributions) inside the Tail Worker and write the aggregated results to WAE or an external system.
This pattern decouples instrumentation from the request path entirely. The Tail Worker is billed by CPU time, not by requests, and it can’t slow down your responses. Cloudflare’s Tail Worker documentation details the setup.
Teams on community forums report using Tail Workers to batch metrics and ship them to external systems. The main advantage is centralized aggregation logic that doesn’t pollute your application code.
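Sketched below: a Tail Worker that turns each invocation of the producer Worker into a WAE counter data point, assuming an Analytics Engine binding named ANALYTICS on the Tail Worker itself. The trace-item fields accessed here (outcome, scriptName, event.request.url) follow the documented TraceItem shape, but treat them as assumptions to verify against the current Tail Worker docs.

// tail-worker.js -- a sketch, assuming an Analytics Engine binding named ANALYTICS
export default {
  async tail(events, env, ctx) {
    for (const item of events) {
      // Each item describes one invocation of the producer Worker.
      const outcome = item.outcome ?? "unknown"; // e.g. "ok" or "exception"
      const url = item.event?.request?.url;
      const route = url ? new URL(url).pathname : "unknown";

      // Counter: one data point per invocation, labeled by route and outcome.
      env.ANALYTICS.writeDataPoint({
        blobs: ["tail_invocations", route, outcome],
        doubles: [1],
        indexes: [item.scriptName ?? "unknown"],
      });
    }
  },
};

You could just as easily accumulate counts in a local map inside the loop and write one aggregated data point per route, which is the main reason to reach for this pattern.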
Pattern C: Prometheus via an Exporter
If you already run Prometheus and Grafana, you probably want your Workers metrics in that stack. Workers don’t expose a /metrics endpoint for Prometheus to scrape, so the community has built exporters that poll Cloudflare’s GraphQL analytics API and convert the results to Prometheus text format.
A CNCF post on monitoring Cloudflare Workers with Prometheus confirms this exporter pattern is the standard approach. Practitioners report it works well when you need Grafana dashboards and Prometheus alerting rules from an existing stack.
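A sketch of that exporter idea in JavaScript: poll the GraphQL API, then render the Prometheus text exposition format for a /metrics handler to return. The endpoint URL is Cloudflare’s documented GraphQL API, but the dataset and field names used in the query (workersInvocationsAdaptive, sum.requests, dimensions.scriptName) are assumptions here; confirm them in the GraphQL schema explorer before depending on them.

// Sketch of one scrape: poll Cloudflare's GraphQL API, emit Prometheus text format.
// The query's dataset and field names are assumptions -- verify against the schema explorer.
const GRAPHQL_URL = "https://api.cloudflare.com/client/v4/graphql";

async function scrapeCloudflare(accountTag, apiToken) {
  const query = `
    query ($accountTag: string!, $since: Time!) {
      viewer {
        accounts(filter: { accountTag: $accountTag }) {
          workersInvocationsAdaptive(limit: 100, filter: { datetime_geq: $since }) {
            sum { requests errors subrequests }
            dimensions { scriptName }
          }
        }
      }
    }`;
  const since = new Date(Date.now() - 5 * 60 * 1000).toISOString();
  const res = await fetch(GRAPHQL_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiToken}` },
    body: JSON.stringify({ query, variables: { accountTag, since } }),
  });
  const { data } = await res.json();

  // Render the text exposition format a /metrics endpoint would serve to Prometheus.
  const groups = data.viewer.accounts[0].workersInvocationsAdaptive;
  let out = "# TYPE cloudflare_worker_requests_total counter\n";
  for (const g of groups) {
    out += `cloudflare_worker_requests_total{script="${g.dimensions.scriptName}"} ${g.sum.requests}\n`;
  }
  return out;
}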
What about Pushgateway? The Prometheus project strongly discourages using Pushgateway for general service metrics. It’s designed for short-lived batch jobs, not ongoing services. Workers might feel ephemeral, but Pushgateway’s semantics (last-pushed value sticks around until manually deleted) create confusing results for counters and histograms that should reflect live traffic.
If your team isn’t locked into Prometheus, consider whether a lightweight JavaScript metrics client with built-in flush semantics might be simpler than maintaining an exporter.
Flush Semantics: ctx.waitUntil() and Its Limits
When instrumenting Cloudflare Workers with counters and histograms via external APIs, you need to send data somewhere after computing the metric. You don’t want that HTTP call blocking your response. That’s where ctx.waitUntil() comes in.
ctx.waitUntil(promise) extends the Worker’s lifetime past the response. The runtime will keep the isolate alive to finish your background work, up to a hard 30-second cap. All waitUntil tasks share that 30-second budget. If time runs out, tasks get canceled. Cloudflare’s context API docs spell out these constraints.
A few rules:
Don’t destructure ctx. Passing waitUntil around as a standalone function loses its binding and throws. Always call ctx.waitUntil() directly (see the snippet after this list).
Wall time in the dashboard includes waitUntil execution. If your background flush takes 2 seconds, your wall time metric shows 2 extra seconds even though the user got their response immediately. This is documented behavior.
If delivery must be guaranteed, use Queues. waitUntil is best-effort. For metrics you absolutely cannot lose, write to a Cloudflare Queue and process it out-of-band.
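To make the first rule concrete, here’s a minimal sketch of the failure mode and the fix (the handler body and ingest URL are placeholders; the point is only how waitUntil is invoked):

export default {
  async fetch(request, env, ctx) {
    const work = fetch("https://metrics.example.com/ingest", { method: "POST" });

    // Broken: destructuring detaches the method from ctx, so the call throws at runtime.
    // const { waitUntil } = ctx;
    // waitUntil(work);

    // Correct: always call it as a method on ctx.
    ctx.waitUntil(work);

    return new Response("ok");
  },
};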
Here’s a generic pattern for flushing to an external metrics API:
export default {
  async fetch(request, env, ctx) {
    const start = Date.now();
    const response = await handleRequest(request);
    const durationMs = Date.now() - start;

    // Flush metrics in the background; the response is returned without waiting for the POST.
    ctx.waitUntil(
      fetch("https://metrics.example.com/ingest", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "Authorization": "Bearer YOUR_TOKEN",
        },
        body: JSON.stringify({
          counters: [{ name: "api_hits", value: 1, labels: { route: "/items" } }],
          histograms: [{ name: "request_duration_ms", value: durationMs }],
        }),
      })
    );

    return response;
  },
};
Keep payloads small. Batch multiple increments into one POST when possible. And remember: WAE writes are non-blocking by design, so you skip this complexity entirely if you use Pattern A.
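One way to batch is to buffer data points in module scope and send them in a single POST once enough have accumulated. A sketch under assumptions: the record/flush helpers and the FLUSH_AT threshold are hypothetical, the ingest URL is the same placeholder as above, and module-scope state only persists while Cloudflare reuses the isolate, so anything still buffered when the isolate is evicted is lost (best-effort, like waitUntil itself).

// Module-scope buffer; it survives across requests only while the isolate is reused.
const buffer = [];
const FLUSH_AT = 20; // hypothetical threshold

function record(metric) {
  buffer.push(metric);
}

function flush(ctx) {
  if (buffer.length === 0) return;
  const batch = buffer.splice(0, buffer.length); // take everything, leave the buffer empty
  ctx.waitUntil(
    fetch("https://metrics.example.com/ingest", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ events: batch }),
    })
  );
}

export default {
  async fetch(request, env, ctx) {
    const start = Date.now();
    const response = await handleRequest(request);
    record({ name: "request_duration_ms", value: Date.now() - start });
    if (buffer.length >= FLUSH_AT) flush(ctx); // flush never blocks the response
    return response;
  },
};

If your traffic is low enough that the threshold is rarely reached, flush on every request instead; the batching only pays off on hot paths.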
The Cloudflare Workers quickstart for Distlang Metrics shows a concrete example of flush-based instrumentation with a single API token, if you want a managed alternative to rolling your own ingestion endpoint.
How to Choose Histogram Buckets
Bucket selection determines the accuracy of any percentile you compute from your histogram. Bad boundaries produce misleading results.
Tie your buckets to your SLOs. If your SLA promises 95% of requests under 300ms, you need precise resolution around that boundary. A bucket set like [50, 100, 200, 300, 400, 800, 1200] in milliseconds gives you good visibility into whether you’re meeting or missing that target.
General guidance from the Prometheus histogram documentation:
Use 5 to 10 buckets for most use cases.
Place boundaries where decisions happen (SLO thresholds, timeout values).
Exponential bucket schemes (10, 25, 50, 100, 250, 500, 1000) work well as a default when you don’t have specific SLOs yet.
Native histograms, which use exponential bucketing automatically, are preferable when your stack supports them. WAE doesn’t use native histograms, but its SQL bucketing gives you equivalent flexibility.
In WAE, buckets are defined at query time using SQL CASE expressions. This means you can experiment with different boundaries without touching your Worker code. Here’s a query that produces cumulative bucket counts (matching Prometheus histogram semantics):
SELECT
DATE_TRUNC('minute', timestamp) AS ts_min,
blob2 AS route,
SUM(CASE WHEN double1 <= 50 THEN _sample_interval ELSE 0 END) AS le_50,
SUM(CASE WHEN double1 <= 100 THEN _sample_interval ELSE 0 END) AS le_100,
SUM(CASE WHEN double1 <= 200 THEN _sample_interval ELSE 0 END) AS le_200,
SUM(CASE WHEN double1 <= 300 THEN _sample_interval ELSE 0 END) AS le_300,
SUM(CASE WHEN double1 <= 400 THEN _sample_interval ELSE 0 END) AS le_400,
SUM(CASE WHEN double1 <= 800 THEN _sample_interval ELSE 0 END) AS le_800,
SUM(_sample_interval) AS count_total
FROM request_metrics
WHERE timestamp > NOW() - INTERVAL '1 hour'
AND blob1 = 'request_duration'
GROUP BY ts_min, route
ORDER BY ts_min DESC;
The _sample_interval field is WAE’s sampling metadata that scales counts correctly. The Analytics Engine documentation explains this mechanism.
From this output, you can calculate the percentage of requests under your SLO threshold directly: le_300 / count_total * 100.
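If you want an estimated percentile rather than a pass/fail ratio, you can interpolate within the cumulative buckets, the same idea behind Prometheus’s histogram_quantile. A sketch in JavaScript, with made-up example numbers standing in for one query result row:

// Estimate a quantile (e.g. 0.95) from cumulative bucket counts via linear interpolation.
// `buckets` maps each bucket's upper bound (ms) to the cumulative count at or below it.
function estimateQuantile(q, buckets, totalCount) {
  const bounds = Object.keys(buckets).map(Number).sort((a, b) => a - b);
  const target = q * totalCount;
  let prevBound = 0;
  let prevCount = 0;
  for (const bound of bounds) {
    const count = buckets[bound];
    if (count >= target) {
      // Assume observations are spread evenly within the bucket.
      const fraction = (target - prevCount) / (count - prevCount || 1);
      return prevBound + fraction * (bound - prevBound);
    }
    prevBound = bound;
    prevCount = count;
  }
  return bounds[bounds.length - 1]; // target falls beyond the last explicit bucket
}

// Example with the query's columns (illustrative numbers, not real data):
const row = { le_50: 120, le_100: 380, le_200: 720, le_300: 910, le_400: 960, le_800: 995, count_total: 1000 };
const p95 = estimateQuantile(
  0.95,
  { 50: row.le_50, 100: row.le_100, 200: row.le_200, 300: row.le_300, 400: row.le_400, 800: row.le_800 },
  row.count_total
);
console.log(`p95 ≈ ${p95.toFixed(0)} ms`); // lands in the 300-400 ms bucket for this data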
Labels Without Regret
High-cardinality labels are the fastest way to make your metrics system expensive or slow. This applies whether you’re using WAE, Prometheus, or anything else.
The label budget for Workers: Stick to 2 or 3 labels on hot paths. Good labels include route (normalized, like GET /items/:id), status_class (2xx, 4xx, 5xx), and maybe region or method. Bad labels include raw URL paths with query strings, user IDs, request IDs, or API keys. Prometheus instrumentation guidelines emphasize treating labels carefully to avoid explosive time series counts.
WAE handles the high-cardinality case better than Prometheus because it stores data points individually rather than maintaining time series per label combination. If you need per-customer or per-API-key analysis, put those values in WAE blobs and query them with SQL. Don’t make them labels in a Prometheus-style system.
Do this: blobs: ["api_hits", "GET /items", "200"]
Not this: blobs: ["api_hits", "/items?user=abc123&page=7", "200"]
Normalize your routes before writing them. A helper function that strips IDs and query parameters saves you from cardinality explosions later.
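A minimal sketch of such a helper; the patterns it collapses (purely numeric segments and UUID-shaped segments) are assumptions to adapt to your own URL scheme:

// Collapse high-cardinality path segments into placeholders before using them as labels.
function normalizeRoute(method, url) {
  const path = new URL(url).pathname // drops the query string entirely
    .split("/")
    .map((segment) => {
      if (/^\d+$/.test(segment)) return ":id"; // purely numeric IDs
      if (/^[0-9a-f]{8}-[0-9a-f-]{27}$/i.test(segment)) return ":uuid"; // UUID-shaped segments
      return segment;
    })
    .join("/");
  return `${method} ${path}`;
}

// normalizeRoute("GET", "https://api.example.com/items/12345?user=abc123")
//   -> "GET /items/:id"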
Troubleshooting and Verification
Once you’ve added instrumentation, you need to verify it’s working before trusting the data.
Use Workers Logs with head-based sampling. Set sampling to 100% in staging while validating your instrumentation. In production, start at 1% and widen during incidents. Cloudflare’s Workers Logs documentation covers sampling configuration. This approach controls costs while giving you visibility.
Compare wall time to CPU time. If you’ve added background work via waitUntil, your wall time in the Cloudflare dashboard will increase even though response latency stays the same. A big gap between CPU time and wall time usually means your background tasks (metric flushes, external API calls) are waiting on network I/O.
Verify WAE writes with a simple SQL query. After deploying your instrumentation, query the dataset immediately:
SELECT COUNT(*) as total_points, MIN(timestamp) as first, MAX(timestamp) as last
FROM your_dataset
WHERE timestamp > NOW() - INTERVAL '5 minutes';
If this returns zero rows, check your wrangler.toml binding and make sure the dataset name matches.
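If you’d rather run that check from a script than the dashboard, Analytics Engine exposes a SQL-over-HTTP endpoint you can hit with any HTTP client. A sketch; the account ID, API token, and dataset name are placeholders:

// Sketch: run the verification query against the Analytics Engine SQL API.
const ACCOUNT_ID = "your_account_id";
const API_TOKEN = "your_api_token";

const sql = `
  SELECT COUNT(*) as total_points, MIN(timestamp) as first, MAX(timestamp) as last
  FROM your_dataset
  WHERE timestamp > NOW() - INTERVAL '5 minutes'
`;

const res = await fetch(
  `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/analytics_engine/sql`,
  {
    method: "POST",
    headers: { Authorization: `Bearer ${API_TOKEN}` },
    body: sql,
  }
);
console.log(await res.text()); // output shape depends on the query's FORMAT clause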
For a broader look at how developers approach observability on Cloudflare-first platforms, keep in mind the constraints that shaped these patterns: short-lived isolates, no scrapable /metrics endpoint, and a capped background-execution budget.
Decision Tree: Picking Your Path
Want fast, high-cardinality app metrics? Use WAE. Write doubles for counters (value of 1) and raw durations for histograms. Do SQL bucketing at query time. No extra infrastructure needed.
Need Prometheus to see your Workers metrics? Run an exporter that polls Cloudflare’s GraphQL API. Only use Pushgateway for genuinely ephemeral batch jobs, not for ongoing Worker traffic.
Need zero added response latency? WAE’s non-blocking writes are the safest choice. If you must call an external API, use ctx.waitUntil() with small payloads, and accept the 30-second cap.
Want a hosted dashboard without managing WAE queries or Prometheus? An API-first metrics service with a JavaScript client for counters and histograms can get you from code to dashboard in minutes instead of hours.
Planning to use OpenTelemetry metrics? Wait. Cloudflare supports OTel logs and traces, but not metrics export yet. Build on WAE or an external API today and migrate when OTel metrics support ships.
Beyond Metrics: Building Observable Workers
Instrumenting Cloudflare Workers with counters and histograms is the foundation, but it’s not the whole picture. Counters tell you what happened. Histograms tell you how long things took. Combine them with structured logs and traces (both supported via OTel export) for full observability.
The patterns in this guide (writing to WAE, using Tail Workers for aggregation, and flushing via waitUntil) apply beyond just counters and histograms. They’re the same primitives you’ll use for any custom telemetry in Workers’ short-lived runtime.
For teams that want to ship metrics quickly without assembling a pipeline from scratch, Distlang Metrics provides an API-first serverless metrics service with a JavaScript client, built-in flush and buffering semantics for Workers, and a hosted dashboard. The free tier includes 500k rows per month, which is enough to validate your instrumentation approach before scaling up.
If you’re exploring how reusable capabilities like metrics, storage, and auth can simplify distributed apps, the helpers and layers model behind Distlang is worth a look.
Frequently Asked Questions
Can I use OpenTelemetry to export counters and histograms from Cloudflare Workers?
Not yet. Cloudflare supports exporting logs and traces via OpenTelemetry, but metrics export is explicitly unsupported as of April 2026. Use Workers Analytics Engine or an external metrics API for now.
Do I need ctx.waitUntil() to write metrics to Workers Analytics Engine?
No. WAE writes are non-blocking, so they complete without extending the Worker’s lifetime. The WAE documentation confirms this. You only need ctx.waitUntil() if you’re making HTTP calls to external metrics services.
What happens if my waitUntil task exceeds 30 seconds?
It gets canceled. All waitUntil tasks share a 30-second budget after the response is sent. If you need guaranteed delivery of metrics, write to a Cloudflare Queue instead and process the data asynchronously.
How many labels should I use on my Workers metrics?
Two to three on hot paths. Use normalized values like route pattern and status class. Avoid unbounded labels like raw URLs or user IDs. If you need high-cardinality analysis, WAE’s blob fields and SQL queries handle that better than label-based time series.
Should I use Prometheus Pushgateway for Workers metrics?
Probably not. The Prometheus project discourages Pushgateway for general service metrics. It’s designed for short-lived batch jobs. For Workers, use an exporter that polls Cloudflare’s GraphQL API, or write to WAE and query with SQL.
How do I choose histogram bucket boundaries for latency metrics?
Start with your SLO thresholds. If you need 95% of requests under 300ms, place explicit boundaries around that value (e.g., 50, 100, 200, 300, 400, 800, 1200ms). In WAE, you define buckets at query time in SQL, so you can adjust boundaries without redeploying your Worker.
Does wall time in the Cloudflare dashboard include background work?
Yes. Wall time per execution includes time spent in waitUntil handlers. This means metric flushes and background API calls show up in your wall time even though the user received their response much earlier.
What’s the difference between using WAE directly and using a Tail Worker?
Direct WAE writes are simpler: one line of code in your fetch handler. Tail Workers let you separate instrumentation logic from your application, aggregate events before writing, and avoid any coupling between metrics and response latency. Use Tail Workers when your instrumentation logic is complex or when you want a centralized aggregation layer.