Cannot trace requests across services: why correlation IDs die at boundaries in .NET

Jan 28, 2026 · 10 min read


Category: .NET


A production playbook for when logs exist but cannot be joined—correlation IDs die at HttpClient boundaries, jobs, and queues, making incidents unreproducible.

Free download: Correlation ID package (HTTP + jobs). See the download section below.

The incident is never "logging is bad". The incident is: a customer reports a duplicate charge, a job runs twice, or requests time out, and you cannot answer the basic question: what happened to this specific request?

On-call ends up guessing. Someone restarts IIS, someone retries the job manually, and the system returns to normal until the next repeat. The root cause is not always in code. Sometimes it is in the absence of a traceable narrative.

This post gives you a production playbook for correlation IDs in .NET: one contract, consistent propagation across HTTP and jobs, and log fields that turn a repeat incident into one query.

Rescuing a .NET service in production? Start at the .NET Production Rescue hub and the .NET category.

If you only do three things
  • Define a correlation ID contract (header name, format, propagation rules).
  • Log it everywhere: request start, dependency calls, job starts, and job progress.
  • Propagate through boundaries (HttpClient, queue messages) instead of hoping scopes survive.

Fast triage table (what to check first)

| Symptom | Likely cause | Confirm fast | First safe move |
|---|---|---|---|
| Logs exist but can’t be joined across services | Multiple competing IDs (X-Request-Id, X-Correlation-Id, app-generated) | In one request, you see different IDs per component | Pick one contract ID and propagate it everywhere; log the others as secondary fields |
| Correlation works in one service but dies downstream | HttpClient doesn’t propagate headers | Downstream dependency logs have no correlation_id | Add a DelegatingHandler to copy the correlation header to every outbound request |
| Jobs can’t be traced back to the request | Correlation not attached to queue/job message | Job start logs have a new ID with no parent | Put correlation ID into message headers/properties (or payload) and create a logging scope at job start |
| Two “same” incidents look unrelated | Correlation missing from retry/attempt logs | Retry attempts exist but have no join key | Log correlation_id on every attempt + decision, not just request start |
| Vendor support asks for a request ID you don’t log | You don’t return/log the correlation ID to callers | Caller can’t report an ID; response headers missing | Echo the correlation ID back in the response header and include it in error responses/logs |

Why incidents repeat: logs exist but cannot be joined across services

In a stable system, you can answer:

  • Which request started this workflow?
  • Which downstream calls did it make?
  • Which job run processed it?
  • Where did it stall, retry, or duplicate?

Without a correlation ID, your logs are a pile of facts with no edges. You see 500s and timeouts, but you cannot group them into one narrative. In legacy systems, this is why "restart fixed it" becomes the dominant incident response.

Correlation IDs are not about pretty dashboards. They are about making the next incident smaller.

How to diagnose: can you reconstruct one request path from logs?

  1. Pick one incident and ask: can we reconstruct the request path?

If you cannot list the inbound request, the dependency calls, and the job run that handled it, you do not have end-to-end correlation.

  2. Check whether you have multiple competing IDs.

A common failure:

  • a reverse proxy adds X-Request-Id
  • an API gateway adds X-Correlation-Id
  • your app generates its own RequestId

Now every team is "correlating" but nothing joins.

  3. Check boundary crossings.

Correlation often exists inside one process (scoped logging), but dies at the boundaries:

  • outbound HttpClient calls do not copy the inbound correlation header
  • queue messages do not carry correlation properties
  • background jobs generate new IDs with no parent

If correlation dies at the boundary, your incident narrative will die there too.

Fix: standardize one correlation ID contract and propagate through boundaries

The goal is consistency, not perfection.

1) Define the contract

Pick a single header name and use it everywhere.

  • Header: X-Correlation-Id
  • Format: opaque string (GUID is fine; ULID is better for sortability)
  • Rule: if the inbound request has it, keep it. If not, generate it.

Then make the rule explicit: this header is copied to every dependency call and included on every job message.
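The contract is easier to keep consistent when it lives in one small piece of code that every service calls. The sketch below encodes the header name and the extract-or-generate rule from above. The class name `CorrelationIdContract`, the 128-character limit, and the allowed character set are assumptions added for illustration: the contract says "opaque string", but accepting arbitrary inbound values unchecked invites log injection. It generates a GUID; ULIDs need a third-party library or your own generator.

```csharp
using System;

// Hypothetical helper that encodes the correlation ID contract in one place.
public static class CorrelationIdContract
{
    // One header name, used by every service.
    public const string HeaderName = "X-Correlation-Id";

    // Assumed limit: keeps oversized or hostile inbound values out of logs.
    public const int MaxLength = 128;

    // Rule: keep a usable inbound ID; otherwise generate a new one.
    public static string ResolveOrCreate(string? inbound) =>
        IsUsable(inbound) ? inbound! : NewId();

    // GUID without dashes (32 hex chars). Swap in a ULID generator if you want time-sortable IDs.
    public static string NewId() => Guid.NewGuid().ToString("N");

    // Assumed conservative charset: ASCII letters, digits, '-', '_', '.', ':'.
    public static bool IsUsable(string? value)
    {
        if (string.IsNullOrEmpty(value) || value.Length > MaxLength)
        {
            return false;
        }

        foreach (char c in value)
        {
            bool allowed = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
                           || c == '-' || c == '_' || c == '.' || c == ':';
            if (!allowed)
            {
                return false;
            }
        }

        return true;
    }
}
```

Rejecting a bad inbound value and generating a fresh one keeps the rule simple; if you need to investigate rejected values, log them as a secondary field rather than using them as the join key.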

2) Add inbound middleware (or an equivalent entry hook)

Every request needs to end up with:

  • a correlation ID
  • a logging scope that includes it
  • a response header so callers can report it

In ASP.NET Core, this is a small middleware registered early in the pipeline. In classic ASP.NET, it is usually an HttpModule or a handler for an early pipeline event such as BeginRequest.

The important part is not the framework. The important part is that the ID is created once and becomes part of the request context.
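A minimal ASP.NET Core sketch of the three requirements above (ID, logging scope, response header). This is an illustration, not the contents of the downloadable package. The `ItemsKey` value `"CorrelationId"` is a hypothetical name, and the inbound ID is accepted as-is for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Logging;

// Sketch of inbound extract-or-generate middleware for ASP.NET Core.
public sealed class CorrelationIdMiddleware
{
    public const string HeaderName = "X-Correlation-Id";

    // Hypothetical HttpContext.Items key that downstream code reads the ID from.
    public const string ItemsKey = "CorrelationId";

    private readonly RequestDelegate _next;
    private readonly ILogger<CorrelationIdMiddleware> _logger;

    public CorrelationIdMiddleware(RequestDelegate next, ILogger<CorrelationIdMiddleware> logger)
    {
        _next = next;
        _logger = logger;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        // Keep the caller's ID if one was sent; otherwise create one, exactly once.
        var values = context.Request.Headers[HeaderName];
        string? inbound = values.Count > 0 ? values[0] : null;
        string correlationId = string.IsNullOrWhiteSpace(inbound)
            ? Guid.NewGuid().ToString("N")
            : inbound;

        // Make the ID part of the request context.
        context.Items[ItemsKey] = correlationId;

        // Echo it back. OnStarting runs just before headers are sent, so the header is
        // still applied if later code clears the response headers (e.g. on an error path).
        context.Response.OnStarting(() =>
        {
            context.Response.Headers[HeaderName] = correlationId;
            return Task.CompletedTask;
        });

        // Logs written inside this scope carry correlation_id when the provider includes scopes.
        using (_logger.BeginScope(new Dictionary<string, object> { ["correlation_id"] = correlationId }))
        {
            await _next(context);
        }
    }
}
```

Register it as early as practical, for example `app.UseMiddleware<CorrelationIdMiddleware>();` before exception handling, so error logs written further down the pipeline also fall inside the scope.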

3) Propagate through HttpClient

HttpClient does not automatically copy your correlation ID. If you do not add a handler, your downstream logs will be unjoinable.

The safe approach:

  • set the header on outbound requests
  • do not overwrite an existing header (respect caller-provided IDs)
  • log dependency calls with correlation ID and route
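A minimal `DelegatingHandler` sketch for the first two rules (dependency logging is a separate concern). Taking a `Func<string?>` is a design choice assumed here so the same handler can read the ID from an HTTP request or from a job; `RecordingHandler` is a test double only, not something to ship.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Copies the current correlation ID onto every outbound request.
public sealed class CorrelationIdDelegatingHandler : DelegatingHandler
{
    private const string HeaderName = "X-Correlation-Id";

    // Reads the ID at send time (from the request context, a job scope, etc.).
    private readonly Func<string?> _getCorrelationId;

    public CorrelationIdDelegatingHandler(Func<string?> getCorrelationId) =>
        _getCorrelationId = getCorrelationId;

    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // Respect an ID the caller already put on this request; never overwrite it.
        if (!request.Headers.Contains(HeaderName))
        {
            string? id = _getCorrelationId();
            if (!string.IsNullOrWhiteSpace(id))
            {
                request.Headers.TryAddWithoutValidation(HeaderName, id);
            }
        }

        return base.SendAsync(request, cancellationToken);
    }
}

// Test double: records the last request instead of sending it over the network.
public sealed class RecordingHandler : HttpMessageHandler
{
    public HttpRequestMessage? LastRequest { get; private set; }

    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        LastRequest = request;
        return Task.FromResult(new HttpResponseMessage(HttpStatusCode.OK));
    }
}
```

With IHttpClientFactory, one way to wire it is `AddHttpMessageHandler(sp => new CorrelationIdDelegatingHandler(...))`, where the lambda reads the ID through `IHttpContextAccessor` at send time. Read the ID inside the `Func`, not when the handler is built, because handlers are pooled and reused across requests.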

4) Propagate through queues and jobs

Jobs are where correlation breaks most often.

  • attach correlation ID to message headers/properties
  • include it in the job payload if headers are not supported
  • set a logging scope at job start
  • log progress events with the same ID

If jobs do not log progress with correlation, you will still end up with "job stuck" incidents that cannot be explained.
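A broker-agnostic sketch of both halves: the producer attaches the ID to headers and, as a fallback, to the payload; the consumer resolves it at job start (header first, payload second, new ID only as a last resort). `MessageEnvelope`, `ExportJob`, and `JobCorrelation` are hypothetical names; map `Headers` to your broker's headers or application properties.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical envelope: most brokers expose string headers/properties plus a body.
public sealed class MessageEnvelope
{
    public Dictionary<string, string> Headers { get; } = new(StringComparer.OrdinalIgnoreCase);
    public string Body { get; set; } = "";
}

// Hypothetical job payload; CorrelationId is the fallback for brokers without headers.
public sealed record ExportJob(string OrderId, string? CorrelationId);

public static class JobCorrelation
{
    public const string HeaderName = "X-Correlation-Id";

    // Producer: attach the ID to the message headers and, as a fallback, to the payload.
    public static MessageEnvelope Publish(string orderId, string correlationId)
    {
        var envelope = new MessageEnvelope
        {
            Body = JsonSerializer.Serialize(new ExportJob(orderId, correlationId)),
        };
        envelope.Headers[HeaderName] = correlationId;
        return envelope;
    }

    // Consumer, at job start: header first, payload second, new ID only as a last resort.
    public static string ResolveForJob(MessageEnvelope envelope, out ExportJob job)
    {
        job = JsonSerializer.Deserialize<ExportJob>(envelope.Body)
              ?? throw new InvalidOperationException("Empty job payload.");

        if (envelope.Headers.TryGetValue(HeaderName, out var fromHeader) && !string.IsNullOrWhiteSpace(fromHeader))
        {
            return fromHeader;
        }

        if (!string.IsNullOrWhiteSpace(job.CorrelationId))
        {
            return job.CorrelationId;
        }

        // No parent ID: a propagation gap worth logging as its own event.
        return Guid.NewGuid().ToString("N");
    }
}
```

At job start, wrap the whole run in `logger.BeginScope(...)` with the resolved ID, and write progress events inside that scope so they share the join key with the request that enqueued the job.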

What to log: turn scattered events into one incident narrative

Log fields are the join keys for incident narratives. At minimum:

  • correlation_id
  • operation (normalized route or job name)
  • component (api, worker, scheduler)
  • dependency (host/vendor)
  • duration_ms
  • outcome (success, timeout, 429, exception, cancelled)
  • attempt (if retries exist)

Example request start log:

```json
{
  "event": "request.start",
  "operation": "POST /orders",
  "correlation_id": "01H...",
  "component": "api",
  "remote_ip": "..."
}
```

Example dependency log:

```json
{
  "event": "dependency.call",
  "dependency": "vendor-x",
  "operation": "GET /v2/orders",
  "correlation_id": "01H...",
  "duration_ms": 842,
  "outcome": "429"
}
```

Example job start log:

```json
{
  "event": "job.start",
  "job": "nightly-export",
  "correlation_id": "01H...",
  "component": "worker"
}
```
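The `dependency.call` shape can be produced by a timing handler on the HttpClient pipeline. This is a sketch under several assumptions: it emits records through a callback instead of `ILogger` to stay dependency-free; the outcome is `success`, the numeric status code for non-2xx responses (so `"429"`), `timeout`, `cancelled`, or `exception`; and it owns a per-call timeout, because inside a handler `HttpClient.Timeout` and caller cancellation both arrive as a cancelled token. Keep that timeout shorter than `HttpClient.Timeout`, and place this handler inside the correlation handler so the header is already set. `StatusStubHandler` is a test double.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Globalization;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Times each outbound call and emits one dependency.call record with the join fields.
public sealed class DependencyLoggingHandler : DelegatingHandler
{
    private const string HeaderName = "X-Correlation-Id";
    private readonly string _dependency;
    private readonly TimeSpan _timeout;
    private readonly Action<IReadOnlyDictionary<string, object?>> _emit;

    public DependencyLoggingHandler(string dependency, TimeSpan timeout, Action<IReadOnlyDictionary<string, object?>> emit)
    {
        _dependency = dependency;
        _timeout = timeout;
        _emit = emit;
    }

    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // Own per-call timeout so "timeout" can be told apart from caller cancellation.
        using var timeoutCts = new CancellationTokenSource(_timeout);
        using var linked = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken, timeoutCts.Token);
        var stopwatch = Stopwatch.StartNew();
        string outcome = "exception"; // default for any other failure (e.g. HttpRequestException)
        try
        {
            HttpResponseMessage response = await base.SendAsync(request, linked.Token);
            outcome = response.IsSuccessStatusCode
                ? "success"
                : ((int)response.StatusCode).ToString(CultureInfo.InvariantCulture);
            return response;
        }
        catch (OperationCanceledException) when (timeoutCts.IsCancellationRequested && !cancellationToken.IsCancellationRequested)
        {
            outcome = "timeout";
            throw;
        }
        catch (OperationCanceledException)
        {
            outcome = "cancelled";
            throw;
        }
        finally
        {
            request.Headers.TryGetValues(HeaderName, out IEnumerable<string>? ids);
            _emit(new Dictionary<string, object?>
            {
                ["event"] = "dependency.call",
                ["dependency"] = _dependency,
                ["operation"] = $"{request.Method} {request.RequestUri?.AbsolutePath}",
                ["correlation_id"] = ids is null ? null : string.Join(",", ids),
                ["duration_ms"] = stopwatch.ElapsedMilliseconds,
                ["outcome"] = outcome,
            });
        }
    }
}

// Test double: returns a fixed status code without touching the network.
public sealed class StatusStubHandler : HttpMessageHandler
{
    private readonly HttpStatusCode _status;

    public StatusStubHandler(HttpStatusCode status) => _status = status;

    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken) =>
        Task.FromResult(new HttpResponseMessage(_status));
}
```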

Shipped asset


Correlation ID package (HTTP + jobs)

A clear correlation ID contract plus copy/paste middleware and HttpClient handler so correlation survives boundaries.

When to use this (fit check)
  • You have incidents where logs exist but can’t be joined into one request/job narrative.
  • Correlation dies at boundaries (HttpClient calls, queues, job runners).
  • You want a standard contract + copy/paste propagation code for HTTP and jobs.
When NOT to use this (yet)
  • You’re still debating multiple ID names across teams (pick one first, then automate propagation).
  • You can’t add the ID to logs at request start and job start (do that before you optimize anything else).
  • You want distributed tracing only (use this alongside tracing; it’s not a replacement for spans).

What you get (4 files)

  • correlation-id-contract.md
  • AspNetCoreCorrelationIdMiddleware.cs
  • CorrelationIdDelegatingHandler.cs
  • README.md

How to use

  • On-call: add the correlation ID to the incident report and use it to pull the full request path.
  • Tech lead: standardize the contract and propagate it through HttpClient and job boundaries.
  • CTO: reduce repeated firefighting by making every incident diagnosable.

Common questions

Why does the correlation ID disappear at HttpClient calls?

HttpClient does not automatically copy custom headers. If you do not add a delegating handler that explicitly sets the correlation ID header on outbound requests, the ID dies at the boundary. Use a DelegatingHandler to propagate the header on every call.

Why do background jobs lose the correlation ID?

Because jobs run outside the HTTP request scope, and most queue clients do not add your correlation ID to message metadata for you. You need to explicitly attach the correlation ID to the message payload or headers, then create a logging scope at job start. Without this, job logs become isolated facts.

Can I rely on Activity.Current.Id instead?

Activity.Current.Id is part of distributed tracing and follows W3C trace context. It works well for spans and telemetry, but it is often lost across queue boundaries and job runners unless your messaging library propagates trace context. Generate an explicit correlation ID for application-level tracing, and log both if you have distributed tracing infrastructure.

How do I do this in classic ASP.NET?

Use an HttpModule or early pipeline hook to extract or generate the correlation ID. Store it in HttpContext.Items so handlers and downstream code can access it. Propagate it through HttpClient using a custom DelegatingHandler. The contract is the same; only the entry point differs.

Should we standardize on X-Request-Id or X-Correlation-Id?

Pick one as the standard and propagate it everywhere. The failure mode is keeping multiple competing IDs: your proxy sends X-Request-Id, your gateway sends X-Correlation-Id, and your app generates its own. Standardize the join key, then log alternative IDs separately if you need them for vendor support.

Why is correlation harder for jobs than for HTTP requests?

Because HTTP requests have a built-in per-request context (HttpContext), but jobs do not. Jobs cross more boundaries (queue serialization, worker process startup, separate logging contexts) and need explicit propagation at each stage. Add correlation to message metadata, extract it at job start, and set a logging scope immediately.

Can correlation IDs help investigate duplicate charges?

Yes. If a customer reports duplicate charges or orders, search the logs by correlation ID: the same ID appearing on two charges points to duplicate processing, while two different IDs point to separate requests. This tells you whether to look at an idempotency failure, retry logic, or a repeated user action.
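The triage rule in this answer is mechanical enough to script over exported log events. A sketch, assuming each charge is logged with an order ID and its correlation_id; `ChargeEvent` and `DuplicateChargeTriage` are hypothetical names, and the labels are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of a structured "charge" log event.
public sealed record ChargeEvent(string OrderId, string CorrelationId);

public static class DuplicateChargeTriage
{
    // For every order charged more than once, classify the duplicate using correlation_id as the join key.
    public static Dictionary<string, string> ClassifyByOrder(IEnumerable<ChargeEvent> charges) =>
        charges
            .GroupBy(c => c.OrderId, StringComparer.Ordinal)
            .Where(order => order.Count() > 1)
            .ToDictionary(
                order => order.Key,
                // Same ID charged twice: one request processed twice (idempotency or retry gap).
                // All IDs distinct: separate requests (client retry with a new ID, or user action).
                order => order.GroupBy(c => c.CorrelationId, StringComparer.Ordinal).Any(id => id.Count() > 1)
                    ? "duplicate-processing"
                    : "separate-requests",
                StringComparer.Ordinal);
}
```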

FAQ

Is a correlation ID the same as a trace ID?

Not exactly. A correlation ID is an application-level join key that you control. A trace ID is part of distributed tracing and usually comes from W3C trace context (the traceparent header). In practice you can log both: keep an explicit correlation_id for humans and incident reports, and log trace_id when you have tracing.
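One way to log both is to build the logging-scope fields from the correlation ID plus `Activity.Current` when one exists. A sketch; `JoinKeyFields` is a hypothetical name. `Activity.Current` is null unless something in the process (ASP.NET Core hosting, OpenTelemetry, or your own code) has started an Activity, so the trace fields are optional while correlation_id is always present.

```csharp
using System.Collections.Generic;
using System.Diagnostics;

// Builds log/scope fields that carry both join keys.
public static class JoinKeyFields
{
    public static Dictionary<string, object> For(string correlationId)
    {
        // correlation_id is always present: it is the contract ID you control.
        var fields = new Dictionary<string, object> { ["correlation_id"] = correlationId };

        // trace_id/span_id only when tracing has produced a W3C Activity for this operation.
        Activity? activity = Activity.Current;
        if (activity is not null && activity.IdFormat == ActivityIdFormat.W3C)
        {
            fields["trace_id"] = activity.TraceId.ToHexString();
            fields["span_id"] = activity.SpanId.ToHexString();
        }

        return fields;
    }
}
```

Passing the result to `logger.BeginScope(...)` keeps humans on correlation_id while tracing tools can still pivot on trace_id.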

Our reverse proxy already adds a request ID. Can we use that?

Pick one as the contract and propagate it. If your proxy ID is stable and available everywhere, you can adopt it. The failure mode is keeping both and letting services pick different ones. Standardize the join key, then log the proxy request ID separately if you still want it.

Do we need OpenTelemetry for this?

No. Correlation IDs are useful even with plain logs. OpenTelemetry helps when you want spans, timing, and sampling across services. The correlation ID contract is still valuable because it works across jobs and message boundaries where traces are often incomplete.

Why do jobs lose correlation more often than requests?

Because jobs cross more boundaries and run outside a request scope. A request scope can carry context automatically, but a job needs explicit propagation through message metadata and explicit logging scope creation at job start. If you do not do that, job logs become isolated facts.

Where should we start?

Pick one API and one worker. Add inbound correlation, outbound HttpClient propagation, and job start scopes. Then validate that one request produces a connected chain of logs across components. Once that works, roll it out to the rest of the estate.

Checklist (copy/paste)

  • One correlation ID contract exists: header name, format, and propagation rules.
  • Inbound requests extract-or-generate the ID and return it in the response header.
  • A logging scope includes correlation_id for the full request lifecycle.
  • HttpClient propagation exists (a DelegatingHandler copies the header to outbound calls).
  • Dependency logs include correlation_id, dependency, operation/route, duration, outcome.
  • Queue/job messages carry the ID (headers/properties preferred; payload fallback).
  • Workers create a logging scope at job start and keep the same ID for progress logs.
  • Competing IDs (proxy/gateway IDs) are logged as secondary fields, not as the join key.
  • One “golden path” trace is tested: request → downstream call → queue message → job run, all joinable by one ID.
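The last checklist item can be automated as a small check over exported log events. A sketch: `LogEvent` and `GoldenPathCheck` are hypothetical names, and while `request.start`, `dependency.call`, and `job.start` match the log examples in this post, `message.publish` is an assumed event name for the enqueue step.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal view of a structured log line.
public sealed record LogEvent(string Event, string? CorrelationId);

public static class GoldenPathCheck
{
    // request -> downstream call -> queue message -> job run; rename to match your schema.
    public static readonly string[] ExpectedEvents =
        { "request.start", "dependency.call", "message.publish", "job.start" };

    // Returns the expected events missing for this correlation ID (empty list = fully joinable).
    public static IReadOnlyList<string> MissingStages(IEnumerable<LogEvent> events, string correlationId)
    {
        var seen = new HashSet<string>(
            events.Where(e => string.Equals(e.CorrelationId, correlationId, StringComparison.Ordinal))
                  .Select(e => e.Event),
            StringComparer.Ordinal);

        return ExpectedEvents.Where(stage => !seen.Contains(stage)).ToList();
    }
}
```

A non-empty result names the first boundary where the ID was dropped, which is usually where to look next.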

Coming soon

Axiom is where we keep consistent operational defaults for production systems: logging schemas, incident runbooks, and safe reliability patterns you can reuse.


Key takeaways

  • Correlation IDs are a join key for incident narratives.
  • Pick one contract, propagate it through boundaries, and log it consistently.
  • The value is measured in faster incident resolution and fewer repeat incidents, not in prettier dashboards.

Recommended resources

Download the shipped checklist/templates for this post.

A correlation ID contract plus copy/paste ASP.NET Core middleware and an HttpClient handler so correlation survives HTTP and background job boundaries and incidents become one-query diagnosable.
