Jan 28, 2026 · 10 min read

Category: .NET

Cannot trace requests across services: why correlation IDs die at boundaries in .NET

A production playbook for when logs exist but cannot be joined: correlation IDs die at HttpClient calls, job runners, and queues, which leaves you unable to reconstruct an incident.

Free download: Correlation IDs package (HTTP + jobs). Jump to the download section.

The incident is never "logging is bad". The incident is: a customer reports a duplicate charge, a job runs twice, or requests time out, and you cannot answer the basic question: what happened to this specific request.

On-call ends up guessing. Someone restarts IIS, someone retries the job manually, and the system returns to normal until the next repeat. The root cause is not always in code. Sometimes it is in the absence of a traceable narrative.

This post gives you a production playbook for correlation IDs in .NET: one contract, consistent propagation across HTTP and jobs, and log fields that turn a repeat incident into one query.

Rescuing a .NET service in production? Start at the .NET Production Rescue hub and the .NET category.

If you only do three things

Define a correlation ID contract (header name, format, propagation rules).
Log it everywhere: request start, dependency calls, job starts, and job progress.
Propagate through boundaries (HttpClient, queue messages) instead of hoping scopes survive.

Fast triage table (what to check first)

| Symptom | Likely cause | Confirm fast | First safe move |
| --- | --- | --- | --- |
| Logs exist but can’t be joined across services | Multiple competing IDs (`X-Request-Id`, `X-Correlation-Id`, app-generated) | In one request, you see different IDs per component | Pick one contract ID and propagate it everywhere; log the others as secondary fields |
| Correlation works in one service but dies downstream | HttpClient doesn’t propagate headers | Downstream dependency logs have no `correlation_id` | Add a `DelegatingHandler` to copy the correlation header to every outbound request |
| Jobs can’t be traced back to the request | Correlation not attached to queue/job message | Job start logs have a new ID with no parent | Put correlation ID into message headers/properties (or payload) and create a logging scope at job start |
| Two “same” incidents look unrelated | Correlation missing from retry/attempt logs | Retry attempts exist but have no join key | Log `correlation_id` on every attempt + decision, not just request start |
| Vendor support asks for a request ID you don’t log | You don’t return/log the correlation ID to callers | Caller can’t report an ID; response headers missing | Echo the correlation ID back in the response header and include it in error responses/logs |

Why incidents repeat: logs exist but cannot be joined across services

In a stable system, you can answer:

Which request started this workflow?
Which downstream calls did it make?
Which job run processed it?
Where did it stall, retry, or duplicate?

Without a correlation ID, your logs are a pile of facts with no edges. You see 500s and timeouts, but you cannot group them into one narrative. In legacy systems, this is why "restart fixed it" becomes the dominant incident response.

Correlation IDs are not about pretty dashboards. They are about making the next incident smaller.

How to diagnose: can you reconstruct one request path from logs?

Pick one incident and ask: can we reconstruct the request path?

If you cannot list the inbound request, the dependency calls, and the job run that handled it, you do not have end-to-end correlation.

Check whether you have multiple competing IDs.

Common failure:

a reverse proxy adds X-Request-Id
an API gateway adds X-Correlation-Id
your app generates its own RequestId

Now every team is "correlating" but nothing joins.

Check boundary crossings.

Correlation often exists inside one process (scoped logging), but dies at the boundaries:

HttpClient calls do not copy headers
queue messages do not carry correlation properties
background jobs generate new IDs with no parent

If correlation dies at the boundary, your incident narrative will die there too.

Fix: standardize one correlation ID contract and propagate through boundaries

The goal is consistency, not perfection.

1) Define the contract

Pick a single header name and use it everywhere.

Header: X-Correlation-Id
Format: opaque string (GUID is fine; ULID is better for sortability)
Rule: if inbound request has it, keep it. If not, generate it.

Then make the rule explicit: this header is copied to every dependency call and included on every job message.
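The keep-or-generate rule is small enough to live in one shared helper. The following is a minimal sketch, not the shipped package: the class name `CorrelationIdContract`, the `Resolve` method, and the 128-character cap are assumptions made for illustration.

```csharp
using System;

// Hypothetical helper: one place that owns the header name and the
// "keep the inbound ID, otherwise generate one" rule.
public static class CorrelationIdContract
{
    public const string HeaderName = "X-Correlation-Id";

    // Assumed cap (not part of the original contract) so an oversized
    // inbound value cannot be copied into every log line.
    private const int MaxLength = 128;

    public static string Resolve(string? inbound)
    {
        var candidate = inbound?.Trim();
        if (!string.IsNullOrEmpty(candidate) && candidate.Length <= MaxLength)
        {
            return candidate; // respect the caller-provided ID
        }

        // The ID is an opaque string; a GUID without dashes is one option.
        return Guid.NewGuid().ToString("N");
    }
}
```

Keeping the rule in one place means the API, the HttpClient handler, and the workers cannot drift apart on header name or format.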

2) Add inbound middleware (or an equivalent entry hook)

Every request needs to end up with:

a correlation ID
a logging scope that includes it
a response header so callers can report it

In ASP.NET Core, middleware is clean. In classic ASP.NET, this is usually an HttpModule or early pipeline hook.

The important part is not the framework. The important part is that the ID is created once and becomes part of the request context.
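For ASP.NET Core, the three requirements above (an ID, a logging scope, a response header) could be sketched as a single middleware. This is an illustration under assumptions, not the file from the download: the class name, the `ItemsKey` value, and the choice of a dashless GUID are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Logging;

public sealed class CorrelationIdMiddleware
{
    public const string HeaderName = "X-Correlation-Id";
    public const string ItemsKey = "CorrelationId"; // hypothetical key name

    private readonly RequestDelegate _next;
    private readonly ILogger<CorrelationIdMiddleware> _logger;

    public CorrelationIdMiddleware(RequestDelegate next, ILogger<CorrelationIdMiddleware> logger)
    {
        _next = next;
        _logger = logger;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        // Keep the inbound ID if present, otherwise generate one.
        var inbound = context.Request.Headers[HeaderName].ToString();
        var correlationId = string.IsNullOrWhiteSpace(inbound)
            ? Guid.NewGuid().ToString("N")
            : inbound;

        // Make the ID part of the request context for downstream code.
        context.Items[ItemsKey] = correlationId;

        // Echo the ID so callers can report it; the callback runs just
        // before response headers are sent.
        context.Response.OnStarting(() =>
        {
            context.Response.Headers[HeaderName] = correlationId;
            return Task.CompletedTask;
        });

        // Every log written while the request is in flight shares this scope.
        using (_logger.BeginScope(new Dictionary<string, object>
        {
            ["correlation_id"] = correlationId
        }))
        {
            await _next(context);
        }
    }
}
```

Register it early with `app.UseMiddleware<CorrelationIdMiddleware>()` so later middleware logs inside the scope. Whether scope values actually appear in output depends on the logging provider and its configuration, so check one real log line after wiring it up.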

3) Propagate through HttpClient

HttpClient does not automatically copy your correlation ID. If you do not add a handler, your downstream logs will be unjoinable.

The safe approach:

set the header on outbound requests
do not overwrite an existing header (respect caller-provided IDs)
log dependency calls with correlation ID and route
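A handler that follows the first two rules might look like the sketch below. It assumes the current ID is held in a hypothetical `CorrelationContext` backed by `AsyncLocal<string?>`, which your inbound middleware or job runner would set; an `IHttpContextAccessor` lookup is another option. Neither the holder nor the `Apply` helper comes from the original package.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical ambient holder for the current correlation ID.
// A value set before an await flows into the calls made after it.
public static class CorrelationContext
{
    private static readonly AsyncLocal<string?> CurrentId = new();

    public static string? Current
    {
        get => CurrentId.Value;
        set => CurrentId.Value = value;
    }
}

public sealed class CorrelationIdDelegatingHandler : DelegatingHandler
{
    public const string HeaderName = "X-Correlation-Id";

    // Split out so the rule can be unit tested without sending a request.
    public static void Apply(HttpRequestMessage request, string? correlationId)
    {
        if (string.IsNullOrEmpty(correlationId))
        {
            return; // nothing to propagate
        }

        // Do not overwrite a header the caller already set.
        if (!request.Headers.Contains(HeaderName))
        {
            request.Headers.TryAddWithoutValidation(HeaderName, correlationId);
        }
    }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        Apply(request, CorrelationContext.Current);
        return base.SendAsync(request, cancellationToken);
    }
}
```

With `IHttpClientFactory`, registration would typically be `services.AddTransient<CorrelationIdDelegatingHandler>()` followed by `services.AddHttpClient("vendor-x").AddHttpMessageHandler<CorrelationIdDelegatingHandler>()`, so every client built from that registration propagates the header without each call site remembering to.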

4) Propagate through queues and jobs

Jobs are where correlation breaks most often.

attach correlation ID to message headers/properties
include it in the job payload if headers are not supported
set a logging scope at job start
log progress events with the same ID

If jobs do not log progress with correlation, you will still end up with "job stuck" incidents that cannot be explained.
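Queue clients differ, so the sketch below uses a hypothetical `JobMessage` envelope with a plain header dictionary rather than any real broker API. It shows the producer attaching the ID and the worker extracting it and opening a logging scope before doing any work. Generating a fresh ID when the header is missing is an assumption; you may prefer to log a warning instead.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Logging;

// Hypothetical envelope; map Headers onto your broker's message properties.
public sealed record JobMessage(
    string JobName,
    string Payload,
    IReadOnlyDictionary<string, string> Headers);

public static class JobCorrelation
{
    public const string HeaderName = "X-Correlation-Id";

    // Producer side: the ID travels with the message, not with the thread.
    public static JobMessage Create(string jobName, string payload, string correlationId) =>
        new(jobName, payload, new Dictionary<string, string> { [HeaderName] = correlationId });

    // Consumer side: reuse the ID from the message; fall back to a new one
    // so the job is at least internally joinable.
    public static string Extract(JobMessage message) =>
        message.Headers.TryGetValue(HeaderName, out var id) && !string.IsNullOrWhiteSpace(id)
            ? id
            : Guid.NewGuid().ToString("N");

    public static void Run(JobMessage message, ILogger logger, Action work)
    {
        var correlationId = Extract(message);

        using (logger.BeginScope(new Dictionary<string, object>
        {
            ["correlation_id"] = correlationId,
            ["operation"] = message.JobName,
            ["component"] = "worker"
        }))
        {
            logger.LogInformation("job.start");
            work(); // progress logs written inside work() share the scope
        }
    }
}
```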

What to log: turn scattered events into one incident narrative

Log fields are the join keys for incident narratives. At minimum:

correlation_id
operation (normalized route or job name)
component (api, worker, scheduler)
dependency (host/vendor)
duration_ms
outcome (success, timeout, 429, exception, cancelled)
attempt (if retries exist)

Example request start log:

```json
{
  "event": "request.start",
  "operation": "POST /orders",
  "correlation_id": "01H...",
  "component": "api",
  "remote_ip": "..."
}
```

Example dependency log:

```json
{
  "event": "dependency.call",
  "dependency": "vendor-x",
  "operation": "GET /v2/orders",
  "correlation_id": "01H...",
  "duration_ms": 842,
  "outcome": "429"
}
```

Example job start log:

```json
{
  "event": "job.start",
  "job": "nightly-export",
  "correlation_id": "01H...",
  "component": "worker"
}
```
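One way to produce the dependency fields consistently is to wrap the outbound call. The sketch below is an assumption-heavy illustration: `DependencyLogging`, `Classify`, and `SendLoggedAsync` are hypothetical names, and the timeout check relies on HttpClient wrapping a `TimeoutException` inside the cancellation exception, which is the behavior on .NET 5 and later.

```csharp
using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;

public static class DependencyLogging
{
    // Maps a finished call onto the outcome values listed above.
    public static string Classify(HttpResponseMessage? response, Exception? error)
    {
        if (error is OperationCanceledException)
        {
            // Assumes .NET 5+: HttpClient timeouts carry an inner TimeoutException.
            return error.InnerException is TimeoutException ? "timeout" : "cancelled";
        }

        if (error is not null || response is null)
        {
            return "exception";
        }

        return response.IsSuccessStatusCode
            ? "success"
            : ((int)response.StatusCode).ToString();
    }

    public static async Task<HttpResponseMessage> SendLoggedAsync(
        HttpClient client, HttpRequestMessage request,
        string dependency, string operation, string correlationId,
        ILogger logger, CancellationToken cancellationToken = default)
    {
        HttpResponseMessage? response = null;
        Exception? error = null;
        var stopwatch = Stopwatch.StartNew();
        try
        {
            response = await client.SendAsync(request, cancellationToken);
            return response;
        }
        catch (Exception ex)
        {
            error = ex;
            throw; // logging must not swallow the failure
        }
        finally
        {
            // Named placeholders become structured fields in most providers.
            logger.LogInformation(
                "dependency.call {dependency} {operation} {correlation_id} {duration_ms} {outcome}",
                dependency, operation, correlationId,
                stopwatch.ElapsedMilliseconds, Classify(response, error));
        }
    }
}
```

Logging in `finally` means failed and cancelled calls still produce a joinable line, which is usually where the incident narrative needs it most.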

Shipped asset

Download

Free

Correlation ID package (HTTP + jobs)

A clear correlation ID contract plus copy/paste middleware and HttpClient handler so correlation survives boundaries.

Get the package

When to use this (fit check)

You have incidents where logs exist but can’t be joined into one request/job narrative.
Correlation dies at boundaries (HttpClient calls, queues, job runners).
You want a standard contract + copy/paste propagation code for HTTP and jobs.

When NOT to use this (yet)

You’re still debating multiple ID names across teams (pick one first, then automate propagation).
You can’t add the ID to logs at request start and job start (do that before you optimize anything else).
You want distributed tracing only (use this alongside tracing; it’s not a replacement for spans).

What you get (4 files)

correlation-id-contract.md
AspNetCoreCorrelationIdMiddleware.cs
CorrelationIdDelegatingHandler.cs
README.md

How to use

On-call: add the correlation ID to the incident report and use it to pull the full request path.
Tech lead: standardize the contract and propagate it through HttpClient and job boundaries.
CTO: reduce repeated firefighting by making every incident diagnosable.

Resources

Internal:

.NET Production Rescue
Axiom waitlist
Contact
Why your background jobs hang forever (and no one notices)
Polly retries making outages worse: how retry storms multiply failures in .NET - when correlation reveals retry storms
Thread pool starvation: the silent killer of ASP.NET performance - isolate hangs with correlation
Timeouts first: why infinite waits create recurring outages in .NET - trace timeout cascades
HttpClient keeps getting 429s: why retries amplify rate limiting in .NET - correlate rate limit chains

Troubleshooting Questions Engineers Search

HttpClient does not automatically copy custom headers. If you do not add a delegating handler that explicitly sets the correlation ID header on outbound requests, the ID dies at the boundary. Use a DelegatingHandler to propagate the header on every call.

Job logs lose the correlation ID because jobs run outside the HTTP request scope, and most queue clients do not copy request context into message metadata for you. You need to explicitly attach the correlation ID to the message payload or headers, then create a logging scope at job start. Without this, job logs become isolated facts.

Activity.Current.Id belongs to distributed tracing and, in the W3C ID format, follows W3C Trace Context. It works well for spans and telemetry, but it is not carried across queue boundaries and job runners unless something propagates it explicitly. Generate an explicit correlation ID for application-level tracing, and log both if you have distributed tracing infrastructure.

Use an HttpModule or early pipeline hook to extract or generate the correlation ID. Store it in HttpContext.Items so handlers and downstream code can access it. Propagate through HttpClient using a custom message handler. The contract is the same, only the entry point differs.

Pick one as the standard and propagate it everywhere. The failure mode is keeping multiple competing IDs: your proxy sends X-Request-Id, your gateway sends X-Correlation-Id, and your app generates its own. Standardize the join key, then log alternative IDs separately if you need them for vendor support.

Jobs are harder to correlate than HTTP requests because a request has a built-in context (HttpContext) and a job does not. Jobs cross more boundaries (queue serialization, worker process startup, separate logging contexts) and need explicit propagation at each stage. Add correlation to message metadata, extract at job start, and set a logging scope immediately.

Correlation IDs can help diagnose duplicates. If a customer reports duplicate charges or orders, search logs by correlation ID to see if the same ID appears twice (duplicate processing) or if two different IDs exist (separate requests). This reveals whether the issue is idempotency failure, retry logic, or user action.

FAQ

A correlation ID is not exactly the same thing as a trace ID. A correlation ID is an application-level join key that you control. A trace ID is part of distributed tracing and usually comes from W3C trace context (traceparent). In practice you can log both: keep an explicit correlation_id for humans and incident reports, and log trace_id when you have tracing.

Pick one as the contract and propagate it. If your proxy ID is stable and available everywhere, you can adopt it. The failure mode is keeping both and letting services pick different ones. Standardize the join key, then log the proxy request ID separately if you still want it.

You do not need OpenTelemetry to benefit from correlation IDs; they are useful even with plain logs. OpenTelemetry helps when you want spans, timing, and sampling across services. The correlation ID contract is still valuable because it works across jobs and message boundaries where traces are often incomplete.

Correlation breaks more often in jobs because jobs cross more boundaries and run outside a request scope. A request scope can carry context automatically, but a job needs explicit propagation through message metadata and explicit logging scope creation at job start. If you do not do that, job logs become isolated facts.

Pick one API and one worker. Add inbound correlation, outbound HttpClient propagation, and job start scopes. Then validate that one request produces a connected chain of logs across components. Once that works, roll it out to the rest of the estate.

Checklist (copy/paste)

One correlation ID contract exists: header name, format, and propagation rules.
Inbound requests extract-or-generate the ID and return it in the response header.
A logging scope includes correlation_id for the full request lifecycle.
HttpClient propagation exists (a DelegatingHandler copies the header to outbound calls).
Dependency logs include correlation_id, dependency, operation/route, duration, outcome.
Queue/job messages carry the ID (headers/properties preferred; payload fallback).
Workers create a logging scope at job start and keep the same ID for progress logs.
Competing IDs (proxy/gateway IDs) are logged as secondary fields, not as the join key.
One “golden path” trace is tested: request → downstream call → queue message → job run, all joinable by one ID.
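The golden-path check in the last item can be automated against exported logs. This is a minimal sketch assuming you can load log lines into a hypothetical `LogEvent` record; the record shape and the `MissingComponents` helper are illustrative and not part of the download.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical projection of one structured log line.
public sealed record LogEvent(string CorrelationId, string Component, string Event);

public static class GoldenPathCheck
{
    // Returns the components that never logged the given correlation ID.
    // An empty result means the chain is joinable end to end.
    public static IReadOnlyList<string> MissingComponents(
        IEnumerable<LogEvent> events,
        string correlationId,
        IEnumerable<string> expectedComponents)
    {
        var seen = events
            .Where(e => e.CorrelationId == correlationId)
            .Select(e => e.Component)
            .ToHashSet(StringComparer.OrdinalIgnoreCase);

        return expectedComponents.Where(c => !seen.Contains(c)).ToList();
    }
}
```

For example, calling it with the expected components `api` and `worker` after sending one test request shows which boundary dropped the ID: if `worker` comes back, the queue message or the job start scope is the place to look.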

Coming soon

Axiom is where we keep consistent operational defaults for production systems: logging schemas, incident runbooks, and safe reliability patterns you can reuse.

Axiom (Coming Soon)

Get notified when we ship real operational assets (runbooks, templates, schemas), not generic tutorials.

Join waitlist

Key takeaways

Correlation IDs are a join key for incident narratives.
Pick one contract, propagate it through boundaries, and log it consistently.
The value is measured in faster incident resolution and fewer repeat incidents, not in prettier dashboards.

Recommended resources

Download the shipped checklist/templates for this post.

Correlation IDs package (HTTP + jobs) (Free)

A correlation ID contract plus copy/paste ASP.NET Core middleware and an HttpClient handler so correlation survives HTTP and background job boundaries and incidents become one-query diagnosable.

.NET · Feb 04, 2026

Structured logging that actually helps: Serilog fields that matter in .NET incidents

When logs are noisy but useless: why incidents stay unsolved, which fields actually explain failures, and the minimal schema that makes .NET outages diagnosable.

.NET · Feb 04, 2026

OpenTelemetry for .NET: minimum viable tracing for production debugging

When incidents span multiple services and logs cannot explain latency: the smallest OpenTelemetry setup that makes production debugging possible without a full rewrite.

.NET · Jan 30, 2026

HttpClient keeps getting 429s: why retries amplify rate limiting in .NET

When retries multiply 429 errors instead of fixing them: how retry amplification happens, how to prove it, and how to honor Retry-After with budgets.
