
Jan 30, 2026 · 7 min read
Category: .NET
Handling 429s and Retry-After correctly in HttpClient
A production playbook for honoring Retry-After and stopping retry amplification when a dependency throttles your .NET service.
Download available. Jump to the shipped asset.
The incident pattern is familiar: a vendor starts throttling, you get a burst of 429s, and then your own service becomes unstable. Latency spikes, queues grow, and on-call starts scaling out a system that is not down. It is being told to slow down.
The cost is not the 429 itself. The cost is what happens when you treat 429 like a transient error and your retry code ignores backpressure. You multiply load against the throttled dependency, you burn your thread pool on waiting, and you create a retry backlog that hits the vendor the moment it recovers.
This post is the production playbook for .NET: how to treat 429 as backpressure, honor Retry-After correctly, and make the behavior provable in logs.
Rescuing a .NET service in production? Start at the .NET Production Rescue hub and the .NET category.
- Treat 429 as backpressure, not a transient failure.
- Honor Retry-After (seconds and date forms) and cap your total budget.
- Log retry decisions (reason, wait, endpoint, correlation ID) so you can prove the policy is working.
The mechanism: 429 is a request for less concurrency
A 429 is not the same as a 503 or a timeout. It is an explicit statement: the upstream is protecting itself and you are above your allowed rate or concurrency.
If you immediately retry a 429, you have not improved your odds of success. You have increased upstream pressure at the exact moment the upstream asked for less. Under load, that turns into an amplifier.
The failure is usually not one request. It is the shape:
- 429 rate rises
- retries rise faster than original traffic
- latency rises because calls wait on backoff (or worse, retry immediately)
- your own queues fill because threads are tied up waiting and retrying
Teams get stuck because the dashboards look like a vendor outage, so they scale out. Scaling out increases concurrency and makes the throttling worse.
Diagnosis ladder (fast checks first)
Start with the shortest path to truth. The goal is to prove whether throttling is real and whether your client is respecting it.
- Confirm the response is actually a 429, and that it comes from the dependency you think it does.
- Log the upstream host, route, and status code.
- If you have multiple dependencies behind one HttpClient, separate them. A single misbehaving vendor can poison unrelated calls.
- Check whether Retry-After is present and what form it is in.
Retry-After can be:
- a delta in seconds, like Retry-After: 10
- an HTTP date, like Retry-After: Wed, 21 Oct 2015 07:28:00 GMT
If you only support one form, you are only sometimes honoring backpressure.
- Measure amplification (see the counter sketch after this ladder).
If your request rate to the vendor is 1,000 rpm and your retry attempts add another 2,000 rpm, you are not merely being throttled. You are attacking yourself.
- Look for the common foot-guns.
- retry policies that treat any non-2xx as retryable
- retry loops that ignore Retry-After
- retries without a per-attempt timeout (stacked waits)
- no total budget (one call can burn a worker for minutes)
If you see those, you have a policy problem. Not a vendor problem.
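One way to make amplification measurable is to count first attempts and retry attempts separately, per dependency, and compare the two rates. The sketch below is a minimal example, assuming System.Diagnostics.Metrics is available (.NET 6+ or the System.Diagnostics.DiagnosticSource package); the meter and counter names are illustrative, not a required convention.
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Illustrative counters; names and tags are assumptions, not a standard.
static class RetryAmplificationMetrics
{
    static readonly Meter Meter = new("MyApp.HttpRetries");
    static readonly Counter<long> FirstAttempts = Meter.CreateCounter<long>("http_first_attempts");
    static readonly Counter<long> RetryAttempts = Meter.CreateCounter<long>("http_retry_attempts");

    public static void RecordAttempt(string dependency, int attempt)
    {
        var tag = new KeyValuePair<string, object?>("dependency", dependency);
        if (attempt == 1) FirstAttempts.Add(1, tag);
        else RetryAttempts.Add(1, tag);
    }
}
If http_retry_attempts approaches or exceeds http_first_attempts during a throttling window, the policy is amplifying load instead of shedding it.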
Fix plan: respect backpressure without increasing blast radius
The safe goal is not "never see a 429". The goal is to fail fast when you must, retry when it is safe, and slow down when the upstream asks.
1) Gate 429 retries behind idempotency and a total budget
A 429 retry is only safe if a duplicate attempt does not create a duplicate side effect.
- Safe: GETs, idempotent POSTs with idempotency keys, retries that hit a cache read
- Unsafe: payment capture, shipment creation, "send email" endpoints without dedupe
Also, treat timeouts and retries as one policy. A retry policy without a budget is a slow leak that becomes a queue pileup.
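As a concrete gate, the check can be as small as the sketch below. It assumes the caller knows whether the upstream deduplicates by an idempotency key; the helper name CanRetry429 is illustrative, not a framework feature.
using System.Net.Http;

// Minimal sketch: decide whether a 429 retry is even allowed for this call.
static bool CanRetry429(HttpMethod method, bool hasIdempotencyKey)
{
    // Reads are safe to repeat by HTTP semantics.
    if (method == HttpMethod.Get || method == HttpMethod.Head) return true;

    // Mutations are only safe when the upstream deduplicates by key.
    return hasIdempotencyKey;
}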
2) Honor Retry-After, but cap it
Retry-After is advice. In production you still need boundaries.
- honor it when it is within your total time budget
- cap extremely large waits (for example, if the upstream says 600 seconds, you may want to stop and surface an actionable failure)
- add jitter when you have many callers so you do not synchronize
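A minimal sketch of that decision, assuming you have already parsed Retry-After (the parser is shown below). The 20% jitter and the cap are illustrative choices, and Random.Shared needs .NET 6 or later; use a shared Random instance on older runtimes.
using System;

// Minimal sketch: honor Retry-After within a cap, and de-synchronize callers with jitter.
static bool TryChooseDelay(TimeSpan? retryAfter, TimeSpan fallback, TimeSpan maxDelay, out TimeSpan delay)
{
    delay = retryAfter ?? fallback;

    // The upstream asked for more than we are willing to spend: fail fast and surface it.
    if (delay > maxDelay) return false;

    // Up to 20% extra wait so many instances do not retry at the same instant.
    var jitterMs = delay.TotalMilliseconds * 0.2 * Random.Shared.NextDouble();
    delay += TimeSpan.FromMilliseconds(jitterMs);
    return true;
}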
3) Reduce concurrency, not just delay
Delaying an individual request helps, but with 200 in-flight callers you can still overload the upstream.
If throttling is sustained:
- cap concurrency for that dependency (bulkhead)
- shed low priority calls
- cache where safe
- degrade features instead of stacking retries
That is the difference between stabilization and "we waited longer before we failed".
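A per-dependency bulkhead can be as simple as a SemaphoreSlim in front of the call. This is a minimal sketch under that assumption; the class name, the 20-slot limit, and the acquire timeout are illustrative and should be sized to the vendor's actual quota.
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Minimal sketch of a per-dependency bulkhead. Limits are illustrative.
sealed class DependencyBulkhead
{
    private readonly SemaphoreSlim _slots = new SemaphoreSlim(20, 20);

    public async Task<HttpResponseMessage> SendAsync(
        Func<Task<HttpResponseMessage>> send, TimeSpan acquireTimeout, CancellationToken ct)
    {
        // Shed the call instead of queueing it when the dependency is saturated.
        if (!await _slots.WaitAsync(acquireTimeout, ct))
            throw new TimeoutException("Bulkhead full: shedding the call instead of queueing it.");

        try
        {
            return await send();
        }
        finally
        {
            _slots.Release();
        }
    }
}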
Minimal code: parse Retry-After correctly (seconds and HTTP date)
The goal is not a perfect policy framework. The goal is to stop doing the wrong thing by default.
// Works in .NET Framework and modern .NET.
// Use as a DelegatingHandler or inside your retry policy.
static bool TryGetRetryAfterDelay(HttpResponseMessage response, DateTimeOffset now, out TimeSpan delay)
{
    delay = TimeSpan.Zero;

    // Prefer typed header parsing when available.
    var ra = response.Headers.RetryAfter;
    if (ra == null) return false;

    if (ra.Delta.HasValue)
    {
        delay = ra.Delta.Value;
        return delay > TimeSpan.Zero;
    }

    if (ra.Date.HasValue)
    {
        var target = ra.Date.Value;
        delay = target > now ? (target - now) : TimeSpan.Zero;
        return delay > TimeSpan.Zero;
    }

    return false;
}

This should be paired with:
- per-attempt timeout
- attempt cap
- total budget cap
- logging of the decision
Do not ship this alone and call it fixed.
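To show how the pieces might compose, here is a hedged sketch of a DelegatingHandler that pairs the parser with an attempt cap, a total budget, and a decision log. Every limit is a placeholder, the Console.WriteLine stands in for whatever structured logger you use, and the sketch is only safe as-is for requests without a body; it is illustrative, not a drop-in replacement for the shipped handler.
using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Sketch only: attempt cap + total budget + decision log around Retry-After.
// All limits are illustrative; tune them per dependency.
// Resending the same HttpRequestMessage is only reliable for bodyless requests.
sealed class RetryAfterHandler : DelegatingHandler
{
    private static readonly TimeSpan TotalBudget = TimeSpan.FromSeconds(20);
    private static readonly TimeSpan MaxSingleDelay = TimeSpan.FromSeconds(10);
    private const int MaxAttempts = 3;

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken ct)
    {
        var started = Stopwatch.StartNew();

        for (var attempt = 1; ; attempt++)
        {
            var response = await base.SendAsync(request, ct);
            if ((int)response.StatusCode != 429) return response;

            var hasHint = TryGetRetryAfterDelay(response, DateTimeOffset.UtcNow, out var delay);
            if (!hasHint) delay = TimeSpan.FromSeconds(2); // weak signal: small bounded fallback

            var stop = attempt >= MaxAttempts
                || delay > MaxSingleDelay
                || started.Elapsed + delay > TotalBudget;

            // Log the decision either way so the behavior is provable later.
            // Console.WriteLine stands in for your structured logger; mirror the field list below.
            Console.WriteLine(
                $"http.retry.decision route={request.Method} {request.RequestUri} status=429 " +
                $"attempt={attempt} retry_after_ms={(hasHint ? delay.TotalMilliseconds : (double?)null)} " +
                $"total_elapsed_ms={started.Elapsed.TotalMilliseconds} " +
                $"decision={(stop ? "fail-fast" : "delay-and-retry")}");

            if (stop) return response;

            response.Dispose();
            await Task.Delay(delay, ct); // add jitter here when many instances share the dependency
        }
    }

    // Same parser as the snippet above, kept here so the sketch compiles on its own.
    private static bool TryGetRetryAfterDelay(HttpResponseMessage response, DateTimeOffset now, out TimeSpan delay)
    {
        delay = TimeSpan.Zero;
        var ra = response.Headers.RetryAfter;
        if (ra == null) return false;
        if (ra.Delta.HasValue) { delay = ra.Delta.Value; return delay > TimeSpan.Zero; }
        if (ra.Date.HasValue) { delay = ra.Date.Value > now ? ra.Date.Value - now : TimeSpan.Zero; return delay > TimeSpan.Zero; }
        return false;
    }
}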
What to log so throttling becomes provable
You need enough fields to answer one question in one query: "Did we honor backpressure or did we amplify it?"
Log at the retry decision point:
- dependency: vendor name or host
- route: normalized route or operation name
- status: 429
- retry_after_ms: parsed value (or null)
- retry_delay_ms: what you actually waited (post-cap, post-jitter)
- attempt: attempt number
- total_elapsed_ms: time spent so far
- budget_ms: max allowed
- decision: delay-and-retry | fail-fast | degrade | escalate
- correlation_id: request correlation id
Example log line:
{
"event": "http.retry.decision",
"dependency": "vendor-x",
"route": "GET /v2/orders",
"status": 429,
"retry_after_ms": 10000,
"retry_delay_ms": 11234,
"attempt": 2,
"total_elapsed_ms": 15300,
"budget_ms": 20000,
"decision": "delay-and-retry",
"correlation_id": "01H..."
}

If you cannot answer that question, the next throttling incident will look like mystery latency.
Shipped asset
HttpClient 429 + Retry-After package
Copy/paste-ready handler, runbook, and logging fields to stop retry amplification when a dependency is throttling. Safe for legacy .NET systems.
Use this when a vendor starts returning 429 and your retry logic is making latency and queueing worse.
What you get (4 files)
- RetryAfterDelegatingHandler.cs
- 429-retry-after-runbook.md
- retry-after-logging-fields.md
- README.md
How to use
- On call: use the runbook to confirm throttling source and contain blast radius.
- Tech lead: standardize the handler + budgets across services that call the dependency.
- CTO: use the logging fields to make throttling measurable and reduce repeat incidents.
Resources
Internal:
- .NET Production Rescue
- Axiom waitlist
- Contact
- The real cost of retry logic: when “resilience” makes outages worse
External:
- Retry-After header (MDN)
- HTTP 429 Too Many Requests (MDN)
- HttpResponseHeaders.RetryAfter (Microsoft)
FAQ
Should I always retry a 429?
No. A 429 is a request to slow down, not a promise that retrying will work. Retry only when the operation is safe to repeat (idempotent) and only within a budget that protects your own thread pool. If the upstream is throttling for minutes, the right move is usually to reduce concurrency and degrade features, not to keep retrying.
What if the response has no Retry-After header?
Treat that as a weak signal, not permission to hammer. Use a small bounded backoff with jitter, cap attempts, and log that Retry-After was missing. If throttling is sustained, you still need a concurrency cap and a fail-fast path that produces an actionable error for the caller.
How do I keep many instances from retrying at the same moment?
Add jitter to the delay you honor, and cap concurrency per dependency. Without jitter, 100 instances that all see Retry-After: 10 will all retry at the same moment. Without a bulkhead, you will still have too much in-flight work even if each call is delayed.
Doesn't Polly already handle this?
Polly can express the policy, but it does not make the policy safe by default. The safety comes from classification (what is retryable), budgets (per-attempt and total), and observability (logging decisions). Many incidents happen because a policy exists but nobody can prove what it did under load.
Why not just scale out until the throttling stops?
Scaling out increases concurrency. If the upstream is throttling, more concurrency creates more 429s and more retries, which creates more backlog and more waiting. In these incidents, the fix is often the opposite: cap concurrency, shed low-priority calls, and keep the rest within a budget.
Coming soon
If this incident pattern feels familiar, the fastest win is a consistent set of defaults across services: budgets, backpressure handling, and logging fields. Axiom is where these packages live so you do not have to re-derive them during an incident.
Axiom (Coming Soon)
Get notified when we ship real operational assets (runbooks, templates, schemas), not generic tutorials.
Key takeaways
- 429 is backpressure. Treat it as a request for less concurrency.
- Honor Retry-After correctly, but keep boundaries (attempt caps and total budget caps).
- If you cannot prove behavior in logs, you will relive the incident with better dashboards but the same broken defaults.
Related posts

The real cost of retry logic: when “resilience” makes outages worse
Retry storms don’t look like a bug — they look like good engineering until production melts. Here’s how to bound retries with stop rules and proof.

Correlation IDs in .NET: trace one request across services and jobs
A production playbook for a single correlation ID contract in .NET so requests and jobs can be traced end-to-end across boundaries.

Timeouts first: why infinite waits create recurring outages in .NET
Infinite waits do not look like crashes. They look like calm dashboards and growing backlog. This is the production playbook for adding time budgets safely in .NET.