
Jan 30, 2026 · 9 min read
Category: .NET
HttpClient keeps getting 429s: why retries amplify rate limiting in .NET
When retries multiply 429 errors instead of fixing them: how retry amplification happens, how to prove it, and how to honor Retry-After with budgets.
Download available. Jump to the shipped asset.
Paid pack available. Jump to the Axiom pack.
The incident pattern is familiar: a vendor starts throttling, you get a burst of 429s, and then your own service becomes unstable. Latency spikes, queues grow, and on-call starts scaling out a system that is not down. It is being told to slow down.
The cost is not the 429 itself. The cost is what happens when you treat 429 like a transient error and your retry code ignores backpressure. You multiply load against the throttled dependency, you burn your thread pool on waiting, and you create a retry backlog that hits the vendor the moment it recovers.
This post is the production playbook for .NET: how to treat 429 as backpressure, honor Retry-After correctly, and make the behavior provable in logs.
Rescuing a .NET service in production? Start at the .NET Production Rescue hub and the .NET category.
- Treat 429 as backpressure, not a transient failure.
- Honor Retry-After (seconds and date forms) and cap your total budget.
- Log retry decisions (reason, wait, endpoint, correlation ID) so you can prove the policy is working.
Why 429s multiply when you retry: backpressure vs transient errors
A 429 is not the same as a 503 or a timeout. It is an explicit statement: the upstream is protecting itself and you are above your allowed rate or concurrency.
If you immediately retry a 429, you have not improved your odds of success. You have increased upstream pressure at the exact moment the upstream asked for less. Under load, that turns into an amplifier.
The failure is usually not one request. It is the shape:
- 429 rate rises
- retries rise faster than original traffic
- latency rises because calls wait on backoff (or worse, retry immediately)
- your own queues fill because threads are tied up waiting and retrying
Teams get stuck because the dashboards look like a vendor outage, so they scale out. Scaling out increases concurrency and makes the throttling worse.
How to prove retries are amplifying 429s: diagnosis checklist
Start with the shortest path to truth. The goal is to prove whether throttling is real and whether your client is respecting it.
- Confirm the response is actually a 429, and that it comes from the dependency you think it does.
- Log the upstream host, route, and status code.
- If you have multiple dependencies behind one HttpClient, separate them. A single misbehaving vendor can poison unrelated calls.
- Check whether Retry-After is present and what form it is in.
Retry-After can be:
- a delta in seconds, like Retry-After: 10
- an HTTP date, like Retry-After: Wed, 21 Oct 2015 07:28:00 GMT
If you only support one form, you are only sometimes honoring backpressure.
- Measure amplification.
If your request rate to the vendor is 1,000 rpm and your retry attempts add 2,000 rpm, you are not just throttled. You are attacking yourself.
- Look for the common foot-guns.
- retry policies that treat any non-2xx as retryable
- retry loops that ignore Retry-After
- retries without a per-attempt timeout (stacked waits)
- no total budget (one call can burn a worker for minutes)
If you see those, you have a policy problem. Not a vendor problem. If threads are tied up waiting on 429 retries and you see requests timing out with normal CPU, you have thread pool starvation from retry backlog.
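To make the "measure amplification" step concrete, the ratio you want is total attempts divided by original requests over the same window. A minimal sketch, assuming you already have those two counts from your metrics or logs:

// Minimal sketch: amplification factor for one time window.
// originalRequests and retryAttempts are whatever counters you already collect.
static double AmplificationFactor(long originalRequests, long retryAttempts)
{
    if (originalRequests == 0) return 0;
    return (double)(originalRequests + retryAttempts) / originalRequests;
}

// Example from the checklist: 1,000 rpm of original traffic plus 2,000 rpm of
// retries gives AmplificationFactor(1000, 2000) == 3.0, a 3x load amplifier.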
How to stop retry amplification: honor Retry-After with budgets
The safe goal is not "never see a 429". The goal is to fail fast when you must, retry when it is safe, and slow down when the upstream asks.
1) Gate 429 retries behind idempotency and a total budget
A 429 retry is only safe if a duplicate attempt does not create a duplicate side effect.
- Safe: GETs, idempotent POSTs with idempotency keys, retries that hit a cache read
- Unsafe: payment capture, shipment creation, "send email" endpoints without dedupe
Also, treat timeouts and retries as one policy. A retry policy without a budget is a slow leak that becomes a queue pileup. See retry logic anti-patterns for why retrying without classification causes cascading failures.
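A minimal sketch of the idempotency side of that gate. The Idempotency-Key header name here is an example, not a standard the vendor necessarily supports; it only makes a retry safe if the server actually dedupes on it.

// Minimal sketch: is this request safe to repeat at all?
static bool IsSafeToRetry(HttpRequestMessage request) =>
    request.Method == HttpMethod.Get ||
    request.Method == HttpMethod.Head ||
    // Example header: a duplicate attempt is only safe if the server dedupes on it.
    request.Headers.Contains("Idempotency-Key");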
2) Honor Retry-After, but cap it
Retry-After is advice. In production you still need boundaries.
- honor it when it is within your total time budget
- cap extremely large waits (for example, if the upstream says 600 seconds, you may want to stop and surface an actionable failure)
- add jitter when you have many callers so you do not synchronize
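A minimal sketch of those boundaries, assuming the Retry-After value has already been parsed into a TimeSpan (see the parser below). The 20% jitter range is illustrative, not a recommendation.

// Minimal sketch: honor Retry-After, but keep it inside the caller's remaining
// budget and desynchronize instances with jitter.
static TimeSpan? BoundedRetryDelay(TimeSpan retryAfter, TimeSpan remainingBudget, Random random)
{
    // If honoring the delay would blow the budget (for example, a 600 second
    // Retry-After against a 20 second budget), return null so the caller
    // fails fast with an actionable error instead of waiting.
    if (retryAfter >= remainingBudget) return null;

    // Add up to 20% jitter so many callers do not retry in the same instant.
    var jitterMs = retryAfter.TotalMilliseconds * 0.2 * random.NextDouble();
    var delay = retryAfter + TimeSpan.FromMilliseconds(jitterMs);

    // Jitter must not push the wait past the remaining budget either.
    return delay < remainingBudget ? delay : remainingBudget;
}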
3) Reduce concurrency, not just delay
Delaying an individual request helps, but if you have 200 in-flight callers, you can still overload the upstream.
If throttling is sustained:
- cap concurrency for that dependency (bulkhead)
- shed low priority calls
- cache where safe
- degrade features instead of stacking retries
That is the difference between stabilization and "we waited longer before we failed". Learn more about why retries amplify outages when they lack backoff and jitter.
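One common way to cap concurrency per dependency is a SemaphoreSlim bulkhead around every call to that vendor. This is a minimal sketch, not the shipped asset; the slot count and the short wait are placeholders you would tune per dependency.

// Minimal sketch: a per-dependency bulkhead. The limits are placeholders.
sealed class DependencyBulkhead
{
    private readonly SemaphoreSlim _slots;

    public DependencyBulkhead(int maxConcurrentCalls) =>
        _slots = new SemaphoreSlim(maxConcurrentCalls, maxConcurrentCalls);

    public async Task<HttpResponseMessage> ExecuteAsync(
        Func<CancellationToken, Task<HttpResponseMessage>> call,
        CancellationToken ct)
    {
        // If no slot frees up quickly, shed the call instead of queueing forever.
        if (!await _slots.WaitAsync(TimeSpan.FromMilliseconds(250), ct))
            throw new InvalidOperationException("Bulkhead full: shedding call to throttled dependency.");

        try
        {
            return await call(ct);
        }
        finally
        {
            _slots.Release();
        }
    }
}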
Parse Retry-After correctly: handle seconds and HTTP date formats
The goal is not a perfect policy framework. The goal is to stop doing the wrong thing by default.
// Works in .NET Framework and modern .NET.
// Use as a DelegatingHandler or inside your retry policy.
static bool TryGetRetryAfterDelay(HttpResponseMessage response, DateTimeOffset now, out TimeSpan delay)
{
    delay = TimeSpan.Zero;

    // Prefer typed header parsing when available.
    var ra = response.Headers.RetryAfter;
    if (ra == null) return false;

    if (ra.Delta.HasValue)
    {
        delay = ra.Delta.Value;
        return delay > TimeSpan.Zero;
    }

    if (ra.Date.HasValue)
    {
        var target = ra.Date.Value;
        delay = target > now ? (target - now) : TimeSpan.Zero;
        return delay > TimeSpan.Zero;
    }

    return false;
}

This should be paired with:
- per-attempt timeout
- attempt cap
- total budget cap
- logging of the decision
Do not ship this alone and call it fixed.
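For context, here is one way those four requirements could wrap the parser above in a single bounded loop. A hedged sketch, not the shipped RetryAfterDelegatingHandler: sendOnce, the attempt cap, and the budget values are placeholders for your own policy.

// Minimal sketch: per-attempt timeout, attempt cap, and total budget around
// TryGetRetryAfterDelay. All limits shown are illustrative defaults.
static async Task<HttpResponseMessage> SendWithRetryAfterAsync(
    Func<CancellationToken, Task<HttpResponseMessage>> sendOnce,
    CancellationToken ct,
    int maxAttempts = 3,
    int perAttemptTimeoutSeconds = 10,
    int totalBudgetSeconds = 20)
{
    var started = DateTimeOffset.UtcNow;
    HttpResponseMessage response = null;

    for (var attempt = 1; attempt <= maxAttempts; attempt++)
    {
        // Per-attempt timeout so one slow attempt cannot consume the whole budget.
        using (var attemptCts = CancellationTokenSource.CreateLinkedTokenSource(ct))
        {
            attemptCts.CancelAfter(TimeSpan.FromSeconds(perAttemptTimeoutSeconds));
            response = await sendOnce(attemptCts.Token);
        }

        if ((int)response.StatusCode != 429) return response;

        var remaining = TimeSpan.FromSeconds(totalBudgetSeconds) - (DateTimeOffset.UtcNow - started);

        // Honor Retry-After only when it fits inside the remaining budget and
        // another attempt is allowed. Otherwise surface the 429 to the caller.
        if (attempt == maxAttempts ||
            !TryGetRetryAfterDelay(response, DateTimeOffset.UtcNow, out var delay) ||
            delay >= remaining)
        {
            return response; // fail fast with an actionable 429
        }

        // Log the retry decision here (see the logging fields below), then wait.
        await Task.Delay(delay, ct);
    }

    return response;
}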
What to log so throttling becomes provable
You need enough fields to answer one question in one query: "Did we honor backpressure or did we amplify it?"
Log at the retry decision point:
- dependency: vendor name or host
- route: normalized route or operation name
- status: 429
- retry_after_ms: parsed value (or null)
- retry_delay_ms: what you actually waited (post-cap, post-jitter)
- attempt: attempt number
- total_elapsed_ms: time spent so far
- budget_ms: max allowed
- decision: delay-and-retry | fail-fast | degrade | escalate
- correlation_id: request correlation id
Example log line:
{
"event": "http.retry.decision",
"dependency": "vendor-x",
"route": "GET /v2/orders",
"status": 429,
"retry_after_ms": 10000,
"retry_delay_ms": 11234,
"attempt": 2,
"total_elapsed_ms": 15300,
"budget_ms": 20000,
"decision": "delay-and-retry",
"correlation_id": "01H..."
}

If you cannot answer that question, the next throttling incident will look like mystery latency. Use correlation IDs to trace which original request spawned which 429 retries.
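If you use Microsoft.Extensions.Logging, a structured log call at the decision point can carry the same fields, so your pipeline can render the JSON above. A minimal sketch; the local variables (retryAfterMs, retryDelayMs, attempt, and so on) are placeholders for values your retry loop already has in scope.

// Minimal sketch: one structured event per retry decision.
logger.LogWarning(
    "http.retry.decision dependency={Dependency} route={Route} status={Status} " +
    "retry_after_ms={RetryAfterMs} retry_delay_ms={RetryDelayMs} attempt={Attempt} " +
    "total_elapsed_ms={TotalElapsedMs} budget_ms={BudgetMs} " +
    "decision={Decision} correlation_id={CorrelationId}",
    "vendor-x", "GET /v2/orders", 429,
    retryAfterMs, retryDelayMs, attempt,
    totalElapsedMs, budgetMs,
    "delay-and-retry", correlationId);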
Shipped asset
HttpClient 429 + Retry-After package
Copy/paste-ready handler, runbook, and logging fields to stop retry amplification when a dependency is throttling. Safe for legacy .NET systems.
Use this when a vendor starts returning 429 and your retry logic is making latency and queueing worse.
What you get (4 files)
- RetryAfterDelegatingHandler.cs
- 429-retry-after-runbook.md
- retry-after-logging-fields.md
- README.md
How to use
- On call: use the runbook to confirm throttling source and contain blast radius.
- Tech lead: standardize the handler + budgets across services that call the dependency.
- CTO: use the logging fields to make throttling measurable and reduce repeat incidents.
Retry Policy Kit: Battle-Tested Resilience for Production
Managing retries across multiple services? Get pre-configured Polly policies with monitoring integration, circuit breaker patterns, and incident runbooks. Stop debugging retry storms in production.
- ✓ 10+ production-grade Polly policies for HTTP, gRPC, and database calls
- ✓ Circuit breaker + retry coordination patterns
- ✓ Monitoring integration (Prometheus, OpenTelemetry, Application Insights)
- ✓ Incident runbooks for retry storm diagnosis and mitigation
Resources
Internal:
- .NET Production Rescue
- Axiom waitlist
- Contact
- The real cost of retry logic: when “resilience” makes outages worse
External:
- Retry-After header (MDN)
- HTTP 429 Too Many Requests (MDN)
- HttpResponseHeaders.RetryAfter (Microsoft)
Troubleshooting Questions Engineers Search
Why am I still getting 429s after honoring Retry-After?
If 429s persist after honoring Retry-After, check concurrency. Many instances retrying at once still overwhelm the upstream even with delays. Add a bulkhead (concurrency cap per dependency) and add jitter so instances don't retry in synchronized waves.
How do I tell whether my retries are amplifying load?
Compare request rate vs retry rate. If original requests = 1000/min but retries add 3000/min, you're amplifying load 4x. Check logs for: retry attempts per endpoint, 429 rate trend, latency spike correlation with retry rate spike.
If Retry-After is present, is it always safe to retry?
No. Retry-After tells you WHEN to retry, not IF it's safe. Only retry if the operation is idempotent. Non-idempotent operations (payments, orders, emails) should fail fast or use an idempotency key, not blind retry.
Is Retry-After a number of seconds or a date?
Both. Retry-After can be delta seconds (10) or HTTP date (Wed, 21 Oct 2015 07:28:00 GMT). If you only parse one format, you're only sometimes honoring backpressure. Parse both or you'll amplify load when the vendor switches formats.
Does Polly handle Retry-After for me?
Polly can express retry policies, but doesn't parse Retry-After by default. You need a custom DelegatingHandler or Polly policy that reads the header and calculates wait time. Polly provides the retry framework; you provide the backpressure logic.
Why does scaling out make 429s worse?
Scaling out increases total concurrency to the throttled dependency. If you had 5 instances making 100 req/sec (500 total) and scale to 10 instances, you're now making 1000 req/sec. If the upstream throttle is 600 req/sec, more instances = more 429s = more retries = worse amplification.
What if Retry-After tells me to wait for minutes?
Cap it. If your total request budget is 30 seconds, waiting 10 minutes isn't viable. Honor the signal (slow down), but fail fast within your budget. Log the uncapped value so you can discuss reasonable limits with the vendor.
Additional Questions
Should I always retry a 429?
No. A 429 is a request to slow down, not a promise that retrying will work. Retry only when the operation is safe to repeat (idempotent) and only within a budget that protects your own thread pool. If the upstream is throttling for minutes, the right move is usually to reduce concurrency and degrade features, not to keep retrying.
What if the 429 has no Retry-After header?
Treat that as a weak signal, not permission to hammer. Use a small bounded backoff with jitter, cap attempts, and log that Retry-After was missing. If throttling is sustained, you still need a concurrency cap and a fail-fast path that produces an actionable error for the caller.
How do I stop many instances from retrying at the same moment?
Add jitter to the delay you honor, and cap concurrency per dependency. Without jitter, 100 instances that all see Retry-After: 10 will all retry at the same moment. Without a bulkhead, you will still have too much in-flight work even if each call is delayed.
Is adopting Polly enough to fix this?
Polly can express the policy, but it does not make the policy safe by default. The safety comes from classification (what is retryable), budgets (per-attempt and total), and observability (logging decisions). Many incidents happen because a policy exists but nobody can prove what it did under load.
Why not just scale out when 429s spike?
Scaling out increases concurrency. If the upstream is throttling, more concurrency creates more 429s and more retries, which creates more backlog and more waiting. In these incidents, the fix is often the opposite: cap concurrency, shed low priority calls, and keep the rest within a budget.
Coming soon
If this incident pattern feels familiar, the fastest win is a consistent set of defaults across services: budgets, backpressure handling, and logging fields. Axiom is where these packages live so you do not have to re-derive them during an incident.
Axiom (Coming Soon)
Get notified when we ship real operational assets (runbooks, templates, schemas), not generic tutorials.
Key takeaways
- 429 is backpressure. Treat it as a request for less concurrency.
- Honor Retry-After correctly, but keep boundaries (attempt caps and total budget caps).
- If you cannot prove behavior in logs, you will relive the incident with better dashboards but the same broken defaults.
Recommended resources
Download the shipped checklist/templates for this post.
A copy/paste handler that parses Retry-After (seconds and HTTP date) plus a 429 runbook and logging fields so throttling becomes bounded, observable, and non-amplifying in .NET.
Related posts

Retries making outages worse: when resilience policies multiply failures in .NET
Retry storms don't look like a bug—they look like good engineering until retries amplify failures and multiply in-flight requests during backpressure.

Idempotency keys for APIs: stop duplicate orders, emails, and writes
When retries create duplicate side effects, idempotency keys are the only safe fix. This playbook shows how to design keys, store results, and prove duplicates cannot recur.

Cannot trace requests across services: why correlation IDs die at boundaries in .NET
A production playbook for when logs exist but cannot be joined—correlation IDs die at HttpClient boundaries, jobs, and queues, making incidents unreproducible.