HttpClient keeps getting 429s: why retries amplify rate limiting in .NET

Jan 30, 2026 · 9 min read

Category: .NET


When retries multiply 429 errors instead of fixing them: how retry amplification happens, how to prove it, and how to honor Retry-After with budgets.

Download available. Jump to the shipped asset.

Paid pack available. Jump to the Axiom pack.

The incident pattern is familiar: a vendor starts throttling, you get a burst of 429s, and then your own service becomes unstable. Latency spikes, queues grow, and on-call starts scaling out a system that is not down. It is being told to slow down.

The cost is not the 429 itself. The cost is what happens when you treat 429 like a transient error and your retry code ignores backpressure. You multiply load against the throttled dependency, you burn your thread pool on waiting, and you create a retry backlog that hits the vendor the moment it recovers.

This post is the production playbook for .NET: how to treat 429 as backpressure, honor Retry-After correctly, and make the behavior provable in logs.

Rescuing a .NET service in production? Start at the .NET Production Rescue hub and the .NET category.

If you only do three things
  • Treat 429 as backpressure, not a transient failure.
  • Honor Retry-After (seconds and date forms) and cap your total budget.
  • Log retry decisions (reason, wait, endpoint, correlation ID) so you can prove the policy is working.

Why 429s multiply when you retry: backpressure vs transient errors

A 429 is not the same as a 503 or a timeout. It is an explicit statement: the upstream is protecting itself and you are above your allowed rate or concurrency.

If you immediately retry a 429, you have not improved your odds of success. You have increased upstream pressure at the exact moment the upstream asked for less. Under load, that turns into an amplifier.

The failure is usually not one request. It is the shape:

  • 429 rate rises
  • retries rise faster than original traffic
  • latency rises because calls wait on backoff (or worse, retry immediately)
  • your own queues fill because threads are tied up waiting and retrying

Teams get stuck because the dashboards look like a vendor outage, so they scale out. Scaling out increases concurrency and makes the throttling worse.

How to prove retries are amplifying 429s: diagnosis checklist

Start with the shortest path to truth. The goal is to prove whether throttling is real and whether your client is respecting it.

  1. Confirm the response is actually a 429 from the dependency you think it is.
  • Log the upstream host, route, and status code.
  • If you have multiple dependencies behind one HttpClient, separate them. A single misbehaving vendor can poison unrelated calls.
  2. Check whether Retry-After is present and what form it is in.

Retry-After can be:

  • a delta in seconds, like Retry-After: 10
  • an HTTP date, like Retry-After: Wed, 21 Oct 2015 07:28:00 GMT

If you only support one form, you are only sometimes honoring backpressure.

  3. Measure amplification.

If your original request rate to the vendor is 1,000 rpm and your retry attempts add another 2,000 rpm, you are not just being throttled. You are attacking yourself with three times the intended load (a measurement sketch follows this checklist).

  4. Look for the common foot-guns.
  • retry policies that treat any non-2xx as retryable
  • retry loops that ignore Retry-After
  • retries without a per-attempt timeout (stacked waits)
  • no total budget (one call can burn a worker for minutes)

If you see those, you have a policy problem. Not a vendor problem. If threads are tied up waiting on 429 retries and you see requests timing out with normal CPU, you have thread pool starvation from retry backlog.
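
If you do not already have that number, one way to get it is to count original attempts and retry attempts separately, per upstream host. The sketch below assumes your retry code stamps an attempt number onto each re-issued request; the meter name, counter names, and the "attempt" option key are illustrative conventions, not a standard API.

csharp
// Counts original requests vs retry attempts per upstream host so you can
// chart amplification (retry_attempts / original_requests) per dependency.
using System.Collections.Generic;
using System.Diagnostics.Metrics;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

sealed class AttemptCountingHandler : DelegatingHandler
{
    private static readonly Meter Meter = new("MyApp.Http"); // illustrative meter name
    private static readonly Counter<long> Originals = Meter.CreateCounter<long>("http_original_requests");
    private static readonly Counter<long> Retries = Meter.CreateCounter<long>("http_retry_attempts");

    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken ct)
    {
        // Convention (illustrative): the retry policy sets "attempt" > 1 on re-issued requests.
        var isRetry = request.Options.TryGetValue(new HttpRequestOptionsKey<int>("attempt"), out var attempt)
                      && attempt > 1;
        var host = new KeyValuePair<string, object?>("host", request.RequestUri?.Host);

        if (isRetry) Retries.Add(1, host);
        else Originals.Add(1, host);

        return base.SendAsync(request, ct);
    }
}

If the retry counter grows faster than the original counter during an incident, you have your proof of amplification in one chart.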

How to stop retry amplification: honor Retry-After with budgets

The safe goal is not "never see a 429". The goal is to fail fast when you must, retry when it is safe, and slow down when the upstream asks.

1) Gate 429 retries behind idempotency and a total budget

A 429 retry is only safe if a duplicate attempt does not create a duplicate side effect.

  • Safe: GETs, idempotent POSTs with idempotency keys, retries that hit a cache read
  • Unsafe: payment capture, shipment creation, "send email" endpoints without dedupe

Also, treat timeouts and retries as one policy. A retry policy without a budget is a slow leak that becomes a queue pileup. See retry logic anti-patterns for why retrying without classification causes cascading failures.
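
A minimal sketch of that gate, assuming a 3-attempt cap, a 20 second total budget, and an Idempotency-Key header as your own idempotency convention (all illustrative):

csharp
using System;
using System.Net.Http;

static class RetryGate
{
    static readonly TimeSpan TotalBudget = TimeSpan.FromSeconds(20); // illustrative budget

    // Returns true only when a duplicate attempt is safe AND we are inside our caps.
    public static bool ShouldRetry429(HttpRequestMessage request, int attempt, TimeSpan elapsed)
    {
        bool idempotent =
            request.Method == HttpMethod.Get ||
            request.Method == HttpMethod.Head ||
            request.Headers.Contains("Idempotency-Key"); // your own convention, not a guarantee

        if (!idempotent) return false;            // fail fast: surface the 429 to the caller
        if (attempt >= 3) return false;           // attempt cap
        if (elapsed >= TotalBudget) return false; // total budget: stop burning a worker on waits

        return true;
    }
}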

2) Honor Retry-After, but cap it

Retry-After is advice. In production you still need boundaries.

  • honor it when it is within your total time budget
  • cap extremely large waits (for example, if the upstream says 600 seconds, you may want to stop and surface an actionable failure)
  • add jitter when you have many callers so you do not synchronize
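
Those three rules fit in one small function. The sketch below is under assumptions: the 30 second cap, the 8 second fallback backoff, and the 0-1000 ms jitter range are placeholders to tune, not recommendations.

csharp
using System;

static class RetryDelay
{
    static readonly TimeSpan MaxHonoredDelay = TimeSpan.FromSeconds(30); // illustrative cap

    // Returns null when the advised wait is outside your boundaries and you
    // should fail fast with an actionable error instead of sleeping.
    public static TimeSpan? Compute(TimeSpan? retryAfter, int attempt, TimeSpan remainingBudget, Random jitter)
    {
        // Missing header: fall back to a small bounded exponential backoff.
        var advised = retryAfter ?? TimeSpan.FromSeconds(Math.Min(Math.Pow(2, attempt), 8));

        // Honor the advice only when it fits inside your own limits.
        if (advised > MaxHonoredDelay || advised > remainingBudget) return null;

        // Jitter spreads out callers that all received the same Retry-After.
        return advised + TimeSpan.FromMilliseconds(jitter.Next(0, 1000));
    }
}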

3) Reduce concurrency, not just delay

Delaying an individual request helps, but if you have 200 in-flight callers, you can still overload the upstream.

If throttling is sustained:

  • cap concurrency for that dependency (bulkhead)
  • shed low priority calls
  • cache where safe
  • degrade features instead of stacking retries

That is the difference between stabilization and "we waited longer before we failed". Learn more about why retries amplify outages when they lack backoff and jitter.
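
The bulkhead from that list does not need a framework. One hedged sketch is a SemaphoreSlim inside a DelegatingHandler; the concurrency limit is whatever you negotiate with the dependency, and registering one instance per named client (for example via AddHttpMessageHandler) keeps it from limiting unrelated calls.

csharp
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Caps in-flight requests to one dependency, no matter how many callers retry.
sealed class BulkheadHandler : DelegatingHandler
{
    private readonly SemaphoreSlim _slots;

    public BulkheadHandler(int maxConcurrent) => _slots = new SemaphoreSlim(maxConcurrent, maxConcurrent);

    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken ct)
    {
        // The wait is bounded by the caller's cancellation token / timeout, so a
        // throttled dependency cannot queue unbounded in-flight work.
        await _slots.WaitAsync(ct).ConfigureAwait(false);
        try
        {
            return await base.SendAsync(request, ct).ConfigureAwait(false);
        }
        finally
        {
            _slots.Release();
        }
    }
}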

Parse Retry-After correctly: handle seconds and HTTP date formats

The goal is not a perfect policy framework. The goal is to stop doing the wrong thing by default.

csharp
// Works in .NET Framework and modern .NET.
// Use as a DelegatingHandler or inside your retry policy.
 
static bool TryGetRetryAfterDelay(HttpResponseMessage response, DateTimeOffset now, out TimeSpan delay)
{
    delay = TimeSpan.Zero;
 
    // Prefer typed header parsing when available.
    var ra = response.Headers.RetryAfter;
    if (ra == null) return false;
 
    if (ra.Delta.HasValue)
    {
        delay = ra.Delta.Value;
        return delay > TimeSpan.Zero;
    }
 
    if (ra.Date.HasValue)
    {
        var target = ra.Date.Value;
        delay = target > now ? (target - now) : TimeSpan.Zero;
        return delay > TimeSpan.Zero;
    }
 
    return false;
}

This should be paired with:

  • per-attempt timeout
  • attempt cap
  • total budget cap
  • logging of the decision

Do not ship this alone and call it fixed.
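
For orientation only, here is a hedged sketch of how those pieces can sit around TryGetRetryAfterDelay in a DelegatingHandler, assuming Microsoft.Extensions.Logging is available. The attempt cap, budget values, GET-only idempotency gate, and log shape are illustrative; jitter and timeout exception handling are left out for brevity.

csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;

sealed class ThrottleAwareHandler : DelegatingHandler
{
    private const int MaxAttempts = 3;                                       // attempt cap
    private static readonly TimeSpan PerAttemptTimeout = TimeSpan.FromSeconds(5);
    private static readonly TimeSpan TotalBudget = TimeSpan.FromSeconds(20); // total budget cap
    private readonly ILogger<ThrottleAwareHandler> _log;

    public ThrottleAwareHandler(ILogger<ThrottleAwareHandler> log) => _log = log;

    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken ct)
    {
        var started = DateTimeOffset.UtcNow;

        for (var attempt = 1; ; attempt++)
        {
            using var attemptCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
            attemptCts.CancelAfter(PerAttemptTimeout);                       // per-attempt timeout

            var response = await base.SendAsync(request, attemptCts.Token).ConfigureAwait(false);
            if ((int)response.StatusCode != 429) return response;

            var now = DateTimeOffset.UtcNow;
            var elapsed = now - started;

            // TryGetRetryAfterDelay is the helper from the snippet above.
            if (!TryGetRetryAfterDelay(response, now, out var delay))
                delay = TimeSpan.FromSeconds(Math.Min(Math.Pow(2, attempt), 8)); // fallback backoff

            var canRetry = request.Method == HttpMethod.Get                  // idempotency gate, simplified
                           && attempt < MaxAttempts
                           && elapsed + delay < TotalBudget;

            // Log the decision so the policy is provable after the incident.
            _log.LogWarning(
                "http.retry.decision status=429 attempt={Attempt} retry_delay_ms={RetryDelayMs} decision={Decision}",
                attempt, (long)delay.TotalMilliseconds, canRetry ? "delay-and-retry" : "fail-fast");

            if (!canRetry) return response;                                  // fail fast: let the caller decide

            response.Dispose();
            await Task.Delay(delay, ct).ConfigureAwait(false);
        }
    }
}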

What to log so throttling becomes provable

You need enough fields to answer one question in one query: "Did we honor backpressure or did we amplify it?"

Log at the retry decision point:

  • dependency: vendor name or host
  • route: normalized route or operation name
  • status: 429
  • retry_after_ms: parsed value (or null)
  • retry_delay_ms: what you actually waited (post-cap, post-jitter)
  • attempt: attempt number
  • total_elapsed_ms: time spent so far
  • budget_ms: max allowed
  • decision: delay-and-retry | fail-fast | degrade | escalate
  • correlation_id: request correlation id

Example log line:

json
{
  "event": "http.retry.decision",
  "dependency": "vendor-x",
  "route": "GET /v2/orders",
  "status": 429,
  "retry_after_ms": 10000,
  "retry_delay_ms": 11234,
  "attempt": 2,
  "total_elapsed_ms": 15300,
  "budget_ms": 20000,
  "decision": "delay-and-retry",
  "correlation_id": "01H..."
}

If you cannot answer that question, the next throttling incident will look like mystery latency. Use correlation IDs to trace which original request spawned which 429 retries.
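
If you log through Microsoft.Extensions.Logging, one way to emit those fields is a single warning with named placeholders, so a JSON console or OpenTelemetry exporter produces a line shaped like the example above. The field names mirror the list; the helper itself is illustrative.

csharp
using Microsoft.Extensions.Logging;

static class RetryDecisionLog
{
    // One structured event per retry decision, with the fields from the list above.
    public static void Write(
        ILogger log, string dependency, string route, long? retryAfterMs, long retryDelayMs,
        int attempt, long totalElapsedMs, long budgetMs, string decision, string correlationId)
    {
        log.LogWarning(
            "http.retry.decision dependency={Dependency} route={Route} status=429 " +
            "retry_after_ms={RetryAfterMs} retry_delay_ms={RetryDelayMs} attempt={Attempt} " +
            "total_elapsed_ms={TotalElapsedMs} budget_ms={BudgetMs} decision={Decision} correlation_id={CorrelationId}",
            dependency, route, retryAfterMs, retryDelayMs, attempt, totalElapsedMs, budgetMs, decision, correlationId);
    }
}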

Shipped asset

Download
Free

HttpClient 429 + Retry-After package

Copy/paste-ready handler, runbook, and logging fields to stop retry amplification when a dependency is throttling. Safe for legacy .NET systems.

Use this when a vendor starts returning 429 and your retry logic is making latency and queueing worse.

What you get (4 files)

  • RetryAfterDelegatingHandler.cs
  • 429-retry-after-runbook.md
  • retry-after-logging-fields.md
  • README.md

How to use

  • On call: use the runbook to confirm throttling source and contain blast radius.
  • Tech lead: standardize the handler + budgets across services that call the dependency.
  • CTO: use the logging fields to make throttling measurable and reduce repeat incidents.
Axiom Pack
$49

Retry Policy Kit: Battle-Tested Resilience for Production

Managing retries across multiple services? Get pre-configured Polly policies with monitoring integration, circuit breaker patterns, and incident runbooks. Stop debugging retry storms in production.

  • 10+ production-grade Polly policies for HTTP, gRPC, and database calls
  • Circuit breaker + retry coordination patterns
  • Monitoring integration (Prometheus, OpenTelemetry, Application Insights)
  • Incident runbooks for retry storm diagnosis and mitigation
Get Retry Policy Kit →


FAQ

What if 429s persist even after I honor Retry-After?

If 429s persist after honoring Retry-After, check concurrency. Many instances retrying at once still overwhelm the upstream even with delays. Add a bulkhead (concurrency cap per dependency) and add jitter so instances don't retry in synchronized waves.

How do I tell whether retries are amplifying load?

Compare request rate vs retry rate. If original requests = 1000/min but retries add 3000/min, you're amplifying load 4x. Check logs for: retry attempts per endpoint, the 429 rate trend, and whether latency spikes correlate with retry rate spikes.

Does Retry-After mean it is safe to retry?

No. Retry-After tells you WHEN to retry, not IF it's safe. Only retry if the operation is idempotent. Non-idempotent operations (payments, orders, emails) should fail fast or use an idempotency key, not blind retry.

Should I parse Retry-After as seconds or as an HTTP date?

Both. Retry-After can be delta seconds (10) or an HTTP date (Wed, 21 Oct 2015 07:28:00 GMT). If you only parse one format, you're only sometimes honoring backpressure. Parse both or you'll amplify load when the vendor switches formats.

Does Polly handle Retry-After for me?

Polly can express retry policies, but it doesn't parse Retry-After by default. You need a custom DelegatingHandler or a Polly policy that reads the header and calculates the wait time. Polly provides the retry framework; you provide the backpressure logic.
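
For example, a Polly v7-style sketch where the sleep duration provider reads the header; the retry count and fallback backoff are illustrative, and you should check the API of the Polly version you actually run:

csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;

var retry429 = Policy
    .HandleResult<HttpResponseMessage>(r => r.StatusCode == (HttpStatusCode)429)
    .WaitAndRetryAsync(
        retryCount: 3,
        sleepDurationProvider: (attempt, outcome, context) =>
        {
            // Backpressure logic: honor both forms of Retry-After when present.
            var ra = outcome.Result?.Headers.RetryAfter;
            if (ra?.Delta is TimeSpan delta) return delta;
            if (ra?.Date is DateTimeOffset date && date > DateTimeOffset.UtcNow)
                return date - DateTimeOffset.UtcNow;
            return TimeSpan.FromSeconds(Math.Pow(2, attempt)); // fallback when the header is missing
        },
        onRetryAsync: (outcome, delay, attempt, context) => Task.CompletedTask);

You still own the budget, the idempotency gate, and the bulkhead; Polly only schedules the waits you compute.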

Why does scaling out make 429s worse?

Scaling out increases total concurrency to the throttled dependency. If you had 5 instances making 100 req/sec (500 total) and scale to 10 instances, you're now making 1000 req/sec. If the upstream throttle is 600 req/sec, more instances = more 429s = more retries = worse amplification.

What if Retry-After tells me to wait 10 minutes?

Cap it. If your total request budget is 30 seconds, waiting 10 minutes isn't viable. Honor the signal (slow down), but fail fast within your budget. Log the uncapped value so you can discuss reasonable limits with the vendor.


Additional Questions

Should I always retry a 429?

No. A 429 is a request to slow down, not a promise that retrying will work. Retry only when the operation is safe to repeat (idempotent) and only within a budget that protects your own thread pool. If the upstream is throttling for minutes, the right move is usually to reduce concurrency and degrade features, not to keep retrying.

What if the 429 response has no Retry-After header?

Treat that as a weak signal, not permission to hammer. Use a small bounded backoff with jitter, cap attempts, and log that Retry-After was missing. If throttling is sustained, you still need a concurrency cap and a fail-fast path that produces an actionable error for the caller.

How do I stop many instances from retrying at the same moment?

Add jitter to the delay you honor, and cap concurrency per dependency. Without jitter, 100 instances that all see Retry-After: 10 will all retry at the same moment. Without a bulkhead, you will still have too much in-flight work even if each call is delayed.

Is using Polly enough to make retries safe?

Polly can express the policy, but it does not make the policy safe by default. The safety comes from classification (what is retryable), budgets (per-attempt and total), and observability (logging decisions). Many incidents happen because a policy exists but nobody can prove what it did under load.

Why not just scale out when a dependency throttles?

Scaling out increases concurrency. If the upstream is throttling, more concurrency creates more 429s and more retries, which creates more backlog and more waiting. In these incidents, the fix is often the opposite: cap concurrency, shed low priority calls, and keep the rest within a budget.

Coming soon

If this incident pattern feels familiar, the fastest win is a consistent set of defaults across services: budgets, backpressure handling, and logging fields. Axiom is where these packages live so you do not have to re-derive them during an incident.


Axiom (Coming Soon)

Get notified when we ship real operational assets (runbooks, templates, schemas), not generic tutorials.

Key takeaways

  • 429 is backpressure. Treat it as a request for less concurrency.
  • Honor Retry-After correctly, but keep boundaries (attempt caps and total budget caps).
  • If you cannot prove behavior in logs, you will relive the incident with better dashboards but the same broken defaults.

Recommended resources

Download the shipped checklist/templates for this post.

A copy/paste handler that parses Retry-After (seconds and HTTP date) plus a 429 runbook and logging fields so throttling becomes bounded, observable, and non-amplifying in .NET.

