# 429 throttling runbook (Retry-After)

Use this when you see sustained 429s and rising latency/timeouts.

## 1) Confirm the source

- Identify which dependency is returning 429.
- Confirm which routes or operations are throttled; an aggregate count of 429s does not tell you this.
- Confirm whether Retry-After is present and parseable.
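
To check whether a Retry-After value is parseable, a minimal sketch follows. It assumes the header uses one of the two forms HTTP defines (a whole number of seconds, or an HTTP-date). The helper name `parse_retry_after` is illustrative and not part of any existing codebase.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Return the requested delay in seconds, or None if the header is absent or unparseable."""
    if value is None:
        return None
    value = value.strip()
    if not value:
        return None
    # Form 1: delay-seconds, a non-negative integer such as "120".
    if value.isascii() and value.isdigit():
        return float(value)
    # Form 2: an HTTP-date such as "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Assumption: treat a date without a usable zone as UTC.
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now", not a negative wait.
    return max(0.0, (when - now).total_seconds())
```

A `None` result is worth counting separately from a valid delay: it distinguishes "the dependency sent no hint" from "the dependency asked us to wait and we ignored it".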

## 2) Contain the blast radius (first 15 minutes)

- Cap concurrency for the throttled dependency.
- Shed low-priority calls (non-critical features, background enrichment, bulk syncs).
- Reduce retry attempts immediately (especially for non-idempotent operations).

If you cannot guarantee idempotency, do not retry. Fail fast and surface an actionable error.
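
The concurrency cap and load shedding above could be combined roughly as follows. This is a sketch only: it assumes a thread-based caller, and `ConcurrencyCap`, `Shed`, and the `critical` flag are hypothetical names chosen for illustration.

```python
import threading


class Shed(Exception):
    """Raised when a call is rejected locally instead of being sent to the dependency."""


class ConcurrencyCap:
    """Caps in-flight calls to one dependency and sheds low-priority work when full."""

    def __init__(self, max_in_flight: int):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def try_call(self, fn, *, critical: bool, wait_seconds: float = 0.5):
        if critical:
            # Critical calls may wait briefly for a free slot.
            acquired = self._slots.acquire(timeout=wait_seconds)
        else:
            # Low-priority calls never wait; they are shed if no slot is free.
            acquired = self._slots.acquire(blocking=False)
        if not acquired:
            raise Shed("dependency concurrency cap reached")
        try:
            return fn()
        finally:
            self._slots.release()
```

Shedding locally fails fast and keeps worker threads free, which is usually preferable to letting low-priority calls queue behind a throttled dependency.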

## 3) Stop retry amplification

Look for these patterns:

- retry policies that treat all non-2xx as retryable
- retry loops that ignore Retry-After
- retries without per-attempt timeouts
- no total time budget (a single call can wait and retry for minutes)
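
As a contrast to the first pattern, a retry decision can be written as an explicit allow-list rather than "anything that is not 2xx". The status set below is an assumption for illustration, not a recommendation from this runbook; adjust it to what your dependency documents. `should_retry` and `RETRYABLE_STATUSES` are hypothetical names.

```python
# Assumption: only throttling and transient gateway/availability errors are retryable.
RETRYABLE_STATUSES = frozenset({429, 502, 503, 504})


def should_retry(status: int, idempotent: bool) -> bool:
    """Decide whether a response may be retried at all."""
    # Without an idempotency guarantee, never retry, whatever the status.
    if not idempotent:
        return False
    # Everything not explicitly listed (400, 401, 404, 500, ...) fails fast.
    return status in RETRYABLE_STATUSES
```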

Fix order:

1) Add per-attempt timeout
2) Add total budget cap
3) Honor Retry-After inside the budget
4) Add jitter to prevent synchronized retries
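
The four fixes above can be sketched in one retry loop. This is a minimal illustration, not a drop-in client: `send` stands in for whatever performs one attempt with a given timeout, `Response.retry_after` is assumed to be already parsed into seconds, and all names and default values are hypothetical.

```python
import random
import time
from typing import Callable, NamedTuple, Optional


class Response(NamedTuple):
    status: int
    retry_after: Optional[float] = None  # seconds, already parsed from the header


class GaveUp(Exception):
    """Raised when the budget or the attempt limit is exhausted."""


def call_with_budget(
    send: Callable[[float], Response],
    *,
    total_budget: float = 10.0,
    per_attempt_timeout: float = 2.0,
    max_attempts: int = 3,
    base_backoff: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
    rng: Callable[[], float] = random.random,
) -> Response:
    deadline = clock() + total_budget
    for attempt in range(1, max_attempts + 1):
        remaining = deadline - clock()
        if remaining <= 0:
            raise GaveUp("total budget spent before attempt %d" % attempt)
        # 1) Per-attempt timeout, never longer than what is left of the budget.
        response = send(min(per_attempt_timeout, remaining))
        # Only 429 is retried in this sketch; anything else goes back to the caller.
        if response.status != 429:
            return response
        if attempt == max_attempts:
            break
        backoff = base_backoff * 2 ** (attempt - 1)
        # 3) Retry-After is a floor on the wait; fall back to backoff if it is absent.
        floor = response.retry_after if response.retry_after is not None else backoff
        # 4) Jitter is added on top, so we never retry earlier than asked.
        delay = floor + rng() * backoff
        # 2) If honoring the wait would overrun the budget, fail now instead of sleeping.
        if delay >= deadline - clock():
            raise GaveUp("wait of %.2fs does not fit in the remaining budget" % delay)
        sleep(delay)
    raise GaveUp("still throttled after %d attempts" % max_attempts)
```

Note the design choice in step 2: when the requested wait does not fit in the budget, the call fails immediately rather than sleeping for a shortened interval and retrying before the dependency asked it to.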

## 4) Prove you are honoring backpressure

You should be able to answer each of these with a single log or metrics query:

- For 429 responses, what was the Retry-After value?
- What delay did we actually apply?
- Did our request rate to the dependency drop when the 429 rate rose?

If you cannot answer those, add the logging fields in `retry-after-logging-fields.md`.
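
For orientation, one structured log record per throttled attempt might look like the sketch below. The authoritative field list is in `retry-after-logging-fields.md`; every field name here is a placeholder assumed for illustration, as is the logger name.

```python
import json
import logging
from typing import Optional

logger = logging.getLogger("throttling")  # hypothetical logger name


def log_throttled_attempt(
    dependency: str,
    route: str,
    attempt: int,
    retry_after_raw: Optional[str],
    retry_after_seconds: Optional[float],
    applied_delay_seconds: Optional[float],
) -> dict:
    """Emit one JSON log line for a 429 response and return the record."""
    record = {
        "event": "throttled_attempt",
        "dependency": dependency,
        "route": route,
        "status": 429,
        "attempt": attempt,
        # Raw header value, kept so unparseable values remain visible.
        "retry_after_raw": retry_after_raw,
        # What the dependency asked for, in seconds (None if absent or unparseable).
        "retry_after_seconds": retry_after_seconds,
        # What we actually waited (None if we gave up instead of waiting).
        "applied_delay_seconds": applied_delay_seconds,
    }
    logger.warning(json.dumps(record, sort_keys=True))
    return record
```

Logging the requested and the applied delay side by side makes the first two questions a single filter, and counting these records per dependency over time gives a view of the third.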

## 5) Recovery

- Validate that your call rate to the throttled dependency has dropped.
- Validate that your own thread pool or worker saturation is improving.
- Validate queues stop growing.

If throttling is sustained for hours, you need a product-level decision:

- degrade features
- queue work (with dedupe)
- negotiate higher limits
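
For the "queue work (with dedupe)" option, the core idea can be sketched in memory as below. It assumes each unit of work has a stable key (for example an entity ID) so that repeated requests collapse into one pending item. `DedupeQueue` is a hypothetical name, and a real deployment would most likely need a durable queue rather than this in-process structure.

```python
from collections import OrderedDict
from typing import Any, Optional, Tuple


class DedupeQueue:
    """FIFO queue that holds at most one pending item per key."""

    def __init__(self) -> None:
        self._pending: "OrderedDict[str, Any]" = OrderedDict()

    def enqueue(self, key: str, payload: Any) -> bool:
        """Add work; return False if the key was already pending."""
        if key in self._pending:
            # Keep the original queue position but use the newest payload.
            self._pending[key] = payload
            return False
        self._pending[key] = payload
        return True

    def pop(self) -> Optional[Tuple[str, Any]]:
        """Remove and return the oldest (key, payload) pair, or None if empty."""
        if not self._pending:
            return None
        return self._pending.popitem(last=False)

    def __len__(self) -> int:
        return len(self._pending)
```

Without dedupe, a backlog built up during throttling replays every duplicate request once limits recover, which can trigger the next round of 429s.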

Do not leave the system in a permanent retry-wait state.
