# Retry + Backoff Safe Defaults (Trading/Exchange APIs)

This is a conservative baseline for exchange APIs. The goal is to **reduce bans and self-inflicted outages** while still recovering from transient failures.

## 1) Retry policy by failure type

### A. Never retry (fail fast)

- 401/403 auth failures
- signature invalid / timestamp outside window
- permission denied
- invalid parameter / validation errors (4xx responses caused by the request itself, which will fail the same way on every retry)

Action:

- open circuit breaker
- alert
- require human investigation

### B. Retry with backoff (transient)

- 408 request timeout
- 429 rate limited
- 5xx server errors
- network errors (DNS, connection reset)

Action:

- retry with exponential backoff + jitter
- cap retries
- respect `Retry-After` if provided
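The split between sections A and B can be sketched as a small classifier. This is a minimal illustration, not a complete error taxonomy: the names `RetryDecision`, `TRANSIENT_STATUSES`, and `classify` are hypothetical, and it assumes the function is only called for failed requests. Exchanges that report signature or timestamp errors through a body-level error code rather than the HTTP status would need an extra check on that code.

```python
from enum import Enum
from typing import Optional


class RetryDecision(Enum):
    FAIL_FAST = "fail_fast"  # section A: do not retry, escalate
    RETRY = "retry"          # section B: retry with backoff


# Non-5xx statuses treated as transient in section B.
TRANSIENT_STATUSES = {408, 429}


def classify(status: Optional[int], network_error: bool = False) -> RetryDecision:
    """Map a failed request's HTTP status (or a network failure) to a decision."""
    if network_error or status is None:
        # DNS failure, connection reset: no HTTP status was received.
        return RetryDecision.RETRY
    if status in TRANSIENT_STATUSES or 500 <= status <= 599:
        return RetryDecision.RETRY
    # Everything else, including 401/403 and other 4xx, fails fast.
    return RetryDecision.FAIL_FAST
```

Defaulting unknown failures to `FAIL_FAST` is the conservative choice here: an unrecognized error is surfaced rather than retried blindly.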

## 2) Recommended defaults

- Max attempts: 5 (including the initial attempt)
- Base delay: 250ms
- Backoff: exponential
- Jitter: full jitter (random 0..delay)
- Max delay: 8s
- Total max wall-clock per request: 20–30s (depending on endpoint)
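As a rough sketch of how these defaults combine, the delay before each retry can be computed as below. The function names and the `retry_after_s` parameter are illustrative assumptions; the sketch treats `Retry-After` as a lower bound on the wait and leaves enforcement of the total wall-clock budget to the caller.

```python
import random
from typing import Optional

BASE_DELAY_S = 0.25   # base delay: 250ms
MAX_DELAY_S = 8.0     # max delay: 8s
MAX_ATTEMPTS = 5      # includes the initial attempt, so at most 4 retries


def delay_ceiling(retry_number: int) -> float:
    """Upper bound of the delay before retry `retry_number` (1 = first retry)."""
    return min(MAX_DELAY_S, BASE_DELAY_S * 2 ** (retry_number - 1))


def backoff_delay(retry_number: int, retry_after_s: Optional[float] = None) -> float:
    """Full jitter: pick a random delay between 0 and the exponential ceiling."""
    delay = random.uniform(0.0, delay_ceiling(retry_number))
    if retry_after_s is not None:
        # Never retry sooner than the server asked us to.
        delay = max(delay, retry_after_s)
    return delay
```

With 5 attempts the ceilings for the four retries are 0.25s, 0.5s, 1s, and 2s, so the 8s cap only comes into play if the attempt count is raised or a `Retry-After` value pushes the wait higher.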

If the endpoint is order placement/cancel:

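- require an idempotency key generated client-side (for example, a client order ID) and reuse the same key on every retry, so a retry cannot create a duplicate order
- consider fewer retries (2–3) and tighter timeouts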
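A minimal sketch of the idempotency-key idea for order placement follows. It assumes the exchange deduplicates on a client-supplied identifier; the field name `client_order_id`, the `send` callable, and the function name are hypothetical, and the backoff sleep between attempts is omitted for brevity.

```python
import uuid
from typing import Callable


def place_order_with_retries(send: Callable[[dict], bool], order: dict,
                             max_attempts: int = 3) -> str:
    """Send `order`, reusing one idempotency key for every attempt.

    `send` is a caller-supplied function that returns True on success.
    """
    # Generate the key once, before the first attempt, never per attempt.
    key = uuid.uuid4().hex
    payload = {**order, "client_order_id": key}
    for _attempt in range(1, max_attempts + 1):
        if send(payload):
            return key
    raise RuntimeError(f"order not confirmed after {max_attempts} attempts (key={key})")
```

The important detail is that the key is created outside the loop. If a response is lost after the exchange accepted the order, the retry carries the same key and can be recognized as a duplicate.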

## 3) Concurrency limits (critical)

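Even perfect backoff fails if you have unbounded concurrency: every in-flight request retries on its own schedule, so the aggregate request rate can still exceed the exchange's limits.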

- Per endpoint concurrency: 2–4
- Per host global concurrency: 8–16
- WebSocket reconnects: singleflight (only one reconnect attempt in flight at a time)
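One way to enforce the two request caps above is a pair of `asyncio` semaphores, sketched below. The class name `ConcurrencyLimiter` and its `run` method are hypothetical, and the sketch covers only the per-endpoint and per-host limits, not WebSocket reconnects. It should be constructed inside a running event loop.

```python
import asyncio
from typing import Awaitable, Callable, Dict, TypeVar

T = TypeVar("T")

PER_ENDPOINT_LIMIT = 4   # upper end of the 2–4 range
PER_HOST_LIMIT = 16      # upper end of the 8–16 range


class ConcurrencyLimiter:
    """Caps in-flight requests per endpoint and per host."""

    def __init__(self, per_endpoint: int = PER_ENDPOINT_LIMIT,
                 per_host: int = PER_HOST_LIMIT) -> None:
        self._per_endpoint = per_endpoint
        self._host = asyncio.Semaphore(per_host)
        self._endpoints: Dict[str, asyncio.Semaphore] = {}

    async def run(self, endpoint: str,
                  make_request: Callable[[], Awaitable[T]]) -> T:
        if endpoint not in self._endpoints:
            self._endpoints[endpoint] = asyncio.Semaphore(self._per_endpoint)
        # Take the endpoint slot first so requests queued behind a busy
        # endpoint do not hold a host-wide slot while they wait.
        async with self._endpoints[endpoint]:
            async with self._host:
                return await make_request()
```

Retries should go through the same limiter as first attempts; otherwise a burst of failures can bypass the caps.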

## 4) Circuit breaker guidance

Open the breaker when:

- auth errors occur (401/403/signature)
- repeated 429s beyond threshold
- repeated 5xx beyond threshold

Half-open tests:

- 1 request every 10–30 seconds
- close the breaker only after N successes
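The open / half-open / closed behaviour described above could look roughly like the following. This is a simplified, single-threaded sketch under stated assumptions: the class, its parameter names, and the default thresholds are illustrative, 429 and 5xx failures share one counter, and an auth error trips the breaker immediately. A production setup might instead keep the breaker open after auth errors until a human resets it.

```python
import time
from typing import Callable


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, probe_interval_s: float = 10.0,
                 successes_to_close: int = 3,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.probe_interval_s = probe_interval_s
        self.successes_to_close = successes_to_close
        self._clock = clock
        self.state = "closed"
        self._failures = 0
        self._successes = 0
        self._last_probe = 0.0

    def allow_request(self) -> bool:
        if self.state == "closed":
            return True
        now = self._clock()
        if now - self._last_probe >= self.probe_interval_s:
            # Let exactly one probe through per interval.
            self._last_probe = now
            self.state = "half_open"
            return True
        return False

    def record_success(self) -> None:
        if self.state == "half_open":
            self._successes += 1
            if self._successes >= self.successes_to_close:
                self.state = "closed"
                self._failures = 0
                self._successes = 0
        else:
            self._failures = 0

    def record_failure(self, auth_error: bool = False) -> None:
        self._failures += 1
        if (auth_error or self.state == "half_open"
                or self._failures >= self.failure_threshold):
            # Open (or re-open) and restart the probe timer.
            self.state = "open"
            self._successes = 0
            self._last_probe = self._clock()
```

Callers check `allow_request()` before sending and report the outcome with `record_success()` or `record_failure()`; a failed probe re-opens the breaker and discards any successes counted so far.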

## 5) Logging fields (minimum)

Log these per attempt:

- `endpoint`
- `method`
- `status`
- `error_code` / `error_message`
- `attempt`
- `delay_ms`
- `retry_reason`
- `request_id` (from response headers if available)
- `idempotency_key` (when relevant)
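For illustration, the fields above can be emitted as one JSON object per attempt using the standard `logging` and `json` modules. The logger name `exchange.retry` and the helper names are assumptions; optional fields are left out when they are not available.

```python
import json
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger("exchange.retry")


def attempt_record(endpoint: str, method: str, status: Optional[int],
                   attempt: int, delay_ms: int, retry_reason: str,
                   error_code: Optional[str] = None,
                   error_message: Optional[str] = None,
                   request_id: Optional[str] = None,
                   idempotency_key: Optional[str] = None) -> Dict[str, Any]:
    """Build the per-attempt log record; `status` is None for network errors."""
    record: Dict[str, Any] = {
        "endpoint": endpoint,
        "method": method,
        "status": status,
        "attempt": attempt,
        "delay_ms": delay_ms,
        "retry_reason": retry_reason,
    }
    optional = {
        "error_code": error_code,
        "error_message": error_message,
        "request_id": request_id,
        "idempotency_key": idempotency_key,
    }
    # Only include optional fields that were actually provided.
    record.update({k: v for k, v in optional.items() if v is not None})
    return record


def log_attempt(**fields: Any) -> None:
    """Log one attempt as a single JSON line."""
    logger.info(json.dumps(attempt_record(**fields), sort_keys=True))
```

A call such as `log_attempt(endpoint="/orders", method="POST", status=429, attempt=2, delay_ms=310, retry_reason="rate_limited")` produces one line that can be filtered by `endpoint` or `retry_reason` when investigating a ban or an outage.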
