
Jan 11, 2026 · 8 min read
Category: Crypto Automation
Why exchange APIs "randomly" ban bots (and how to prevent it)
A production-first playbook to avoid bans: permissions, rate limits, auth hygiene, and traffic patterns that keep trading bots alive.
An exchange API ban rarely happens out of nowhere.
In production it looks like repeated failures: orders start rejecting, account refresh calls return 429, and then private endpoints flip to 403 because your bot keeps hammering them. This post is not a tutorial. It is an incident playbook for keeping trading bots alive by shaping traffic, enforcing stop rules, and logging the minimum signals that prevent guessing.
If you are running bots, treat bans as an automation reliability problem. The fix is not a trick. The fix is predictable client behavior.
The incident pattern (what it looks like)
This is the common sequence.
You deploy a bot change. The bot is fine for hours. Then one dependency degrades, or a websocket drops, or the exchange is under load. Your bot responds with the worst possible behavior: synchronized retries, burst concurrency, and repeated auth failures.
Now the exchange sees a client that looks like abuse. It does not matter that your intent is legitimate. Abuse detection reacts to traffic patterns, not intent.
Operational impact is usually worse than the original error. The first failure might be a short-lived 429. The second-order failure is a block that takes longer to clear and often requires human intervention.
What "banned" usually means
When teams say "the exchange banned us", they typically mean one of these states. The correct response is different for each. If you respond wrong, you can lengthen the block.
- Rate limited (429 or exchange-specific code): you exceeded a budget
- Temporarily blocked: a cooldown window is in effect
- Permission denied (401/403): key scope changed, signing is invalid, or account flags changed
- Abuse detection block: repeated invalid requests, bursts, reconnect storms
- Exchange degraded (5xx/timeouts): platform is failing and your retries amplify it
The most expensive mistake is treating auth or permissions failures like transient rate limiting.
Decision framework (stop, retry, escalate)
Automation reliability starts with one rule: classify the failure before you act.
This mapping is deliberately strict.
- 401/403, signature invalid, timestamp window -> STOP trading, escalate to operator
- 429 -> RETRY with backoff + jitter (bounded), reduce concurrency
- 5xx/timeouts -> RETRY limited, then ESCALATE
- 4xx validation -> STOP and fix inputs
Put it in a table so it becomes policy, not opinion.
| Failure class | Signals | Action | Retry budget |
|---|---|---|---|
| auth/permission | 401/403, signature, timestamp | stop + escalate | 0 |
| rate_limit | 429, Retry-After, weight headers | retry w/ backoff + jitter | 2-3 |
| transient | timeout, 5xx | retry limited then escalate | 1-2 |
| validation | 400, schema error | stop | 0 |
If your bot cannot enforce this, it will argue with the exchange until it gets blocked.
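As a sketch, the table above collapses into a small classifier that every call site shares. The status codes and error substrings below are illustrative assumptions; map them to your exchange's actual error vocabulary.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    STOP_AND_ESCALATE = "stop_and_escalate"
    RETRY_WITH_BACKOFF = "retry_with_backoff"
    RETRY_LIMITED = "retry_limited"
    STOP = "stop"


@dataclass
class Policy:
    action: Action
    retry_budget: int


def classify(status: int, error_text: str = "") -> Policy:
    """Map a failure to (action, retry budget); signals here are illustrative."""
    text = error_text.lower()
    if status in (401, 403) or "signature" in text or "timestamp" in text:
        return Policy(Action.STOP_AND_ESCALATE, retry_budget=0)   # auth/permission
    if status == 429:
        return Policy(Action.RETRY_WITH_BACKOFF, retry_budget=3)  # rate limit
    if status >= 500 or status == 0:  # 0 stands in for timeout/network in this sketch
        return Policy(Action.RETRY_LIMITED, retry_budget=2)       # transient
    if 400 <= status < 500:
        return Policy(Action.STOP, retry_budget=0)                # validation
    return Policy(Action.STOP, retry_budget=0)                    # unknown: be strict


if __name__ == "__main__":
    print(classify(429))                       # retry with backoff
    print(classify(403, "Invalid signature"))  # stop and escalate
```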
Diagnosis ladder (fast checks first)
This ladder is meant for incidents. It forces you to look at measurable signals instead of guessing.
1) Classify the error
Bucket every failure by class (auth, rate_limit, transient, validation). If you cannot bucket it quickly, you are missing fields in logs.
2) Check the deploy window
Most "random" blocks correlate with a deploy, a config change, or horizontal scaling. Scaling multiplies concurrency instantly, and background tasks become a burst.
3) Check time drift
If you see even occasional timestamp or signature failures, stop retries and fix clock sync. Repeated invalid auth is one of the fastest ways to get classified as abuse.
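A hedged sketch of the drift check, assuming your exchange exposes a public server-time endpoint; the URL and the response field name below are placeholders, not a real API.

```python
import json
import time
import urllib.request

# Hypothetical endpoint and field name; substitute your exchange's
# public server-time endpoint and response schema.
SERVER_TIME_URL = "https://api.example-exchange.com/v1/time"
MAX_DRIFT_MS = 1000  # tighten to your exchange's recv window


def clock_drift_ms() -> float:
    """Return (server time - local time) in milliseconds."""
    t0 = time.time()
    with urllib.request.urlopen(SERVER_TIME_URL, timeout=5) as resp:
        body = json.load(resp)
    t1 = time.time()
    # Compare against the midpoint of the request to discount network latency.
    local_mid_ms = (t0 + t1) / 2 * 1000
    server_ms = body["serverTime"]  # field name is exchange-specific
    return server_ms - local_mid_ms


if __name__ == "__main__":
    drift = clock_drift_ms()
    if abs(drift) > MAX_DRIFT_MS:
        print(f"ALERT: drift {drift:.0f} ms exceeds {MAX_DRIFT_MS} ms; stop retries, fix NTP")
    else:
        print(f"clock drift {drift:.0f} ms within tolerance")
```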
4) Check endpoint weights and hot paths
Exchanges rarely use a simple requests-per-second model. Limits often vary by endpoint and weight. A single hot endpoint can melt your budget even if overall request volume looks modest.
5) Check concurrency and retry synchronization
Unbounded concurrency creates bursts. Bursts create synchronized failures. Synchronized failures create synchronized retries.
If you are seeing waves, you have a scheduler problem.
Prevention playbook (make the client boring)
The objective is not to "avoid bans forever". The objective is to make your bot behave like a responsible production client under stress.
1) Enforce per-endpoint concurrency caps
Do not allow any endpoint to run unbounded. Practical starting points:
- private endpoints: concurrency 1-2
- public market data: concurrency 2-4
If you scale out, keep total concurrency stable across instances. Five instances each doing two private calls in parallel is ten concurrent private calls.
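A minimal sketch of per-bucket caps using asyncio semaphores; the bucket names and limits are just the starting points listed above.

```python
import asyncio


async def main():
    # Illustrative caps from the starting points above.
    limits = {
        "private": asyncio.Semaphore(2),  # private endpoints: 1-2
        "public": asyncio.Semaphore(4),   # public market data: 2-4
    }

    async def call(bucket: str, coro_factory):
        """Run an exchange call under its bucket's concurrency cap."""
        async with limits[bucket]:
            return await coro_factory()

    async def fake_private_call():
        await asyncio.sleep(0.1)  # stands in for a signed REST request
        return "ok"

    # Even if 20 tasks are scheduled, at most 2 private calls are in flight.
    results = await asyncio.gather(*(call("private", fake_private_call) for _ in range(20)))
    print(len(results), "calls completed")


if __name__ == "__main__":
    asyncio.run(main())
```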
2) Centralize traffic in a request scheduler
A scheduler is where you enforce:
- queueing and priorities (orders > portfolio refresh)
- backpressure handling (429)
- concurrency limits
- circuit breakers
Without a scheduler you have independent loops that compete and burst.
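One possible shape for such a scheduler, sketched with an asyncio priority queue; the request kinds, priority values, and in-flight cap are illustrative assumptions.

```python
import asyncio
import itertools

# Lower number = higher priority; kinds and values are illustrative.
PRIORITY = {"order": 0, "cancel": 0, "portfolio_refresh": 5, "market_data": 9}


class RequestScheduler:
    """One place that owns queueing, priorities, and the in-flight cap."""

    def __init__(self, max_inflight: int = 2):
        self._queue: asyncio.PriorityQueue = asyncio.PriorityQueue()
        self._sem = asyncio.Semaphore(max_inflight)
        self._seq = itertools.count()  # tie-breaker so entries never compare callables
        self._tasks: set = set()       # keep strong references to running tasks

    async def submit(self, kind: str, coro_factory):
        fut = asyncio.get_running_loop().create_future()
        await self._queue.put((PRIORITY[kind], next(self._seq), coro_factory, fut))
        return await fut

    async def run(self):
        while True:
            _, _, coro_factory, fut = await self._queue.get()
            await self._sem.acquire()  # backpressure: cap in-flight calls
            task = asyncio.create_task(self._execute(coro_factory, fut))
            self._tasks.add(task)
            task.add_done_callback(self._tasks.discard)

    async def _execute(self, coro_factory, fut):
        try:
            fut.set_result(await coro_factory())
        except Exception as exc:  # real code classifies the failure and routes it
            fut.set_exception(exc)
        finally:
            self._sem.release()


async def main():
    sched = RequestScheduler(max_inflight=2)
    worker = asyncio.create_task(sched.run())

    async def fake_call(name):  # stands in for a real REST request
        await asyncio.sleep(0.05)
        return name

    results = await asyncio.gather(
        sched.submit("portfolio_refresh", lambda: fake_call("refresh")),
        sched.submit("order", lambda: fake_call("order")),
    )
    print(results)
    worker.cancel()


if __name__ == "__main__":
    asyncio.run(main())
```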
3) Treat 429 as backpressure, not a glitch
On 429:
- respect Retry-After when present
- reduce concurrency for the hot bucket
- add jitter so retries do not synchronize
- cap retries to a small integer
If you retry immediately at full concurrency, you are manufacturing a retry storm.
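A sketch of that policy with capped attempts, Retry-After support, and full jitter; the delay constants are assumptions to tune, not exchange guidance.

```python
import random
import time
from typing import Callable, Optional, Tuple

MAX_ATTEMPTS = 3     # cap retries to a small integer
BASE_DELAY_S = 1.0   # conservative starting delay; tune for your exchange
MAX_DELAY_S = 30.0


def backoff_delay(attempt: int, retry_after: Optional[float] = None) -> float:
    """Delay before the next attempt: honor Retry-After when present, else exponential + full jitter."""
    if retry_after is not None:
        return retry_after
    capped = min(MAX_DELAY_S, BASE_DELAY_S * (2 ** attempt))
    return random.uniform(0, capped)  # full jitter keeps retries from synchronizing


def call_with_backoff(do_request: Callable[[], Tuple[int, Optional[float]]]) -> int:
    """do_request() returns (status, retry_after_seconds_or_None); the shape is illustrative."""
    for attempt in range(MAX_ATTEMPTS):
        status, retry_after = do_request()
        if status != 429:
            return status
        # Real code would also shrink the hot bucket's concurrency here.
        time.sleep(backoff_delay(attempt, retry_after))
    return 429  # budget exhausted: hand off to the breaker/operator, do not loop


if __name__ == "__main__":
    # Simulated endpoint: rate-limited twice, then succeeds.
    responses = iter([(429, None), (429, 0.5), (200, None)])
    print(call_with_backoff(lambda: next(responses)))  # 200 after two backoffs
```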
4) Never retry auth failures
Auth failures are stop rules:
- 401/403
- signature invalid
- timestamp window
Stop trading, open a breaker, and page a human.
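A minimal sketch of the stop rule as a process-wide kill switch; the auth-signal substrings and the logging hook are placeholders for your own classifier and pager.

```python
import logging

log = logging.getLogger("bot")

AUTH_SIGNALS = ("signature", "timestamp", "api-key")  # illustrative substrings


class KillSwitch:
    """Once tripped, every order path must check `halted` before sending anything."""

    halted = False

    @classmethod
    def trip(cls, reason: str):
        cls.halted = True
        log.critical("TRADING HALTED: %s", reason)  # wire this to your pager, not just logs


def handle_response(status: int, error_text: str = ""):
    text = error_text.lower()
    if status in (401, 403) or any(s in text for s in AUTH_SIGNALS):
        KillSwitch.trip(f"auth/permission failure: {status} {error_text!r}")
        return  # no retry, ever
    # ...other failure classes are handled by the scheduler/backoff policy...


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    handle_response(401, "Timestamp for this request is outside the recv window")
    print("halted:", KillSwitch.halted)
```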
5) Cache and de-duplicate boring reads
Cache anything that does not need per-tick refresh:
- exchange info and symbol lists
- fee schedules
- account permissions
This reduces baseline traffic and leaves headroom for the moments that matter.
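A tiny read-through TTL cache is usually enough for these reads; this sketch assumes the cached data is safe to serve slightly stale, and the TTL is an illustrative value.

```python
import time


class TTLCache:
    """Tiny read-through cache for slow-moving data (symbol lists, fees, permissions)."""

    def __init__(self, ttl_seconds: float):
        self._ttl = ttl_seconds
        self._store: dict = {}  # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        now = time.monotonic()
        hit = self._store.get(key)
        if hit and hit[0] > now:
            return hit[1]
        value = fetch()  # only goes to the exchange on a miss or expiry
        self._store[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    calls = {"n": 0}

    def fetch_exchange_info():  # stand-in for a real exchange-info request
        calls["n"] += 1
        return {"symbols": ["BTCUSDT", "ETHUSDT"]}

    cache = TTLCache(ttl_seconds=300)  # symbol lists rarely change intra-session
    for _ in range(100):
        cache.get_or_fetch("exchange_info", fetch_exchange_info)
    print("upstream calls:", calls["n"])  # 1, not 100
```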
6) Add circuit breakers by failure class
At minimum:
- auth breaker (opens immediately)
- rate limit breaker (opens after repeated 429)
- platform breaker (opens after repeated 5xx/timeouts)
Breakers prevent your bot from arguing with reality.
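A sketch of consecutive-failure breakers keyed by failure class; the thresholds and cooldowns below are assumptions to tune, not exchange-mandated values.

```python
import time


class Breaker:
    """Opens after `threshold` consecutive failures; stays open for `cooldown` seconds."""

    def __init__(self, threshold: int, cooldown: float):
        self.threshold = threshold
        self.cooldown = cooldown
        self._failures = 0
        self._opened_at = None

    def record(self, ok: bool):
        if ok:
            self._failures = 0
            self._opened_at = None
            return
        self._failures += 1
        if self._failures >= self.threshold:
            self._opened_at = time.monotonic()

    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        if time.monotonic() - self._opened_at >= self.cooldown:
            # Cooldown elapsed: close and let the next call through as a probe.
            self._opened_at = None
            self._failures = 0
            return False
        return True


# Illustrative starting points, one breaker per failure class.
BREAKERS = {
    "auth": Breaker(threshold=1, cooldown=float("inf")),  # opens immediately, needs a human
    "rate_limit": Breaker(threshold=5, cooldown=60),      # repeated 429s
    "platform": Breaker(threshold=5, cooldown=120),       # repeated 5xx/timeouts
}

if __name__ == "__main__":
    b = BREAKERS["rate_limit"]
    for _ in range(5):
        b.record(ok=False)  # five 429s in a row
    print("rate_limit open:", b.is_open())  # True: pause the bucket instead of retrying
```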
Tradeoffs (what this costs)
These guardrails will reduce your maximum throughput. That is the point.
You are trading peak speed for predictable behavior under stress. If you need more throughput, buy it with design (websockets, batching, caching, fewer endpoints), not with burst concurrency.
The second tradeoff is visibility. Logging and dashboards feel like overhead until the day you need them. In this lane, lack of visibility is what turns minor backpressure into a long incident.
What to log (minimum viable)
If you cannot answer these questions quickly, you will guess wrong in an incident:
- which endpoint bucket is hot
- whether retries are synchronized
- whether auth failures are repeating
Log per request attempt:
- ts
- bot_instance_id
- exchange
- endpoint (normalized)
- method
- status
- error_code (exchange-specific)
- error_message (redacted)
- request_id (from response headers if available)
- attempt
- latency_ms
- rate_limit_bucket (your name)
- concurrency_inflight
- retry_reason (none | 429 | timeout | 5xx | network)
- delay_ms (when retrying)
- idempotency_key (for order placement/cancel)
Example event shape:
{
"ts": "2026-01-15T11:22:33.123Z",
"bot_instance_id": "prod-eu-1",
"exchange": "example",
"endpoint": "private/order/create",
"method": "POST",
"status": 429,
"error_code": "rate_limit",
"error_message": "Too many requests",
"request_id": "abc-123",
"attempt": 2,
"latency_ms": 540,
"rate_limit_bucket": "orders",
"concurrency_inflight": 2,
"retry_reason": "429",
"delay_ms": 800,
"idempotency_key": "ord_7f2c..."
}
Shipped asset
API ban prevention checklist + backoff defaults
Two files: a key permissions checklist plus retry/backoff safe defaults for exchange clients. Built for production operations.
What you get (2 files):
- api-key-permissions-checklist.md: pre-deployment checklist (scopes, kill switch, concurrency caps)
- retry-backoff-safe-defaults.md: safe retry defaults (429 handling, jitter ranges, breaker triggers)
Quick preview:
- 429 -> backoff + jitter, reduce concurrency, cap attempts
- 401/403/signature -> stop and page (never retry)
- reconnect storm -> singleflight reconnect, bounded resync
- deploy -> treat as suspect, verify throughput and concurrency
Full detail is on the resource page.
Resources
This is intentionally compact. Full package details are on the resource page.
- API ban prevention checklist + backoff defaults
- Crypto Automation hub
- Axiom (Coming Soon)
- Backoff + jitter: the simplest reliability win
FAQ
How is a rate limit (429) different from a block?
A 429 is backpressure. It means you exceeded a budget for a window or a weighted bucket. A block often shows up as persistent 401/403 or exchange-specific errors that do not clear after you slow down.
Log status, error code, and request id, then test again after waiting. If your bot is retrying 429 at full concurrency, you will not learn anything useful.
What should I do when I see 401/403 or signature errors?
Stop retries. Treat it as a stop rule. Investigate key scope, signing, clock drift, and account flags.
Repeated invalid auth requests look like abuse. They can escalate the block.
Why do blocks show up right after a deploy?
Deploys often change concurrency, background tasks, and synchronization. Five instances restarting at once can create a burst, and reconnect logic can trigger a resync storm.
Treat the deploy window as suspect. Compare request counts by endpoint bucket before and after.
Should I move from REST polling to websockets?
Usually, yes. Websockets reduce baseline REST traffic, but they introduce reconnect risk.
If you move to websockets, make reconnect singleflight and make resync bounded. Otherwise you trade polling load for reconnect storms.
How many retries are reasonable?
Small integers. 2-3 attempts with exponential backoff and jitter are usually enough to ride out a transient window.
If the exchange is degraded, retries are not a fix. Use degrade mode and breakers.
What backoff policy should I use?
Use your own backoff policy and make it visible. Start with a conservative delay, add jitter, and cap attempts.
The key is that delay and attempts must be controlled by policy, not by ad hoc loops in callers.
Coming soon
If this kind of post is useful, the Axiom waitlist is where we ship operational templates (runbooks, decision trees, defaults) that keep bots out of incident mode.
The goal is simple: fewer repeat failures, fewer surprise lockouts, and clearer operator actions.
Axiom (Coming Soon)
Get notified when we ship real operational assets (runbooks, templates, benchmarks), not generic tutorials.
Key takeaways
Bans are usually a symptom of uncontrolled client behavior.
Default assumptions during incidents:
- traffic is bursty
- retries are synchronized
- auth failures are repeating
Fix the system, not the narrative:
- bound concurrency per endpoint
- treat 429 as backpressure with jittered backoff
- never retry auth failures
- add breakers and a kill switch
- log the fields that let you classify fast
For more production-focused work, see the Crypto Automation category.
Related posts

Crypto exchange rate limiting: fixed window vs leaky bucket (stop 429s)
A production-first playbook to stop 429 storms: diagnose the limiter type, add guardrails, and log the signals you need to stop guessing.

Why agents loop forever (and how to stop it)
A production playbook for preventing infinite loops: bounded retries, stop conditions, error classification, and escalation that actually helps humans.

Timestamp drift: the silent cause of signature errors
Why bots suddenly start failing with 401/403 or signature errors, and the production fixes that stop timestamp drift from taking you down.