
API key suddenly forbidden: why exchange APIs ban trading bots without warning
When an API key flips from working to 403 forbidden after your bot has been running for hours: why exchange APIs ban trading bots for traffic bursts, retry storms, and auth failures, and the client behavior that prevents it.
Free download: Exchange API ban prevention checklist + backoff defaults. Jump to the download section.
An exchange API ban rarely happens out of nowhere.
In production it looks like repeat failures: orders start rejecting, account refresh calls return 429, and then private endpoints flip to 403 after your bot keeps hammering. This post is not a tutorial. It is an incident playbook for keeping trading bots alive by shaping traffic, enforcing stop rules, and logging the minimum signals that prevent guessing.
If you are running bots, treat bans as an automation reliability problem. The fix is not a trick. The fix is predictable client behavior.
- Never retry auth failures (401/403/signature/timestamp). Stop trading and escalate.
- Treat 429 as backpressure: backoff + jitter, reduce concurrency, cap attempts.
- Log enough to classify fast (endpoint bucket, attempt, delay, concurrency in-flight, error code).
Fast triage table (what to check first)
| Symptom | Likely cause | Confirm fast | First safe move |
|---|---|---|---|
| Bot worked for hours, then private endpoints flip to 403 | Abuse detection triggered by retry bursts / repeated invalid requests | 429 or timeouts earlier; spikes in attempts/concurrency; many repeated failures | Stop traffic (cooldown), open breaker, then resume with lower concurrency + bounded retries |
| 401/403/signature/timestamp errors repeat | Clock drift or signing bug (non-transient) | Compare sent timestamp vs server time; see timestamp window/signature invalid codes | Stop retries, fix NTP/clock sync and signing, then restart safely |
| 429s spike right after deploy/scale-out | Concurrency increased; schedulers restarted together | Instance count up; in-flight up; 429 spikes immediately | Cap concurrency per endpoint and per instance; add jitter; stagger background jobs |
| Reconnect causes a storm of REST calls | Reconnect + resync logic is unbounded/synchronized | Many reconnect attempts; repeated “full resync” calls | Singleflight reconnect, jittered backoff, bounded resync (only deltas) |
| Rate limits persist even with backoff | Hot endpoint weight bucket is saturated | One endpoint dominates calls; weight headers/limits show that bucket is maxed | Reduce call frequency, cache reads, batch, and shed low priority work |
Why bots get banned after working fine: traffic bursts trigger abuse detection
This is the common sequence.
You deploy a bot change. The bot is fine for hours. Then one dependency degrades, or a websocket drops, or the exchange is under load. Your bot responds with the worst possible behavior: synchronized retries, burst concurrency, and repeated auth failures.
Now the exchange sees a client that looks like abuse. It does not matter that your intent is legitimate. Abuse detection reacts to traffic patterns, not intent.
Operational impact is usually worse than the original error. The first failure might be a short-lived 429. The second-order failure is a block that takes longer to clear and often requires human intervention.
403 forbidden vs 429 rate limited: how to tell real ban from backpressure
When teams say "the exchange banned us", they typically mean one of these states. The correct response is different for each. If you respond wrong, you can lengthen the block.
- Rate limited (429 or exchange-specific code): you exceeded a budget
- Temporarily blocked: a cooldown window is in effect
- Permission denied (401/403): key scope changed, signing is invalid, or account flags changed
- Abuse detection block: repeated invalid requests, bursts, reconnect storms
- Exchange degraded (5xx/timeouts): platform is failing and your retries amplify it
The most expensive mistake is treating auth or permissions failures like transient rate limiting.
Decision framework (stop, retry, escalate)
Automation reliability starts with one rule: classify the failure before you act.
This mapping is deliberately strict.
- 401/403, signature invalid, timestamp window -> STOP trading, escalate to operator
- 429 -> RETRY with backoff + jitter (bounded), reduce concurrency
- 5xx/timeouts -> RETRY limited, then ESCALATE
- 4xx validation -> STOP and fix inputs
Put it in a table so it becomes policy, not opinion.
| Failure class | Signals | Action | Retry budget |
|---|---|---|---|
| auth/permission | 401/403, signature, timestamp | stop + escalate | 0 |
| rate_limit | 429, Retry-After, weight headers | retry w/ backoff + jitter | 2-3 |
| transient | timeout, 5xx | retry limited then escalate | 1-2 |
| validation | 400, schema error | stop | 0 |
If your bot cannot enforce this, it will argue with the exchange until it gets blocked.
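A minimal classifier sketch in Python (the exchange-specific error strings below are placeholders; map them to your exchange's real error codes):

```python
from enum import Enum

class FailureClass(Enum):
    AUTH = "auth"              # stop + escalate, 0 retries
    RATE_LIMIT = "rate_limit"  # backoff + jitter, small retry budget
    TRANSIENT = "transient"    # retry limited, then escalate
    VALIDATION = "validation"  # stop and fix inputs

# Placeholder error strings; replace with your exchange's actual codes.
AUTH_ERRORS = {"signature_invalid", "timestamp_out_of_window", "invalid_api_key"}

def classify_failure(status: int, error_code: str = "") -> FailureClass:
    """Bucket a failed request before deciding whether to retry."""
    if status in (401, 403) or error_code in AUTH_ERRORS:
        return FailureClass.AUTH
    if status == 429:
        return FailureClass.RATE_LIMIT
    if status >= 500 or status == 0:  # 0 = network error/timeout in this sketch
        return FailureClass.TRANSIENT
    return FailureClass.VALIDATION

RETRY_BUDGET = {
    FailureClass.AUTH: 0,
    FailureClass.RATE_LIMIT: 3,
    FailureClass.TRANSIENT: 2,
    FailureClass.VALIDATION: 0,
}
```

The point is that the retry budget comes from the failure class, not from whoever wrote the calling loop.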
How to diagnose API bans: deploy window, retry patterns, and auth failures
This ladder is meant for incidents. It forces you to look at measurable signals instead of guessing.
1) Classify the error
Bucket every failure by class (auth, rate_limit, transient, validation). If you cannot bucket it quickly, you are missing fields in logs.
2) Check the deploy window
Most "random" blocks correlate with a deploy, a config change, or horizontal scaling. Scaling multiplies concurrency instantly, and background tasks become a burst.
3) Check time drift
If you see even occasional timestamp or signature failures, stop retries and fix clock sync. Repeated invalid auth is one of the fastest ways to get classified as abuse.
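A quick way to measure drift, assuming your exchange exposes a public server-time endpoint (the URL, the serverTime field, and the stop hook below are placeholders):

```python
import time
import requests  # assumes the requests library is available

DRIFT_LIMIT_MS = 1000  # stop signing well before the exchange's timestamp window

def clock_drift_ms(server_time_url: str) -> int:
    """Return local-minus-server drift in milliseconds (positive = local clock ahead)."""
    t0 = time.time()
    resp = requests.get(server_time_url, timeout=5)
    resp.raise_for_status()
    t1 = time.time()
    server_ms = int(resp.json()["serverTime"])   # placeholder field name
    local_ms = int(((t0 + t1) / 2) * 1000)       # midpoint roughly cancels network latency
    return local_ms - server_ms

# drift = clock_drift_ms("https://api.example-exchange.com/v1/time")  # placeholder URL
# if abs(drift) > DRIFT_LIMIT_MS:
#     stop_trading_and_page(f"clock drift {drift} ms")                # hypothetical stop hook
```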
4) Check endpoint weights and hot paths
Exchanges rarely use a simple requests per second model. Limits often vary by endpoint and weight. A single hot endpoint can melt your budget even if overall request volume looks modest.
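If the exchange returns usable weight headers, log them. If not, a rough client-side counter per bucket at least shows which endpoint dominates; the bucket names and weights below are illustrative, not any exchange's real values:

```python
import time
from collections import defaultdict, deque

class WeightTracker:
    """Rough client-side view of request weight per endpoint bucket over a sliding window."""

    def __init__(self, window_s: float = 60.0):
        self.window_s = window_s
        self.events = defaultdict(deque)  # bucket -> deque of (timestamp, weight)

    def record(self, bucket: str, weight: int = 1) -> None:
        self.events[bucket].append((time.monotonic(), weight))

    def used(self, bucket: str) -> int:
        cutoff = time.monotonic() - self.window_s
        q = self.events[bucket]
        while q and q[0][0] < cutoff:
            q.popleft()
        return sum(w for _, w in q)

# Usage: record each call, then see which bucket is eating the budget.
tracker = WeightTracker()
tracker.record("orders", weight=1)
tracker.record("account_snapshot", weight=10)
print({bucket: tracker.used(bucket) for bucket in tracker.events})
```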
5) Check concurrency and retry synchronization
Unbounded concurrency creates bursts. Bursts create synchronized failures. Synchronized failures create synchronized retries.
If you are seeing waves, you have a scheduler problem.
Stop exchange API bans: concurrency caps, retry budgets, and circuit breakers
The objective is not to "avoid bans forever". The objective is to make your bot behave like a responsible production client under stress.
1) Enforce per-endpoint concurrency caps
Do not allow any endpoint to run unbounded. Practical starting points:
- private endpoints: concurrency 1-2
- public market data: concurrency 2-4
If you scale out, keep total concurrency stable across instances. Five instances each doing two private calls in parallel is ten concurrent private calls.
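A minimal sketch of per-bucket caps with asyncio semaphores; the bucket names and limits are the starting points above, not values any exchange mandates:

```python
import asyncio

# Per-bucket concurrency caps for this instance (starting points from above).
CAPS = {
    "private": asyncio.Semaphore(2),
    "market_data": asyncio.Semaphore(4),
}

async def call_with_cap(bucket: str, make_request):
    """Run one request coroutine under its bucket's concurrency cap."""
    async with CAPS[bucket]:
        return await make_request()

# Usage: await call_with_cap("private", lambda: client.get_balances())
# where client.get_balances is whatever your exchange client exposes (placeholder).
```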
2) Centralize traffic in a request scheduler
A scheduler is where you enforce:
- queueing and priorities (orders > portfolio refresh)
- backpressure handling (429)
- concurrency limits
- circuit breakers
Without a scheduler you have independent loops that compete and burst.
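A minimal scheduler skeleton, assuming an asyncio bot. The priority values and concurrency are illustrative defaults; breaker and backpressure logic would hook into the worker:

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable

@dataclass(order=True)
class Job:
    priority: int                                           # lower = more urgent (orders first)
    call: Callable[[], Awaitable[Any]] = field(compare=False)
    done: asyncio.Future = field(compare=False)

class RequestScheduler:
    """Single choke point for outbound requests: priorities plus one concurrency cap."""

    def __init__(self, concurrency: int = 2):
        self.queue: asyncio.PriorityQueue = asyncio.PriorityQueue()
        self.concurrency = concurrency

    async def submit(self, call: Callable[[], Awaitable[Any]], priority: int = 10) -> Any:
        done = asyncio.get_running_loop().create_future()
        await self.queue.put(Job(priority, call, done))
        return await done

    async def _worker(self) -> None:
        while True:
            job = await self.queue.get()
            try:
                job.done.set_result(await job.call())
            except Exception as exc:        # classification + breakers would hook in here
                job.done.set_exception(exc)
            finally:
                self.queue.task_done()

    def start(self) -> list:
        return [asyncio.create_task(self._worker()) for _ in range(self.concurrency)]

# Usage: workers = scheduler.start(); fill = await scheduler.submit(place_order, priority=0)
```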
3) Treat 429 as backpressure, not a glitch
On 429:
- respect Retry-After when present
- reduce concurrency for the hot bucket
- add jitter so retries do not synchronize
- cap retries to a small integer
If you retry immediately at full concurrency, you are manufacturing a retry storm.
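A bounded sketch of the 429 path only; `send` and the response attributes are placeholders for whatever HTTP client you use:

```python
import asyncio
import random

async def send_with_backoff(send, max_attempts: int = 3,
                            base_delay: float = 1.0, max_delay: float = 30.0):
    """Retry only on 429: exponential backoff, full jitter, hard attempt cap."""
    for attempt in range(1, max_attempts + 1):
        resp = await send()
        if resp.status != 429:
            return resp
        if attempt == max_attempts:
            return resp                    # give up; let the caller shed load or open a breaker
        retry_after = resp.headers.get("Retry-After")
        if retry_after is not None:
            delay = float(retry_after)     # honor the server's hint when present
        else:
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** (attempt - 1)))
        await asyncio.sleep(delay)         # full jitter keeps clients from re-synchronizing
```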
4) Never retry auth failures
Auth failures are stop rules:
- 401/403
- signature invalid
- timestamp window
Stop trading, open a breaker, and page a human.
5) Cache and de-duplicate boring reads
Cache anything that does not need per-tick refresh:
- exchange info and symbol lists
- fee schedules
- account permissions
This reduces baseline traffic and leaves headroom for the moments that matter.
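Even a tiny in-process TTL cache removes the per-tick calls for these reads. The TTLs and client methods below are illustrative:

```python
import time

class TTLCache:
    """Minimal time-based cache for slow-changing reads (exchange info, fees, permissions)."""

    def __init__(self):
        self._store = {}  # key -> (fetched_at, value)

    def get_or_fetch(self, key: str, ttl_s: float, fetch):
        now = time.monotonic()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < ttl_s:
            return hit[1]
        value = fetch()                    # only hit the API on a miss or expiry
        self._store[key] = (now, value)
        return value

# Usage (TTLs and client methods are placeholders):
# cache = TTLCache()
# info = cache.get_or_fetch("exchange_info", ttl_s=3600, fetch=client.get_exchange_info)
# fees = cache.get_or_fetch("fee_schedule", ttl_s=21600, fetch=client.get_fee_schedule)
```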
6) Add circuit breakers by failure class
At minimum:
- auth breaker (opens immediately)
- rate limit breaker (opens after repeated 429)
- platform breaker (opens after repeated 5xx/timeouts)
Breakers prevent your bot from arguing with reality.
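A minimal per-class breaker sketch; the thresholds and cooldowns are starting points to tune, not recommendations from any exchange:

```python
import time

class Breaker:
    """Open after `threshold` consecutive failures; stay open for `cooldown_s`."""

    def __init__(self, threshold: int, cooldown_s: float):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()

    def record_success(self) -> None:
        self.failures = 0

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        if time.monotonic() - self.opened_at > self.cooldown_s:
            self.opened_at, self.failures = None, 0   # cooldown over: allow traffic again
            return False
        return True

# Illustrative defaults per failure class (tune to your exchange and risk tolerance).
BREAKERS = {
    "auth": Breaker(threshold=1, cooldown_s=float("inf")),   # opens immediately, needs a human
    "rate_limit": Breaker(threshold=5, cooldown_s=120),
    "platform": Breaker(threshold=10, cooldown_s=60),
}
```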
Tradeoffs (what this costs)
These guardrails will reduce your maximum throughput. That is the point.
You are trading peak speed for predictable behavior under stress. If you need more throughput, buy it with design (websockets, batching, caching, fewer endpoints), not with burst concurrency.
The second tradeoff is visibility. Logging and dashboards feel like overhead until the day you need them. In this lane, lack of visibility is what turns minor backpressure into a long incident.
What to log (minimum viable)
If you cannot answer these questions quickly, you will guess wrong in an incident:
- which endpoint bucket is hot
- whether retries are synchronized
- whether auth failures are repeating
Log per request attempt:
- ts
- bot_instance_id
- exchange
- endpoint (normalized)
- method
- status
- error_code (exchange-specific)
- error_message (redacted)
- request_id (from response headers if available)
- attempt
- latency_ms
- rate_limit_bucket (your name)
- concurrency_inflight
- retry_reason (none | 429 | timeout | 5xx | network)
- delay_ms (when retrying)
- idempotency_key (for order placement/cancel)
Example event shape:
```json
{
  "ts": "2026-01-15T11:22:33.123Z",
  "bot_instance_id": "prod-eu-1",
  "exchange": "example",
  "endpoint": "private/order/create",
  "method": "POST",
  "status": 429,
  "error_code": "rate_limit",
  "error_message": "Too many requests",
  "request_id": "abc-123",
  "attempt": 2,
  "latency_ms": 540,
  "rate_limit_bucket": "orders",
  "concurrency_inflight": 2,
  "retry_reason": "429",
  "delay_ms": 800,
  "idempotency_key": "ord_7f2c..."
}
```
Shipped asset
API ban prevention checklist + backoff defaults
Two files: a key permissions checklist plus retry/backoff safe defaults for exchange clients. Built for production operations. It is for you if any of these apply:
- Your bot gets 429s, timeouts, or occasional 5xx and you want bounded, observable retry behavior.
- You’ve had “it worked for hours then got banned” incidents and need stop rules + breaker triggers.
- You run multiple instances and need predictable concurrency caps and backoff defaults.
- You can’t classify errors (auth vs 429 vs transient) from logs in under 2 minutes.
- You place orders without idempotency/deduplication and you currently “retry writes just in case”.
- You don’t have a kill switch / breaker path (add stop mechanisms first, then tune retries).
What you get (2 files):
- api-key-permissions-checklist.md: pre-deployment checklist (scopes, kill switch, concurrency caps)
- retry-backoff-safe-defaults.md: safe retry defaults (429 handling, jitter ranges, breaker triggers)
Quick preview:
429 -> backoff + jitter, reduce concurrency, cap attempts
401/403/signature -> stop and page (never retry)
reconnect storm -> singleflight reconnect, bounded resync
deploy -> treat as suspect, verify throughput and concurrency
Full detail is on the resource page.
Trading Bot Hardening Suite: Production-Ready Crypto Infrastructure
Running production trading bots? Get exchange-specific rate limiters, signature validation, and incident recovery playbooks. Stop losing money to preventable API failures.
- ✓ Exchange-specific rate limiting (Binance, Coinbase, Kraken, Bybit)
- ✓ Signature validation & timestamp drift detection
- ✓ API ban prevention patterns & key rotation strategies
- ✓ Incident runbooks for 429s, signature errors, and reconnection storms
Checklist (copy/paste)
- Failures are classified: auth/permission vs rate limit vs transient vs validation.
- 401/403/signature/timestamp failures are STOP rules (0 retries) with an operator escalation.
- 429 is treated as backpressure (honor Retry-After when present; otherwise conservative backoff).
- Retry has boundaries: max attempts, jittered backoff, and a total time budget.
- Concurrency is capped per endpoint bucket (and across instances after scale-out).
- Traffic is centralized (scheduler/queue) so priorities and caps are enforced in one place.
- Circuit breakers exist by failure class (auth, rate_limit, platform).
- Websocket reconnect is singleflight + jittered backoff; resync is bounded (no full-state storms).
- Logs capture: endpoint bucket, status/error_code, attempt, delay_ms, concurrency_inflight, request_id.
- A kill switch exists to stop trading without redeploy.
Resources
This is intentionally compact. Full package details are on the resource page.
- API ban prevention checklist + backoff defaults
- Crypto Automation hub
- Axiom (Coming Soon)
- Backoff + jitter: the simplest reliability win
- Trading bot keeps getting 429s after deploy: stop rate limit storms - deploy-incident correlation
- The real cost of retry logic: when resilience makes outages worse - retry amplification patterns
Troubleshooting Questions Engineers Search
Why did my API key suddenly return 403 forbidden when it was working?
Because the exchange's abuse detection system flagged your bot's traffic pattern as suspicious. Common triggers: synchronized retry bursts after 429s, reconnect storms, repeated auth failures, or sudden concurrency spikes after deploy. The key was working; then your bot did something that looked like abuse (even if unintentional). The exchange flipped your key to 403 forbidden as a defense mechanism.
How do I tell a 429 rate limit from a real ban?
429 is rate limiting: you exceeded a budget for a time window or endpoint weight. It's temporary backpressure. A ban shows up as persistent 403/401 or exchange-specific "account suspended" errors that don't clear after you slow down. To test: stop all traffic for 5-10 minutes, then make a single low-weight request. If it succeeds, you were rate-limited. If it still fails, you're blocked.
Why did the ban happen right after a deploy?
Because deploys often change concurrency, restart timing, or background task scheduling. Five instances restarting simultaneously create a burst. Reconnect logic can trigger resync storms. Background tasks that were spread out now run at the same time. The exchange sees a sudden spike in traffic and flags it as suspicious, especially if your bot retries 429s at full concurrency instead of backing off.
Can retrying 429s get my API key banned?
Yes. When your bot gets a 429 and retries immediately at full concurrency without backoff or jitter, you create synchronized waves of traffic. The exchange sees sustained high request rates from a single key that won't back off. This looks like an attack or broken client. Abuse detection systems block keys that ignore backpressure signals. The fix: respect Retry-After, add jitter, reduce concurrency on 429.
Why are my signatures suddenly invalid when the bot was working fine?
Most commonly: clock drift. If your server time is off by more than the exchange's timestamp window (often 5-30 seconds), signatures become invalid. Second cause: stale nonce/timestamp reuse if your bot doesn't reset state properly on restart. Third cause: API secret rotation during restart. Check: sync server time with NTP, log timestamp sent vs server time, never reuse nonces.
How many retries are safe before it looks like abuse?
Small integers: 2-3 attempts with exponential backoff and jitter for transient errors (timeouts, 5xx). For 429 rate limits, even fewer: often 1-2 retries while reducing concurrency. Auth failures (401/403, signature errors) should have 0 retries; stop immediately and escalate. The danger isn't the retry count alone, it's synchronized retries without backoff creating sustained high traffic that looks like abuse.
Can websocket reconnects trigger a ban?
Yes, especially reconnect storms. If your bot loses connection and multiple instances try to reconnect simultaneously without jitter, or if reconnect logic triggers full market resync on every reconnect, you create burst REST traffic. Make reconnects singleflight (one at a time), add jittered backoff between attempts, and make resync bounded (only fetch what changed, not full state). Unbounded reconnect logic is a common ban trigger after network issues.
FAQ
How do I tell rate limiting from a real block?
429 is backpressure. It means you exceeded a budget for a window or a weighted bucket. A block often shows up as persistent 403/401 or exchange-specific errors that do not clear after you slow down.
Log status, error code, and request id, then test again after waiting. If your bot is retrying 429 at full concurrency, you will not learn anything useful.
What should I do when private endpoints return 403?
Stop retries. Treat it as a stop rule. Investigate key scope, signing, clock drift, and account flags.
Repeated invalid auth requests look like abuse. They can escalate the block.
Why did the block start right after a deploy?
Deploys often change concurrency, background tasks, and synchronization. Five instances restarting at once can create a burst, and reconnect logic can trigger a resync storm.
Treat the deploy window as suspect. Compare request counts by endpoint bucket before and after.
Should I move from REST polling to websockets?
Usually, yes. Websockets reduce baseline REST traffic, but they introduce reconnect risk.
If you move to websockets, make reconnect singleflight and make resync bounded. Otherwise you trade polling load for reconnect storms.
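A singleflight reconnect sketch with asyncio: one reconnect loop in flight at a time, jittered backoff between attempts. `connect` is a placeholder for your websocket client's reconnect call:

```python
import asyncio
import random

class Reconnector:
    """Singleflight reconnect: one attempt loop at a time, jittered backoff between tries."""

    def __init__(self, connect, base_delay: float = 1.0, max_delay: float = 60.0):
        self.connect = connect        # async callable that (re)opens the websocket (placeholder)
        self.base_delay = base_delay
        self.max_delay = max_delay
        self._lock = asyncio.Lock()

    async def reconnect(self) -> None:
        if self._lock.locked():
            async with self._lock:    # another task is reconnecting: wait, then reuse its work
                return
        async with self._lock:
            attempt = 0
            while True:
                try:
                    await self.connect()
                    return
                except Exception:
                    attempt += 1
                    delay = min(self.max_delay, self.base_delay * 2 ** attempt)
                    await asyncio.sleep(random.uniform(0, delay))
```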
How many retries are safe?
Small integers. 2-3 attempts with exponential backoff and jitter is usually enough to ride out a transient window.
If the exchange is degraded, retries are not a fix. Use degrade mode and breakers.
What backoff policy should I start with?
Use your own backoff policy and make it visible. Start with a conservative delay, add jitter, and cap attempts.
The key is that delay and attempts must be controlled by policy, not by ad hoc loops in callers.
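One way to make that policy explicit rather than scattering it through callers; the numbers are conservative defaults to tune, not exchange recommendations:

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class RetryPolicy:
    """Single source of truth for retries; callers never invent their own loops."""
    max_attempts: int = 3
    base_delay_s: float = 1.0
    max_delay_s: float = 30.0
    total_budget_s: float = 60.0   # hard wall-clock cap across all attempts

    def delay(self, attempt: int) -> float:
        """Exponential backoff with full jitter for a 1-based attempt number."""
        capped = min(self.max_delay_s, self.base_delay_s * 2 ** (attempt - 1))
        return random.uniform(0, capped)

# Make the schedule visible (e.g. log it at startup) so the policy is explicit, not implicit.
policy = RetryPolicy()
print([round(policy.delay(a), 2) for a in range(1, policy.max_attempts + 1)])
```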
Coming soon
If this kind of post is useful, the Axiom waitlist is where we ship operational templates (runbooks, decision trees, defaults) that keep bots out of incident mode.
The goal is simple: fewer repeat failures, fewer surprise lockouts, and clearer operator actions.
Axiom (Coming Soon)
Get notified when we ship real operational assets (runbooks, templates, benchmarks), not generic tutorials.
Key takeaways
Bans are usually a symptom of uncontrolled client behavior.
Default assumptions during incidents:
- traffic is bursty
- retries are synchronized
- auth failures are repeating
Fix the system, not the narrative:
- bound concurrency per endpoint
- treat 429 as backpressure with jittered backoff
- never retry auth failures
- add breakers and a kill switch
- log the fields that let you classify fast
For more production-focused work, see the Crypto Automation category.
Recommended resources
Download the shipped checklist/templates for this post.
Two files: an API key permissions checklist plus retry/backoff safe defaults for exchange clients. Download includes both markdown files.
Related posts

Trading bot keeps getting 429s after deploy: stop rate limit storms
When deploys trigger 429 storms: why synchronized restarts amplify rate limits, how to diagnose fixed window vs leaky bucket, and guardrails that stop repeat incidents.

Agent keeps calling same tool: why autonomous agents loop forever in production
When agent loops burn tokens calling same tool repeatedly and cost spikes: why autonomous agents loop without stop rules, and the guardrails that prevent repeat execution and duplicate side effects.

Signature invalid but bot was working: why clock drift breaks auth suddenly
When bot gets signature invalid or 401 after working fine for hours: why clock drift breaks exchange auth suddenly, and the time calibration that prevents it.