
Jan 31, 2026 · 8 min read
Category: Crypto Automation
Crypto exchange rate limiting: fixed window vs leaky bucket (stop 429s)
A production-first playbook to stop 429 storms: diagnose the limiter type, add guardrails, and log the signals you need to stop guessing.
Download available. Jump to the shipped asset.
429 is not a glitch. In production it becomes a retry storm: orders fail, your bot misses fills, and your deploy window turns into an incident because five instances all hit the same rate limit at once.
This is not a tutorial. It is a playbook for operators running trading bots against exchange APIs. You will leave with a decision framework, stop rules, and logging fields that let you prove you fixed the problem.
This post is in the Crypto Automation hub and the Crypto Automation category.
Mini incident: the 429 storm after deploy
It is 14:03 UTC and a deploy finishes. Five bot instances restart, each does a full resync, and each strategy starts polling the same endpoints.
At 14:04 UTC, you see clusters of 429 responses. By 14:05 UTC, retries are synchronized and the bot is spending more capacity retrying than trading. By 14:07 UTC, the exchange escalates and you start seeing longer cooloffs.
Nothing is "down". Your system is. Rate limiting is backpressure, and your client behavior decides whether backpressure is a small slowdown or a full incident.
What fixed window and leaky bucket really change
Most advice treats "rate limiting" as one thing. It is not. The limiter model affects what failure looks like, how you should pace requests, and what signals confirm progress.
Two common models show up in exchange APIs:
- Fixed window counters. You get a budget for a window, like 1200 weight per 60 seconds. When you hit the cap, you get hard 429s until the window resets.
- Leaky bucket style pacing. Requests drain at a steady rate. Bursts get rejected or delayed, and constant pacing tends to succeed.
The operational difference is the pattern of 429s. Fixed window tends to produce sharp bursts. Leaky bucket tends to spread failures across time as burst pressure drains.
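To make the difference concrete, here is a minimal client-side sketch of both models in Python. The budgets and drain rates are illustrative, not any exchange's real numbers.

```python
import time

class FixedWindowBudget:
    """Hard cap per window: calls succeed until the budget is spent, then all fail until the reset."""

    def __init__(self, limit: int = 1200, window_seconds: float = 60.0):
        self.limit = limit
        self.window_seconds = window_seconds
        self.window_start = time.monotonic()
        self.used = 0

    def allow(self, weight: int = 1) -> bool:
        now = time.monotonic()
        if now - self.window_start >= self.window_seconds:
            self.window_start = now  # reset: the whole budget refills at once
            self.used = 0
        if self.used + weight > self.limit:
            return False  # hard 429s until the window resets
        self.used += weight
        return True


class LeakyBucketPacer:
    """Steady drain: bursts fill the bucket and get rejected; constant pacing succeeds."""

    def __init__(self, drain_per_second: float = 20.0, capacity: float = 40.0):
        self.drain_per_second = drain_per_second
        self.capacity = capacity
        self.level = 0.0
        self.last = time.monotonic()

    def allow(self, weight: float = 1.0) -> bool:
        now = time.monotonic()
        self.level = max(0.0, self.level - (now - self.last) * self.drain_per_second)
        self.last = now
        if self.level + weight > self.capacity:
            return False  # burst pressure: callers must wait for the bucket to drain
        self.level += weight
        return True
```

Fire a burst of allow() calls at each and you see the two 429 shapes described above: the fixed window fails in one block until the reset, the leaky bucket fails intermittently as it drains.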
Diagnosis ladder (fast checks first)
Do these in order. The goal is to identify whether you are over budget, misclassifying errors, or amplifying retries.
- Confirm it is truly 429. Some exchanges embed rate limiting in a JSON error body or custom code even when the HTTP status is 200.
- Capture response headers. Log Retry-After and any vendor headers that expose remaining budget or reset time.
- Identify the budget key. Is it per IP, per API key, per account, or per endpoint group? This determines whether scaling out helps or hurts.
- Measure request weight, not request count. If the exchange uses weights, a single call can cost 5-20 units.
- Compare patterns over time. A cluster at the top of the minute suggests fixed window. A smoother bleed suggests leaky bucket or server-side queueing.
- Check concurrency after deploy. Instance count, reconnect logic, and "catch-up" jobs are the usual source of surprise bursts.
If you cannot answer budget key + weight + concurrency, you are still guessing.
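A small sketch of the header-capture step, assuming the requests library. Retry-After is standard; the other header names are examples of vendor-specific patterns (the Binance weight header is one real case) and will differ per exchange, so check the docs before relying on them. The endpoint URL is a placeholder.

```python
import requests

def inspect_rate_limit(resp: requests.Response) -> dict:
    """Pull the fields worth logging off a response; missing headers come back as None."""
    h = resp.headers
    return {
        "http_status": resp.status_code,
        "retry_after_seconds": h.get("Retry-After"),        # standard header
        "used_weight_1m": h.get("X-MBX-USED-WEIGHT-1M"),     # example: Binance weight header
        "remaining_budget": h.get("X-RateLimit-Remaining"),  # common vendor pattern, not universal
        "reset_at": h.get("X-RateLimit-Reset"),              # common vendor pattern, not universal
    }

# Placeholder endpoint; swap in a cheap public call on your exchange:
# resp = requests.get("https://api.example-exchange.com/api/v3/time", timeout=5)
# print(inspect_rate_limit(resp))
```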
Decision framework: what strategy matches the limiter
Do not pick a limiter strategy because it is popular. Pick it because it matches the exchange behavior and your bot architecture.
- If you see fixed window bursts, you need burst smoothing. Token bucket at the client edge is a good fit, but you must also coordinate across processes.
- If you see leaky bucket behavior, you need pacing. A steady queue with backpressure can eliminate most 429s without aggressive backoff.
- If the exchange returns Retry-After, it is telling you the window. Your policy should follow it.
In both cases, your biggest risk is synchronized retry. A bot that retries in lockstep is effectively a self-inflicted denial of service.
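For the fixed-window case, a client-edge token bucket might look like the sketch below. The rate and burst values are assumptions; derive them from the exchange's documented budget and leave headroom.

```python
import threading
import time

class TokenBucket:
    """Client-edge token bucket: smooths bursts into a steady request rate."""

    def __init__(self, rate_per_second: float, burst: float):
        self.rate = rate_per_second
        self.capacity = burst
        self.tokens = burst
        self.updated = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self, weight: float = 1.0) -> None:
        """Block until the bucket holds enough tokens for this request's weight."""
        while True:
            with self.lock:
                now = time.monotonic()
                self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
                self.updated = now
                if self.tokens >= weight:
                    self.tokens -= weight
                    return
                wait = (weight - self.tokens) / self.rate
            time.sleep(wait)  # sleep outside the lock so other threads can refill and check

# Example: a 1200-per-60s budget kept at ~80% utilization with a small burst allowance
bucket = TokenBucket(rate_per_second=1200 / 60 * 0.8, burst=20)
```

A bucket like this only protects a single process. It still has to be coordinated across instances, which is what the centralization and shared-counter steps below are for.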
Prevention plan: stop 429 from becoming a repeat incident
Your goal is not "never see 429". Your goal is "429 never triggers a retry storm".
1. Centralize rate limiting per exchange and credential
Do not let each caller own its own retry loop. The limiter should live in one place so it can enforce budgets, caps, and stop rules.
Partition by:
- exchange
- account or api key hash
- endpoint group (public, private, trading)
This prevents one noisy strategy from starving everything else.
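A minimal sketch of that partitioning: one registry owns every limiter, keyed by exchange, credential hash, and endpoint group. The budget numbers and the limiter_factory hook are assumptions; the factory could be the token bucket from the earlier sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

@dataclass(frozen=True)
class LimiterKey:
    exchange: str          # e.g. "binance"
    account_key_hash: str  # store a hash, never the raw API key
    endpoint_group: str    # "public" | "private" | "trading"

class LimiterRegistry:
    """Single owner of rate limiting policy: every caller asks this registry, nobody rolls their own."""

    def __init__(self, budgets: Dict[Tuple[str, str], Tuple[float, float]], limiter_factory: Callable):
        self._budgets = budgets          # (exchange, endpoint_group) -> (rate_per_second, burst)
        self._factory = limiter_factory  # e.g. the TokenBucket from the earlier sketch
        self._limiters: Dict[LimiterKey, object] = {}

    def get(self, key: LimiterKey):
        """Callers sharing an exchange, credential, and endpoint group share one limiter instance."""
        if key not in self._limiters:
            rate, burst = self._budgets[(key.exchange, key.endpoint_group)]
            self._limiters[key] = self._factory(rate, burst)
        return self._limiters[key]

# registry = LimiterRegistry({("binance", "trading"): (16.0, 10.0)}, limiter_factory=TokenBucket)
# registry.get(LimiterKey("binance", "k_7c9b", "trading")).acquire(weight=1)
```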
2. Add a queue, then apply backpressure
If you have multiple strategies, you need a queue even if it is in-memory. The queue gives you a place to apply policy: limit concurrency, drop low value work, and prioritize trading over metrics.
Backpressure rules that work:
- hard cap concurrency per key
- enforce a minimum spacing between requests when budget is low
- reject or defer non-critical calls when remaining budget is under a threshold
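A sketch of those rules as a per-key queue, assuming asyncio. The thresholds and the budget_low flag are placeholders you would wire to your limiter's remaining-budget signal.

```python
import asyncio
import time

class RequestQueue:
    """Per-key queue that enforces the backpressure rules: concurrency cap, spacing, deferral."""

    def __init__(self, max_concurrency: int = 4, min_spacing_when_low: float = 0.25):
        self._sem = asyncio.Semaphore(max_concurrency)  # hard cap on in-flight requests for this key
        self._min_spacing = min_spacing_when_low        # seconds between sends when budget is low
        self._last_send = 0.0

    async def submit(self, call, *, critical: bool, budget_low: bool):
        if budget_low and not critical:
            raise RuntimeError("deferred: non-critical call while remaining budget is low")
        async with self._sem:
            if budget_low:
                wait = self._min_spacing - (time.monotonic() - self._last_send)
                if wait > 0:
                    await asyncio.sleep(wait)  # enforce minimum spacing under pressure
            self._last_send = time.monotonic()
            return await call()  # call is an async function that performs the actual request
```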
3. Retry policy with jitter and stop rules
Retry is a tool, not a default.
Policy that usually holds up:
- 2-3 attempts max on 429
- exponential backoff with jitter
- respect Retry-After when present
- circuit break when consecutive 429s exceed a threshold
Stop rules that keep you safe:
- If Retry-After is large (example: 60+ seconds), enter degrade mode and stop trading actions.
- If 429s persist across multiple windows, stop and page. You are not recovering, you are being rate limited by design.
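Put together, the policy and stop rules might look like this sketch. It assumes a requests-style response, treats Retry-After as seconds, and leaves degrade() and page() as hooks for your own escalation paths.

```python
import random
import time

MAX_ATTEMPTS = 3            # 2-3 attempts max on 429
BASE_DELAY = 1.0            # seconds; conservative base for exponential backoff
RETRY_AFTER_DEGRADE = 60.0  # seconds; beyond this, stop trading actions instead of waiting

def call_with_retry(send, degrade, page):
    """send() returns a requests-style response; degrade() and page() are your escalation hooks."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        resp = send()
        if resp.status_code != 429:
            return resp
        retry_after = float(resp.headers.get("Retry-After", 0) or 0)  # assumes the seconds form
        if retry_after >= RETRY_AFTER_DEGRADE:
            degrade("large Retry-After: enter degrade mode, stop trading actions")
            return resp
        if attempt == MAX_ATTEMPTS:
            page("429 persisted across retries: stop and escalate")
            return resp
        backoff = BASE_DELAY * (2 ** (attempt - 1))
        delay = max(retry_after, random.uniform(0, backoff))  # full jitter, never below Retry-After
        time.sleep(delay)
```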
4. Make deploy behavior boring
Most 429 incidents happen right after deploy.
Guardrails:
- random startup delay per instance
- singleflight resync (only one instance performs heavy catch-up)
- warm-up mode that ramps request rate over 2-5 minutes
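A sketch of the first and third guardrails (startup jitter and warm-up ramp). The 30-second jitter cap and 3-minute ramp are assumptions to tune per bot; singleflight resync is left to whatever leader election or locking you already run.

```python
import random
import time

STARTUP_JITTER_MAX = 30.0  # seconds of random delay so restarted instances do not move in lockstep
WARMUP_SECONDS = 180.0     # ramp from 10% to 100% of the normal request rate after startup

_started_at = None

def startup_delay() -> None:
    """Call once at boot, before any resync or polling starts."""
    global _started_at
    time.sleep(random.uniform(0, STARTUP_JITTER_MAX))
    _started_at = time.monotonic()

def warmup_rate(full_rate_per_second: float) -> float:
    """Allowed request rate right now, ramping linearly during the warm-up window."""
    if _started_at is None:
        return full_rate_per_second * 0.1
    elapsed = time.monotonic() - _started_at
    ramp = min(1.0, max(0.1, elapsed / WARMUP_SECONDS))
    return full_rate_per_second * ramp
```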
5. Validate with a burst test
Before you ship changes, run a burst test and record the pattern.
Example procedure:
- send a short burst to a known endpoint group
- observe the shape of 429s
- confirm your limiter spreads retries and settles
Your acceptance criterion is not "no 429". It is "no synchronized retry and no prolonged cooloff".
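A minimal burst-test sketch, assuming the requests library. The URL and burst size are placeholders; point it at a cheap public endpoint and plot the offsets and statuses it records to see the 429 shape.

```python
import time
import requests

def burst_test(url: str, burst_size: int = 50) -> list:
    """Fire a short burst and record when each response landed and how it was limited."""
    start = time.monotonic()
    results = []
    for _ in range(burst_size):
        resp = requests.get(url, timeout=5)
        results.append({
            "offset_s": round(time.monotonic() - start, 3),  # position in the burst
            "status": resp.status_code,
            "retry_after": resp.headers.get("Retry-After"),
        })
    return results

# Placeholder endpoint; use a cheap public call on your exchange:
# print(burst_test("https://api.example-exchange.com/api/v3/ping"))
```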
What to log
If you cannot prove the limiter is working, the incident will repeat.
Log enough fields to answer:
- what budget did we exceed
- what did the limiter decide
- did retries synchronize
- did we respect exchange guidance
{
"ts": "2026-01-27T14:04:22.481Z",
"event": "exchange_rate_limit",
"exchange": "binance",
"account_key_hash": "k_7c9b...",
"endpoint": "/api/v3/order",
"endpoint_group": "trading",
"http_status": 429,
"retry_after_seconds": 5,
"request_weight": 1,
"window_type": "fixed",
"window_seconds": 60,
"limiter_decision": "delay_then_retry",
"attempt": 1,
"backoff_ms": 1200,
"jitter_ms": 430,
"next_retry_at": "2026-01-27T14:04:28Z",
"consecutive_429": 3,
"breaker_state": "closed",
"instance_id": "bot-03",
"request_id": "req_4f1f..."
}With this you can build simple dashboards:
- 429 rate by endpoint group
- consecutive 429 and breaker transitions
- retries per instance during deploy windows
Shipped asset: exchange rate limiting package
Exchange rate limiting package
Config template, decision checklist, and a 429 logging schema for trading bots. The download is a real local zip, not a placeholder.
This is intentionally compact here. Full package details are on the resource page.
Included files:
- rate-limit-config-template.yaml
- rate-limit-decision-checklist.md
- rate-limit-429-logging-schema.json
- README.md
Preview (config excerpt):
rate_limits:
  binance:
    trading:
      limit: 1200
      window_seconds: 60
      window_type: fixed
      retry_strategy: exponential_backoff
      max_retries: 3
      jitter_enabled: true
      respect_retry_after: true
Tradeoffs and failure modes to plan for
Rate limiting policy has costs. Naming them up front makes rollout safer.
- You will delay work. That is the point. But it can cause missed opportunities if you have no degrade mode.
- A strict limiter can hide upstream degradation by slowing everything. That is why breaker state and 429 rate must be visible.
- Multiple instances need coordination. If each instance has its own limiter, you can still exceed a shared IP budget.
The clean solution is boring: centralize policy, cap concurrency, add stop rules, and make deploy traffic predictable.
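One way to get that coordination, sketched below, is a shared fixed-window counter in Redis so every instance draws from the same budget. This assumes the redis-py client and a reachable Redis; the key naming and the 1200-per-60s budget are illustrative, not a recommendation.

```python
import time
import redis

r = redis.Redis(host="localhost", port=6379)

def try_consume(exchange: str, endpoint_group: str, weight: int = 1,
                limit: int = 1200, window_seconds: int = 60) -> bool:
    """Return True if this instance may send; False means wait for the next window."""
    window = int(time.time() // window_seconds)
    key = f"rl:{exchange}:{endpoint_group}:{window}"
    used = r.incrby(key, weight)           # atomic across every bot instance sharing this Redis
    if used == weight:                     # first writer in the window sets the expiry
        r.expire(key, window_seconds * 2)
    return used <= limit
```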
Resources
This is intentionally compact. Full package details are on the resource page.
- Exchange rate limiting package
- Crypto Automation hub
- Axiom (Coming Soon)
- Backoff + jitter: the simplest reliability win
- Exchange API bans: how to prevent them
FAQ
How do I tell whether an exchange uses fixed window or leaky bucket limiting?
Look at the shape of 429s.
Fixed window often looks like clusters, then clean success right after a reset. Leaky bucket style limiting often looks like failures that spread across time as bursts drain.
Do not rely on intuition. Log headers, record timestamps, and compare patterns around deploy windows.
Should each strategy or caller own its own retry loop?
No. That is how 429 becomes a storm.
Centralize the retry policy and make concurrency part of the policy. If you retry with the same concurrency that caused the limit, you will keep hitting the limit.
What if the exchange does not send Retry-After?
Use your own backoff policy and make it visible.
Start with a conservative base delay, add jitter, cap attempts, then escalate to degrade mode if 429 continues.
Does adding more API keys or IP addresses avoid the limits?
Do not assume it helps.
Some exchanges count limits per account, not per key. Others use IP-based limits. If you get it wrong you can still be blocked, and you may violate terms.
Why did 429s start right after a deploy?
Deploys change traffic shape.
Restarts create resync bursts, reconnect logic can stampede, and instance count can increase overnight. Put guardrails around startup and ramp request rate.
Will these changes eliminate 429s entirely?
Not always.
The target is that 429 is contained. A single request might be delayed, but the bot keeps operating and the operator has clear signals.
Coming soon
If this kind of post is useful, the Axiom waitlist is where we ship operational templates (runbooks, decision trees, defaults) that keep automation out of incident mode.
Axiom (Coming Soon)
Get notified when we ship real operational assets (runbooks, templates, benchmarks), not generic tutorials.
Key takeaways
- 429 is backpressure. Your client decides whether it becomes an incident.
- Fixed window and leaky bucket produce different 429 patterns. Diagnose before you tune.
- Centralize rate limiting policy per exchange and credential.
- Add jitter and stop rules. Never retry in lockstep.
- Log limiter decisions so you can prove the fix.
Related posts

Why exchange APIs "randomly" ban bots (and how to prevent it)
A production-first playbook to avoid bans: permissions, rate limits, auth hygiene, and traffic patterns that keep trading bots alive.

Retry backoff and jitter: safe defaults to prevent retry storms
An incident-ready retry policy for production automation: stop rules, exponential backoff + jitter, caps, budgets, and the logs operators need.

Why agents loop forever (and how to stop it)
A production playbook for preventing infinite loops: bounded retries, stop conditions, error classification, and escalation that actually helps humans.