WebSocket closed with 1006: why trading bots lose connection without an error code

Jun 06, 2026 · 8 min read

Why trading bots see a 1006 abnormal closure instead of a clean close, and a reconnect strategy that handles it without guessing.

Free download: WebSocket Reconnection Kit. Jump to the download section.

A WebSocket close code 1006 means the connection ended without your client receiving a close frame.

It is not a deliberate close. The exchange did not send 1000 (normal) or 1001 (going away). The connection simply went away: a TCP reset, a proxy timeout, a load balancer kill, a network partition, or a peer process that crashed without cleaning up its socket.

In trading bots, 1006 is the most common disconnect signal after idle periods. It is also the hardest to debug because there is no error message, no reason string, and no close frame payload.

If you are running bots, treat 1006 as a symptom of infrastructure, not a WebSocket problem. The fix is not to catch and suppress it. The fix is to make reconnect safe regardless of the close reason.

If you only do three things
  • Treat 1006 as a reconnect signal, not an error to log and panic about. Your WebSocket library already reports it as a close event; your job is what happens next.
  • Add an application-level heartbeat (ping/pong) over the WebSocket itself, not just TCP keepalive. Default TCP keepalive settings are too slow to detect dead connections.
  • Log the gap between last received message and disconnect time. That tells you whether it was an idle timeout or a sudden drop.

Fast triage table (what to check first)

| Symptom | Likely cause | Confirm fast | First safe move |
|---|---|---|---|
| 1006 every 15-30 minutes of idle time | Proxy or load balancer idle timeout | Log gap between last message and disconnect ≈ 15-30 min | Add application-level ping every 10-15 seconds |
| 1006 after deploy or scale event | Connection count exceeded per-IP or per-instance limit | Check exchange docs for max connections per IP; compare instance count | Stagger websocket connections; add jitter to startup timing |
| 1006 with no pattern, random times | Network instability or dropped packets | Check ping loss %, TCP retransmits, traceroute | Ensure reconnect logic is singleflight + jittered; add circuit breaker for persistent failures |
| 1006 every few seconds, reconnect loops | Rate limiting on reconnect attempts | Server drops the connection without a close frame; reconnect attempt rate spikes | Add backoff and jitter to reconnect; cap reconnect attempts per minute |
| 1006 on private channel only | Auth token expired mid-session | Reconnect after 1006 succeeds but private channel doesn't deliver | Re-authenticate on reconnect; check token TTL vs session lifetime |
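
The first few rows of the table can be turned into a rough first-pass classifier. This is a minimal sketch, not a diagnosis: the function name, the labels, and every threshold are assumptions chosen for illustration, and you would tune them against your own disconnect logs.

```python
def guess_disconnect_cause(
    last_message_ago_s: float,
    uptime_s: float,
    reconnects_last_minute: int,
    idle_threshold_s: float = 60.0,   # assumption: silence this long counts as idle
    loop_threshold: int = 5,          # assumption: this many reconnects/min is a loop
) -> str:
    """Map one 1006 event to the most likely triage row (heuristic only)."""
    if reconnects_last_minute >= loop_threshold:
        return "reconnect_loop"        # possible rate limiting on reconnect attempts
    if last_message_ago_s >= idle_threshold_s:
        return "idle_timeout"          # long silence before the drop
    if uptime_s < 10.0:
        return "rejected_on_connect"   # e.g. connection limit hit after a deploy
    return "sudden_drop"               # traffic was flowing; suspect the network


if __name__ == "__main__":
    # A connection that was silent for 20 minutes before dropping.
    print(guess_disconnect_cause(1200.0, 1200.0, 0))
```

The label is only a hint about where to look first; the "Confirm fast" column is still the actual check.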

What code 1006 actually means

The WebSocket protocol (RFC 6455) defines 1006 as a reserved close code that MUST NOT be sent in a close frame. You will never see 1006 in a close frame payload. You see 1006 because your library detected that the connection was lost without receiving a WebSocket close frame.

In your logs it typically looks like:

```text
WebSocket closed with code 1006. No close frame received.
```

That is not a bug in your library. It is the library telling you precisely what happened: the connection disappeared.
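
To make the distinction concrete, here is a small sketch of how a bot might label close codes before logging them. The code names follow RFC 6455; the helper names (`describe_close`, `is_local_only`) are hypothetical and not part of any library.

```python
from typing import Optional

# Close codes and names from RFC 6455, section 7.4.1.
CLOSE_CODES = {
    1000: "normal closure",
    1001: "going away",
    1005: "no status received",
    1006: "abnormal closure (no close frame received)",
    1015: "TLS handshake failure",
}

# Reserved codes that are never sent on the wire. If you see one of these,
# it was generated locally by your library, not sent by the exchange.
LOCAL_ONLY_CODES = frozenset({1005, 1006, 1015})


def is_local_only(code: Optional[int]) -> bool:
    """True if the code can only have been produced by the local library."""
    return code in LOCAL_ONLY_CODES


def describe_close(code: Optional[int], reason: str = "") -> str:
    """Build a log-friendly description of a close event."""
    name = CLOSE_CODES.get(code, "unknown or application-defined")
    origin = "local" if is_local_only(code) else "peer"
    suffix = f", reason={reason!r}" if reason else ""
    return f"code={code} ({name}), origin={origin}{suffix}"


if __name__ == "__main__":
    print(describe_close(1006))
    print(describe_close(1000, "bye"))
```

Tagging the origin as `local` makes it obvious in logs that there is no server-side reason to go looking for.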

Common root causes:

  • Idle proxy timeout: load balancers, reverse proxies, and cloud NAT gateways kill idle TCP connections after a timeout (often 30-120 seconds). The peer does not send a close frame first. It just drops the connection.
  • Connection reset by peer: the remote process crashed, restarted, or ran out of file descriptors. The OS sends a TCP RST, which your client library reports as 1006.
  • Network partition: a transient network failure drops packets long enough for the TCP session to die.
  • Rate limiting: some exchanges silently drop websocket connections that reconnect too aggressively. The client sees 1006 because there is no close frame.

Why 1006 is dangerous for trading bots

The danger is not the disconnect. The danger is what your bot does after.

Bots that handle 1006 badly do one of these:

  1. Reconnect immediately at full speed: this creates a reconnect storm. If the drop was caused by a proxy timeout, you reconnect into the same proxy that will drop you again. If the drop was rate limiting, aggressive reconnect makes it worse.
  2. Log an error and give up: the bot goes silent. You learn about it when the price moves and your orders are stale.
  3. Reconnect without resync: you re-establish the connection but miss messages that arrived during the gap. Your local order book or position state drifts.

Each of these is a self-inflicted wound. The WebSocket dropped. The exchange probably kept running. Your bot should reconnect safely and recover gracefully.


What NOT to do on 1006

Do not try to "fix" 1006 at the WebSocket level

You cannot prevent 1006. It is a transport-level signal that means the connection disappeared. The only fix is at the infrastructure or application layer: shorter ping intervals, stable network paths, and reconnect logic.

Do not log 1006 as a critical error

Logging 1006 as ERROR creates noise. Your logs fill up with "websocket disconnected" entries that tell you nothing useful. Instead, log the pattern: frequency of disconnects over time, message gap, reconnect success rate.

Do not assume 1006 means the exchange is down

In most cases, the exchange is fine. Your connection dropped for infrastructure reasons. Reconnect and resume.


How to handle 1006 correctly

1) Application-level heartbeat (ping/pong)

TCP keepalive often defaults to 2 hours of idle time before the first probe (Linux's default tcp_keepalive_time is 7200 seconds). That is useless for detecting dead websocket connections. You need ping/pong frames sent over the WebSocket itself.

Send a ping every 10-15 seconds. If you do not get a pong within 5-10 seconds, close the connection and reconnect. This lets you detect dead connections before they become a 1006 surprise.

Many exchange WebSocket APIs already send pings; check the API docs to confirm. If yours does not, send your own.
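
The timing rules above can be kept separate from the socket code. The sketch below is one way to do that, under assumptions: `HeartbeatMonitor` and its method names are hypothetical, the intervals are the ones suggested above, and you would call the hooks from whatever ping/pong callbacks your WebSocket library provides.

```python
import time
from typing import Callable, Optional


class HeartbeatMonitor:
    """Tracks when to send a ping and when a missing pong means the link is dead."""

    def __init__(
        self,
        ping_interval_s: float = 10.0,
        pong_timeout_s: float = 5.0,
        clock: Callable[[], float] = time.monotonic,  # injectable for tests
    ) -> None:
        self._interval = ping_interval_s
        self._timeout = pong_timeout_s
        self._clock = clock
        self._ping_sent_at: Optional[float] = None   # None = no ping outstanding
        self._next_ping_at = clock() + ping_interval_s

    def should_send_ping(self) -> bool:
        """True when the interval has elapsed and no ping is outstanding."""
        return self._ping_sent_at is None and self._clock() >= self._next_ping_at

    def on_ping_sent(self) -> None:
        self._ping_sent_at = self._clock()

    def on_pong(self) -> None:
        """A pong arrived: clear the outstanding ping and schedule the next one."""
        self._ping_sent_at = None
        self._next_ping_at = self._clock() + self._interval

    def is_dead(self) -> bool:
        """True when a ping has gone unanswered for longer than the timeout."""
        if self._ping_sent_at is None:
            return False
        return self._clock() - self._ping_sent_at > self._timeout
```

In a receive loop you would check `should_send_ping()` and `is_dead()` on a short timer; when `is_dead()` returns true, close the socket yourself and hand off to the reconnect path instead of waiting for the 1006.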

2) Singleflight reconnect with jittered backoff

When 1006 fires, only one reconnect attempt should be in flight at a time. Multiple instances (or multiple handlers inside one process) reconnecting simultaneously amplify the problem.

```text
Reconnect sequence:
1. Wait base_delay (1-2 seconds)
2. Add jitter (±500ms)
3. Attempt reconnect
4. On failure: double base_delay, add jitter, retry
5. Cap max delay at 30-60 seconds
6. Cap total reconnect attempts per session
```
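
The sequence above can be sketched as a generator of delays. This is a minimal version under stated assumptions: the function name and default values are illustrative, the cap applies to the doubled base delay, and jitter is added on top of it.

```python
import random
from typing import Iterator, Optional


def reconnect_delays(
    base_s: float = 1.0,
    cap_s: float = 30.0,
    jitter_s: float = 0.5,
    max_attempts: int = 10,
    rng: Optional[random.Random] = None,   # injectable for deterministic tests
) -> Iterator[float]:
    """Yield one wait time per reconnect attempt: doubling base, capped, jittered."""
    rng = rng or random.Random()
    delay = base_s
    for _ in range(max_attempts):
        # Jitter spreads instances out so they do not reconnect in lockstep.
        yield max(0.0, delay + rng.uniform(-jitter_s, jitter_s))
        delay = min(delay * 2, cap_s)


if __name__ == "__main__":
    for attempt, wait_s in enumerate(reconnect_delays(max_attempts=6), start=1):
        print(f"attempt {attempt}: wait {wait_s:.2f}s")
```

A reconnect loop would sleep for each yielded delay, try to connect, and stop at the first success. When the generator is exhausted, the per-session attempt cap has been hit and the bot should escalate rather than keep trying.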

3) Bounded resync on reconnect

After reconnecting, do not default to fetching full state. When the gap is small, fetch only what changed during it. For trading bots:

  • For private channels, re-authenticate first
  • Resubscribe to channels
  • Compare the last known sequence number with the current one
  • If the gap is small (seconds), request deltas
  • If the gap is large (minutes or more), do a full state sync
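
The delta-versus-full decision can be written as a small pure function. This sketch assumes the exchange exposes monotonically increasing sequence numbers and some way to replay deltas; the names and thresholds (`max_delta_gap_s`, `max_delta_msgs`) are hypothetical and should come from your exchange's documented limits.

```python
from enum import Enum
from typing import Optional


class ResyncPlan(Enum):
    NONE = "none"      # nothing was missed
    DELTAS = "deltas"  # small gap: replay only what changed
    FULL = "full"      # large gap or no baseline: rebuild state from a snapshot


def plan_resync(
    last_seen_seq: Optional[int],
    current_seq: int,
    gap_seconds: float,
    max_delta_gap_s: float = 30.0,   # assumption: beyond this, deltas are not trusted
    max_delta_msgs: int = 1000,      # assumption: beyond this, a snapshot is cheaper
) -> ResyncPlan:
    """Decide how much state to fetch after a reconnect."""
    if last_seen_seq is None or current_seq < last_seen_seq:
        return ResyncPlan.FULL       # no baseline, or the sequence was reset
    missed = current_seq - last_seen_seq
    if missed == 0:
        return ResyncPlan.NONE
    if gap_seconds <= max_delta_gap_s and missed <= max_delta_msgs:
        return ResyncPlan.DELTAS
    return ResyncPlan.FULL


if __name__ == "__main__":
    print(plan_resync(last_seen_seq=100, current_seq=112, gap_seconds=3.0))
```

Keeping the decision pure makes it easy to test against recorded gaps before it ever touches live order state.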

4) Circuit breaker for persistent 1006

If your bot disconnects with 1006 more than N times in M minutes, open a circuit breaker. Stop reconnecting. Escalate to an operator. Persistent 1006 with reconnect loops is worse than staying disconnected and waiting for human intervention.
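
A sliding-window counter is enough to implement the "N times in M minutes" rule. The sketch below is one possible shape; `DisconnectBreaker`, its defaults, and the manual `reset()` are assumptions, and how you escalate to an operator is left out.

```python
import time
from collections import deque
from typing import Callable, Deque


class DisconnectBreaker:
    """Opens when more than max_disconnects occur within window_s seconds."""

    def __init__(
        self,
        max_disconnects: int = 5,        # N (illustrative default)
        window_s: float = 300.0,         # M minutes, in seconds (illustrative default)
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_disconnects
        self._window = window_s
        self._clock = clock
        self._events: Deque[float] = deque()
        self.is_open = False

    def record_disconnect(self) -> bool:
        """Record one 1006 event; return True if reconnecting should stop."""
        now = self._clock()
        self._events.append(now)
        # Drop events that have aged out of the window.
        while self._events and now - self._events[0] > self._window:
            self._events.popleft()
        if len(self._events) > self._max:
            self.is_open = True          # stays open until an operator resets it
        return self.is_open

    def reset(self) -> None:
        """Called by an operator (or an explicit recovery step) to resume."""
        self._events.clear()
        self.is_open = False
```

The breaker deliberately does not close itself: once it opens, resuming is an explicit decision.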


What to log

If you cannot answer these questions after a 1006 event, you are guessing:

  • How long between last received message and disconnect? (idle timeout vs sudden drop)
  • How many reconnects in the last hour? (reconnect storm?)
  • Was the reconnect successful? How long did it take?
  • Did the bot miss messages? How many?

Log per disconnect:

  • ts
  • bot_instance_id
  • exchange
  • channel (public or private)
  • close_code (1006, 1000, 1001, etc.)
  • last_message_ago_ms (time since last received message)
  • uptime_seconds (how long the connection was alive)
  • reconnect_attempt
  • reconnect_success
  • reconnect_latency_ms
  • missed_messages_count (estimated from sequence gap)
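
As a sketch, the fields above fit naturally into one structured record per disconnect. The field names come from the list; the types, the JSON encoding, and the example values are assumptions.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DisconnectEvent:
    ts: str                      # ISO 8601 timestamp of the disconnect
    bot_instance_id: str
    exchange: str
    channel: str                 # "public" or "private"
    close_code: int              # 1006, 1000, 1001, etc.
    last_message_ago_ms: int     # time since last received message
    uptime_seconds: float        # how long the connection was alive
    reconnect_attempt: int
    reconnect_success: bool
    reconnect_latency_ms: int
    missed_messages_count: int   # estimated from sequence gap

    def to_json(self) -> str:
        """Serialize as one JSON line, suitable for a structured log."""
        return json.dumps(asdict(self), sort_keys=True)


if __name__ == "__main__":
    # All values below are made up for illustration.
    event = DisconnectEvent(
        ts="2026-06-06T12:00:00Z",
        bot_instance_id="bot-1",
        exchange="example-exchange",
        channel="private",
        close_code=1006,
        last_message_ago_ms=61000,
        uptime_seconds=1830.5,
        reconnect_attempt=1,
        reconnect_success=True,
        reconnect_latency_ms=420,
        missed_messages_count=0,
    )
    print(event.to_json())
```

One line per event keeps the records easy to aggregate into the patterns that matter: disconnect frequency, message gap, and reconnect success rate.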

Shipped asset

WebSocket reconnection kit: safe reconnect + state recovery

Singleflight reconnect with jittered backoff, message gap detection, and state recovery templates for trading bots.

When to use this (fit check)
  • Your bot gets 1006 disconnects and you need predictable reconnect behavior.
  • You want bounded resync (deltas, not full state) after reconnect.
  • You run multiple instances and need to prevent reconnect storms.
When NOT to use this (yet)
  • You don't log message sequence numbers and cannot detect gaps.
  • You place orders synchronously on websocket events without idempotency.

What you get (2 files):

  • websocket-reconnect-logic.md: singleflight reconnect with jittered backoff
  • state-recovery-templates.md: bounded resync for order books, positions, and private channels

Checklist (copy/paste)

  • Application-level ping/pong is implemented (TCP keepalive is not enough).
  • 1006 is treated as a reconnect signal, not a critical error.
  • Reconnect is singleflight: only one instance attempts reconnect at a time.
  • Reconnect uses jittered backoff (1-2s base, 30-60s cap).
  • Reconnect attempts are bounded per session (circuit breaker after N failures).
  • Resync after reconnect is bounded: deltas for small gaps, full state sync only for large gaps.
  • Message sequence numbers are logged to detect gaps.
  • Disconnect events log: last_message_ago_ms, uptime_seconds, reconnect_attempt, reconnect_success.

Key takeaways

  • Code 1006 means the connection died without a close frame. It's a transport signal, not a WebSocket error.
  • You cannot prevent 1006 at the WebSocket level. The fix is infrastructure and application work (shorter ping intervals, stable network) plus safe reconnect logic.
  • Application-level heartbeat (ping/pong every 10-15s) detects dead connections faster than TCP keepalive.
  • Singleflight reconnect with jittered backoff prevents reconnect storms.
  • Resync after reconnect should be bounded: only fetch what changed during the gap.
  • Log the pattern, not the event. Track disconnect frequency, message gap, and reconnect success rate over time.
