Feb 23, 2026 · 4 min read

Crash Recovery: Reconciliation Loops That Prevent Double Orders

Build crash-proof trading bots with reconciliation loops that detect and correct out-of-sync state on restart—preventing double orders and orphan positions.

Free download: Crash Recovery Reconciliation Kit. Jump to the download section.

Your trading bot crashed at 3 AM. When it restarts, it doesn't know if that last order went through. Place it again? Maybe. But if the previous order succeeded, you've just doubled your position size.

This is the crash recovery problem. It is solved by reconciliation loops: recovery routines that compare local state to exchange reality and fix any drift before resuming trading.

If you only do three things

Run reconciliation on every startup before enabling trading.
Detect all three failure modes: orphan orders, ghost orders, and stale fills.
Trust the exchange as source of truth. Your local state is just a cache.

Fast Triage: Recovery Pattern Selection

Scenario	Pattern	Complexity	Recovery Time
Order sent, ack unknown	Idempotency key lookup	Low	< 100ms
Bot crashed mid-order	Order status reconciliation	Low	200-500ms
Position drift detected	Position reconciliation	Medium	500ms-2s
Fill notifications missed	Fill backfill loop	Medium	1-5s
Full state corruption	Complete state rebuild	High	10-30s

Start with order status reconciliation. It handles 80% of crash scenarios.
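The first row of the table, where an order was sent but the acknowledgement never arrived, comes down to one decision after the lookup. Here is a minimal sketch of that decision. The `LookupOutcome` and `ResendDecision` names are hypothetical, and it assumes the exchange lookup by client order ID is authoritative by the time you call it.

```typescript
// Hypothetical result of asking the exchange about one client order ID.
type LookupOutcome = "found" | "not_found" | "lookup_failed";

// What the bot should do with the order it was sending when it crashed.
type ResendDecision = "skip_resend" | "resend_same_id" | "halt";

function decideResend(outcome: LookupOutcome): ResendDecision {
  switch (outcome) {
    case "found":
      // The exchange already has the order: sending again would double it.
      return "skip_resend";
    case "not_found":
      // The request never arrived: resend, reusing the same client order ID.
      return "resend_same_id";
    case "lookup_failed":
      // State is unknown: do not guess, stop and retry the lookup later.
      return "halt";
  }
}
```

Reusing the same ID on resend matters: if the first request did land after all, the exchange can reject the duplicate instead of filling it twice.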

The Three Failure Modes

Every crash creates one of three state mismatches:

1. Orphan Orders

What happened: Bot crashed after order reached exchange but before recording locally.

Risk: Double orders on restart (you place again, now have 2).

Detection:

typescript

async function findOrphanOrders(exchange: Exchange, localOrderIds: Set<string>) {
  const exchangeOrders = await exchange.fetchOpenOrders();
  return exchangeOrders.filter(o => !localOrderIds.has(o.clientOrderId));
}

2. Ghost Orders

What happened: Bot recorded order locally but request never reached exchange.

Risk: Bot thinks position is X, but it's actually Y.

Detection:

typescript

async function findGhostOrders(exchange: Exchange, localOrders: Order[]) {
  const ghostOrders: Order[] = [];
  for (const local of localOrders) {
    const remote = await exchange.fetchOrder(local.clientOrderId);
    if (!remote || remote.status === 'not_found') {
      ghostOrders.push(local);
    }
  }
  return ghostOrders;
}
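The ghost check above makes one `fetchOrder` call per local order, which adds up under rate limits. One possible way to cut that down is to diff local IDs against the single `fetchOpenOrders` response first, and only look up the IDs that are missing from it. The `diffOrderIds` helper and its field names are hypothetical.

```typescript
interface IdDiff {
  orphanIds: string[];         // open on the exchange, unknown locally
  needsStatusLookup: string[]; // known locally, not open on the exchange
}

function diffOrderIds(localOpenIds: string[], exchangeOpenIds: string[]): IdDiff {
  const local = new Set(localOpenIds);
  const remote = new Set(exchangeOpenIds);
  return {
    orphanIds: exchangeOpenIds.filter((id) => !local.has(id)),
    needsStatusLookup: localOpenIds.filter((id) => !remote.has(id)),
  };
}
```

An ID in `needsStatusLookup` is not automatically a ghost. The order may have filled or been cancelled while the bot was down, so each one still needs an individual status lookup before it is removed.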

3. Stale Fills

What happened: Order filled on exchange, but fill notification never processed.

Risk: Position tracking completely wrong, risk calculations invalid.

Detection:

typescript

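async function findStaleFills(exchange: Exchange, localOrders: Order[]) {
  // Each entry pairs the local record with the exchange's view of the same order.
  const staleFills: Array<{ local: Order; remote: Order }> = [];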
  for (const local of localOrders) {
    if (local.status !== 'filled') {
      const remote = await exchange.fetchOrder(local.clientOrderId);
      if (remote?.status === 'filled') {
        staleFills.push({ local, remote });
      }
    }
  }
  return staleFills;
}
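The status comparison above only catches orders that filled completely. A partially filled order can hide a missed fill too. Here is a small sketch of a quantity-based check, assuming both the local record and the exchange response carry a cumulative `filled` quantity. The local `filled` field and the `unprocessedFillQty` name are assumptions, not part of the code above.

```typescript
interface FillProgress {
  filled: number; // cumulative quantity filled so far
}

// Returns the quantity filled on the exchange that local state has not processed.
function unprocessedFillQty(local: FillProgress, remote: FillProgress): number {
  const delta = remote.filled - local.filled;
  // A negative delta means local state is ahead of the exchange: a data error.
  if (delta < 0) {
    throw new Error(`Local filled ${local.filled} exceeds remote ${remote.filled}`);
  }
  return delta;
}
```

A result greater than zero is the amount to backfill, whatever the order's status says.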

The Reconciliation Loop

Run this on every startup, before any new trading activity:

typescript

async function reconcileOnStartup(
  exchange: Exchange,
  state: TradingState
): Promise<ReconciliationResult> {
  const result: ReconciliationResult = {
    orphansFound: 0,
    ghostsRemoved: 0,
    fillsBackfilled: 0,
    positionCorrected: false,
  };
 
  // Phase 1: Detect orphan orders (exchange has, we don't)
  const orphans = await findOrphanOrders(exchange, state.orderIds);
  for (const orphan of orphans) {
    // Option A: Cancel if strategy no longer valid
    // Option B: Adopt and track
    await state.adoptOrder(orphan);
    result.orphansFound++;
  }
 
  // Phase 2: Remove ghost orders (we have, exchange doesn't)
  const ghosts = await findGhostOrders(exchange, state.openOrders);
  for (const ghost of ghosts) {
    await state.removeOrder(ghost.clientOrderId);
    result.ghostsRemoved++;
  }
 
  // Phase 3: Backfill stale fills
  const staleFills = await findStaleFills(exchange, state.openOrders);
  for (const { local, remote } of staleFills) {
    await state.processFill(local.clientOrderId, remote.filled, remote.price);
    result.fillsBackfilled++;
  }
 
  // Phase 4: Verify position accuracy
  const exchangePosition = await exchange.fetchPosition(state.symbol);
  if (Math.abs(exchangePosition.size - state.position.size) > 0.0001) {
    state.position.size = exchangePosition.size;
    result.positionCorrected = true;
  }
 
  return result;
}

Idempotency Keys Are Your Safety Net

Order reconciliation depends on client order IDs (idempotency keys). Without them, you can't ask the exchange "did this specific order go through?"

typescript

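function generateClientOrderId(
  strategy: string,
  symbol: string,
  timestamp: number,
  sequence: number
): string {
  // Deterministic: same inputs always produce same ID, so the ID can be
  // rebuilt after a crash. A random suffix would break that guarantee.
  // Unique: `sequence` is a persisted per-strategy counter (an assumption
  // of this sketch) that separates orders created in the same millisecond.
  return `${strategy}-${symbol}-${timestamp}-${sequence}`;
}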

Every order must include this ID. Most exchanges support it:

Exchange	Field Name	Max Length
Binance	newClientOrderId	36 chars
Bybit	orderLinkId	36 chars
OKX	clOrdId	32 chars
Kraken	userref	32-bit int

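Check your exchange's API docs for the exact field, length limit, and allowed characters. A hyphenated ID that one exchange accepts may be rejected by another, and Kraken's userref is an integer, so a string ID like the one above will not fit there.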
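Because the limits differ per exchange, it can help to validate an ID before the order is sent instead of finding out from a rejection. The sketch below checks length only, using the string limits from the table above. The lowercase exchange keys and the `checkClientOrderId` name are illustrative, and character-set rules are left out because they need to come from each exchange's docs.

```typescript
// Length limits copied from the table above; the key names are illustrative.
const MAX_CLIENT_ID_LENGTH: Record<string, number> = {
  binance: 36,
  bybit: 36,
  okx: 32,
};

// Returns an error message, or null when the ID passes the length check.
function checkClientOrderId(exchangeName: string, id: string): string | null {
  const max = MAX_CLIENT_ID_LENGTH[exchangeName];
  if (max === undefined) return `No length limit configured for ${exchangeName}`;
  if (id.length === 0) return "Client order ID is empty";
  if (id.length > max) return `ID is ${id.length} chars; ${exchangeName} allows ${max}`;
  return null;
}
```

An ID built from a long strategy name, a symbol, and a millisecond timestamp can pass the 36-character limit quickly, so run this check in tests as well as at runtime.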

Position Reconciliation: The Final Check

Order-level reconciliation handles individual order drift. But position can still be wrong if:

Fills arrived over a WebSocket connection that dropped before they were processed
Manual trades were placed outside the bot
Exchange corrections adjusted fills retroactively

Always reconcile position as the final step:

typescript

async function reconcilePosition(
  exchange: Exchange,
  state: TradingState
): Promise<void> {
  const remote = await exchange.fetchPosition(state.symbol);
  // Copy the size now: state.position is mutated below, and the audit log
  // needs the value from before the correction.
  const localSizeBefore = state.position.size;
 
  const driftSize = Math.abs(remote.size - localSizeBefore);
  const driftPct = (driftSize / Math.abs(localSizeBefore || 1)) * 100;
 
  if (driftPct > 0.1) { // More than 0.1% drift
    console.warn(`Position drift detected: local=${localSizeBefore}, remote=${remote.size}`);
    
    // Update local state to match exchange reality
    state.position.size = remote.size;
    state.position.entryPrice = remote.entryPrice;
    
    // Log for audit
    await auditLog({
      event: 'position_reconciled',
      drift: driftSize,
      localBefore: localSizeBefore,
      remoteTruth: remote.size,
    });
  }
  }
}

Exchange always wins. Your local state is just a cache.
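One edge case in the drift check above: the `|| 1` fallback means a flat local position is compared against a size of 1, so a small real position on the exchange (say 0.0005) registers as 0.05% drift and slips under the threshold. A possible alternative, sketched here with a hypothetical `computeDrift` helper, is to normalize by the larger of the two sizes.

```typescript
interface Drift {
  absolute: number; // |remote - local| in position units
  percent: number;  // absolute drift relative to the larger of the two sizes
}

function computeDrift(localSize: number, remoteSize: number): Drift {
  const absolute = Math.abs(remoteSize - localSize);
  const base = Math.max(Math.abs(localSize), Math.abs(remoteSize));
  // Both sides flat: there is nothing to compare, so drift is zero.
  const percent = base === 0 ? 0 : (absolute / base) * 100;
  return { absolute, percent };
}
```

With this normalization, a flat local position against any non-zero exchange position is always 100% drift, which is the case you most want to catch after a crash.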

Startup Sequence: Order Matters

Load local state from disk/database
Run reconciliation loop (the code above)
Update risk limits based on corrected position
Resume WebSocket streams for live updates
Enable trading only after steps 1-4 complete

typescript

async function startupSequence(config: BotConfig): Promise<void> {
  // 1. Load state
  const state = await loadState(config.stateFile);
  
  // 2. Create exchange connection (but don't trade yet)
  const exchange = createExchange(config, { tradingEnabled: false });
  
  // 3. Reconcile
  const reconciled = await reconcileOnStartup(exchange, state);
  console.log('Reconciliation complete:', reconciled);
  
  // 4. Verify risk is within limits post-reconciliation
  if (!isWithinRiskLimits(state.position, config.limits)) {
    throw new Error('Position exceeds risk limits after reconciliation');
  }
  
  // 5. Connect live data before trading so no fill goes unseen
  await exchange.connectWebSocket();
  
  // 6. Enable trading last, once every earlier step has completed
  exchange.enableTrading();
  
  console.log('Trading resumed');
}

Never skip the risk check. A position that drifted during downtime might exceed your risk limits.
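The ordering rule in the numbered sequence above can also be enforced in code rather than by convention. Here is a minimal sketch of a gate that refuses to allow trading until every earlier step has been marked complete. The `TradingGate` class and the step names are hypothetical.

```typescript
type StartupStep = "stateLoaded" | "reconciled" | "riskChecked" | "streamsConnected";

const REQUIRED_STEPS: StartupStep[] = [
  "stateLoaded",
  "reconciled",
  "riskChecked",
  "streamsConnected",
];

class TradingGate {
  private done = new Set<StartupStep>();

  complete(step: StartupStep): void {
    this.done.add(step);
  }

  missingSteps(): StartupStep[] {
    return REQUIRED_STEPS.filter((s) => !this.done.has(s));
  }

  // Throws unless every startup step has been marked complete.
  assertCanTrade(): void {
    const missing = this.missingSteps();
    if (missing.length > 0) {
      throw new Error(`Trading blocked; incomplete steps: ${missing.join(", ")}`);
    }
  }
}
```

Calling `assertCanTrade()` at the top of the order placement path means a refactor that skips a step fails loudly instead of trading on unverified state.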

Shipped asset: crash recovery reconciliation kit

Download

Free

Crash Recovery Reconciliation Kit

TypeScript reconciliation loop template and startup sequence checklist for trading bots. Detects orphan orders, ghost orders, and stale fills on every restart.

When to use this (fit check)

Your bot stores order state locally.
You run strategies where a crash would make state ambiguous.
Position accuracy is critical (it always is).

When NOT to use this (yet)

Stateless order placement (fire and forget).
You query exchange state on every decision anyway.
Development or paper trading only.

Included files:

reconciliation-loop-template.ts - Full TypeScript implementation
startup-sequence-checklist.md - Step-by-step startup verification
README.md - Integration guide

Error Handling: When Reconciliation Fails

Reconciliation itself can fail. Handle these cases:

typescript

async function safeReconciliation(
  exchange: Exchange,
  state: TradingState
): Promise<ReconciliationResult | null> {
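  try {
    return await reconcileOnStartup(exchange, state);
  } catch (error) {
    // Caught values are `unknown` in strict TypeScript; read `code` defensively.
    const code = (error as { code?: string } | null)?.code;

    if (code === 'RATE_LIMITED') {
      // Wait and retry once; a second failure means we refuse to trade
      await delay(60_000);
      try {
        return await reconcileOnStartup(exchange, state);
      } catch (retryError) {
        console.error('Reconciliation retry failed:', retryError);
        return null;
      }
    }
    
    if (code === 'EXCHANGE_DOWN') {
      // Cannot reconcile: refuse to start
      console.error('Cannot reconcile: exchange unreachable');
      process.exit(1);
    }
    
    // Unknown error: refuse to trade
    console.error('Reconciliation failed:', error);
    return null;
  }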
}

If reconciliation fails, don't trade. Operating with unknown state is how you blow up accounts.
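The handler above waits a fixed 60 seconds and retries once. If you prefer several shorter attempts, a capped exponential schedule is a common alternative. The sketch below only computes the wait times; the `backoffDelaysMs` name and its parameters are hypothetical, and once the last delay is used up the rule stays the same: don't trade.

```typescript
// Returns the wait before each retry: base, 2x base, 4x base, ... capped at capMs.
function backoffDelaysMs(attempts: number, baseMs: number, capMs: number): number[] {
  const delays: number[] = [];
  for (let i = 0; i < attempts; i++) {
    delays.push(Math.min(capMs, baseMs * 2 ** i));
  }
  return delays;
}
```

For example, `backoffDelaysMs(4, 1000, 5000)` yields waits of 1s, 2s, 4s, and 5s, which bounds total startup delay at 12 seconds before the bot gives up.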

Scheduled Reconciliation: Not Just Startup

Run reconciliation periodically during operation too:

typescript

// Run light reconciliation every 5 minutes
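setInterval(async () => {
  try {
    const drift = await checkPositionDrift(exchange, state);
    if (drift > config.driftThreshold) {
      await reconcilePosition(exchange, state);
    }
  } catch (error) {
    // A rejected promise inside setInterval would otherwise go unhandled
    console.error('Scheduled reconciliation failed:', error);
  }
}, 5 * 60 * 1000);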

This catches drift from:

WebSocket message loss
Network partitions you didn't notice
Exchange corrections

Checklist (copy/paste)

Idempotency setup:

All orders include client order ID
IDs are deterministic (reproducible from order params)
IDs are stored before order placed

Reconciliation loop:

Orphan order detection implemented
Ghost order cleanup implemented
Stale fill backfill implemented
Position reconciliation as final step

Startup sequence:

State loaded before exchange connection
Reconciliation completes before trading enabled
Risk limits checked after reconciliation
WebSocket connected after reconciliation

Periodic maintenance:

Position drift checked every N minutes
Full reconciliation on any WebSocket reconnect
Drift logged for monitoring

Failure handling:

Reconciliation timeout defined
Exchange-down scenario handled
Manual intervention trigger defined

Reconciliation typically finishes in under 10 seconds for most accounts. If you have hundreds of open orders, it might take longer due to rate limits, so batch your order queries and respect the exchange's limits. If reconciliation consistently takes more than 30 seconds, you likely have too many open orders.
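Batching can be as simple as splitting the list of order IDs into fixed-size groups and pausing between groups. Here is a sketch of the splitting step. The `chunk` helper is hypothetical, and the right batch size depends on your exchange's published rate limits.

```typescript
// Splits items into consecutive batches of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each batch can then be queried and followed by a short delay before the next one starts.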

Exchange state is the source of truth for positions and orders. Your local state might track things the exchange doesn't (strategy metadata, alerts, etc.), but for anything the exchange knows about, trust the exchange. If the exchange is wrong, that's a support ticket, not a code fix.

Whether to cancel or adopt orphan orders depends on your strategy's time horizon. If orders are good for seconds (scalping), cancel orphans, because the opportunity has passed. If orders are good for hours (swing), adopt them and let strategy logic decide. Default to adoption with position recalculation.
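That rule of thumb can be written as a one-line policy. This is a sketch only: it assumes each strategy configures a hypothetical `maxUsefulAgeMs`, the longest an order stays worth keeping, and that the exchange reports when the order was created.

```typescript
type OrphanAction = "cancel" | "adopt";

// Cancel orphans older than the strategy's useful window; adopt the rest.
function decideOrphanAction(orderAgeMs: number, maxUsefulAgeMs: number): OrphanAction {
  return orderAgeMs > maxUsefulAgeMs ? "cancel" : "adopt";
}
```

A scalping strategy might set the window to a few seconds, so nearly every orphan found after a restart is cancelled, while a swing strategy with a window of hours would adopt most of them.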

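You still need reconciliation even when you receive order updates over WebSocket. Connections drop, and messages sent while a connection is down can be lost: TCP guarantees ordered delivery only within a single live connection, not across a reconnect. Reconciliation catches what WebSocket missed. Think of WebSocket as the fast path and REST reconciliation as the verification path.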
