Crash Recovery: Reconciliation Loops That Prevent Double Orders

Feb 23, 2026 (3 min read)

Category: Automation, Crypto

Build crash-proof trading bots with reconciliation loops that detect and correct out-of-sync state on restart—preventing double orders and orphan positions.

Free download: Crash Recovery Reconciliation Kit (see the download section below).

Your trading bot crashed at 3 AM. When it restarts, it doesn't know if that last order went through. Place it again? Maybe. But if the previous order succeeded, you've just doubled your position size.

This is the crash recovery problem. Reconciliation loops solve it: they are recovery routines that compare local state to exchange reality and fix any drift before trading resumes.

If you only do three things
  • Run reconciliation on every startup before enabling trading.
  • Detect all three failure modes: orphan orders, ghost orders, and stale fills.
  • Trust the exchange as source of truth. Your local state is just a cache.

Fast Triage: Recovery Pattern Selection

| Scenario | Pattern | Complexity | Recovery Time |
| --- | --- | --- | --- |
| Order sent, ack unknown | Idempotency key lookup | Low | < 100 ms |
| Bot crashed mid-order | Order status reconciliation | Low | 200-500 ms |
| Position drift detected | Position reconciliation | Medium | 500 ms-2 s |
| Fill notifications missed | Fill backfill loop | Medium | 1-5 s |
| Full state corruption | Complete state rebuild | High | 10-30 s |

Start with order status reconciliation. It handles 80% of crash scenarios.
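The first row of the table, the idempotency key lookup, can be sketched roughly as follows. `OrderRequest`, `OrderApi`, and `submitOnce` are hypothetical names rather than any exchange SDK's API; the sketch assumes the exchange lets you look up an order by its client order ID.

```typescript
// Hypothetical minimal interfaces; real exchange SDKs differ.
interface OrderRequest {
  clientOrderId: string;
  symbol: string;
  side: "buy" | "sell";
  quantity: number;
}

interface OrderApi {
  // Resolves to null when the exchange has no order with this client ID.
  findByClientOrderId(id: string): Promise<OrderRequest | null>;
  place(order: OrderRequest): Promise<void>;
}

// Sends the order only if the exchange does not already know its client ID.
// Returns "already-exists" when a previous attempt got through.
async function submitOnce(
  api: OrderApi,
  order: OrderRequest
): Promise<"placed" | "already-exists"> {
  const existing = await api.findByClientOrderId(order.clientOrderId);
  if (existing !== null) {
    return "already-exists";
  }
  await api.place(order);
  return "placed";
}
```

On restart, calling `submitOnce` with the client order ID that was persisted before the crash either confirms the earlier order or sends it for the first time. It only works if the ID was saved before the original request went out.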

The Three Failure Modes

Every crash creates one of three state mismatches:

1. Orphan Orders

What happened: Bot crashed after order reached exchange but before recording locally.

Risk: Double orders on restart (you place again, now have 2).

Detection:

typescript
async function findOrphanOrders(exchange: Exchange, localOrderIds: Set<string>) {
  const exchangeOrders = await exchange.fetchOpenOrders();
  return exchangeOrders.filter(o => !localOrderIds.has(o.clientOrderId));
}

2. Ghost Orders

What happened: Bot recorded order locally but request never reached exchange.

Risk: Bot counts on an order that does not exist, so the exposure it expects differs from the exposure it actually has.

Detection:

typescript
async function findGhostOrders(exchange: Exchange, localOrders: Order[]) {
  const ghostOrders: Order[] = [];
  for (const local of localOrders) {
    const remote = await exchange.fetchOrder(local.clientOrderId);
    if (!remote || remote.status === 'not_found') {
      ghostOrders.push(local);
    }
  }
  return ghostOrders;
}

3. Stale Fills

What happened: Order filled on exchange, but fill notification never processed.

Risk: Position tracking completely wrong, risk calculations invalid.

Detection:

typescript
async function findStaleFills(exchange: Exchange, localOrders: Order[]) {
  // Each entry pairs the local record with the exchange's view of the same order
  const staleFills: Array<{ local: Order; remote: Order }> = [];
  for (const local of localOrders) {
    if (local.status !== 'filled') {
      const remote = await exchange.fetchOrder(local.clientOrderId);
      if (remote?.status === 'filled') {
        staleFills.push({ local, remote });
      }
    }
  }
  return staleFills;
}

The Reconciliation Loop

Run this on every startup, before any new trading activity:

typescript
async function reconcileOnStartup(
  exchange: Exchange,
  state: TradingState
): Promise<ReconciliationResult> {
  const result: ReconciliationResult = {
    orphansFound: 0,
    ghostsRemoved: 0,
    fillsBackfilled: 0,
    positionCorrected: false,
  };
 
  // Phase 1: Detect orphan orders (exchange has, we don't)
  const orphans = await findOrphanOrders(exchange, state.orderIds);
  for (const orphan of orphans) {
    // Option A: Cancel if strategy no longer valid
    // Option B: Adopt and track
    await state.adoptOrder(orphan);
    result.orphansFound++;
  }
 
  // Phase 2: Remove ghost orders (we have, exchange doesn't)
  const ghosts = await findGhostOrders(exchange, state.openOrders);
  for (const ghost of ghosts) {
    await state.removeOrder(ghost.clientOrderId);
    result.ghostsRemoved++;
  }
 
  // Phase 3: Backfill stale fills
  const staleFills = await findStaleFills(exchange, state.openOrders);
  for (const { local, remote } of staleFills) {
    await state.processFill(local.clientOrderId, remote.filled, remote.price);
    result.fillsBackfilled++;
  }
 
  // Phase 4: Verify position accuracy
  const exchangePosition = await exchange.fetchPosition(state.symbol);
  if (Math.abs(exchangePosition.size - state.position.size) > 0.0001) {
    state.position.size = exchangePosition.size;
    result.positionCorrected = true;
  }
 
  return result;
}
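The three detection phases boil down to a set comparison between two order snapshots. A minimal sketch of that comparison as a pure function follows. `OrderSnapshot`, `Classification`, and `classifyOrders` are illustrative names, and the sketch assumes the remote snapshot includes recently filled orders, not only open ones.

```typescript
type Status = "open" | "filled" | "canceled";

interface OrderSnapshot {
  clientOrderId: string;
  status: Status;
}

interface Classification {
  orphans: string[];    // on the exchange, unknown locally
  ghosts: string[];     // tracked locally, unknown to the exchange
  staleFills: string[]; // filled on the exchange, not yet filled locally
}

function classifyOrders(
  local: OrderSnapshot[],
  remote: OrderSnapshot[]
): Classification {
  const localById = new Map<string, OrderSnapshot>();
  for (const order of local) localById.set(order.clientOrderId, order);
  const remoteById = new Map<string, OrderSnapshot>();
  for (const order of remote) remoteById.set(order.clientOrderId, order);

  const result: Classification = { orphans: [], ghosts: [], staleFills: [] };

  // Orphans: the exchange knows the order, local state does not.
  for (const r of remote) {
    if (!localById.has(r.clientOrderId)) result.orphans.push(r.clientOrderId);
  }

  // Ghosts and stale fills: walk local state and compare with the exchange.
  for (const l of local) {
    const r = remoteById.get(l.clientOrderId);
    if (r === undefined) {
      result.ghosts.push(l.clientOrderId);
    } else if (r.status === "filled" && l.status !== "filled") {
      result.staleFills.push(l.clientOrderId);
    }
  }
  return result;
}
```

Keeping the comparison free of network calls makes it easy to unit test with hand-written snapshots before wiring it to a live exchange.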

Idempotency Keys Are Your Safety Net

Order reconciliation depends on client order IDs (idempotency keys). Without them, you can't ask the exchange "did this specific order go through?"

typescript
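function generateClientOrderId(
  strategy: string,
  symbol: string,
  timestamp: number,
  sequence: number
): string {
  // Deterministic: same inputs always produce same ID
  // Unique: `sequence` (an assumed per-strategy counter saved with local state)
  // separates orders created in the same millisecond. A random suffix would
  // break determinism, because the ID could not be rebuilt after a crash.
  return `${strategy}-${symbol}-${timestamp}-${sequence}`;
}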

Every order must include this ID. Most exchanges support it:

| Exchange | Field Name | Max Length |
| --- | --- | --- |
| Binance | newClientOrderId | 36 chars |
| Bybit | orderLinkId | 36 chars |
| OKX | clOrdId | 32 chars |
| Kraken | userref | 32-bit int |

Check your exchange's API docs for the exact field.
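Because the limits differ, it can help to validate an ID before sending it. The sketch below hard-codes the string limits from the table above; `CLIENT_ID_MAX_LENGTH` and `assertClientIdFits` are hypothetical names, and the numbers should be re-checked against current exchange documentation.

```typescript
// Limits copied from the table above; verify against current exchange docs.
const CLIENT_ID_MAX_LENGTH: Record<string, number> = {
  binance: 36,
  bybit: 36,
  okx: 32,
};

// Throws if the ID is too long for the given exchange, or the exchange is unknown.
function assertClientIdFits(exchange: string, clientOrderId: string): void {
  const max = CLIENT_ID_MAX_LENGTH[exchange];
  if (max === undefined) {
    throw new Error(`No known client ID limit for exchange "${exchange}"`);
  }
  if (clientOrderId.length > max) {
    throw new Error(
      `Client order ID is ${clientOrderId.length} chars; ${exchange} allows ${max}`
    );
  }
}
```

Length is not the only constraint. Some exchanges also restrict which characters are allowed, so a hyphen-separated ID may need a different separator or none at all.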

Position Reconciliation: The Final Check

Order-level reconciliation handles individual order drift. But position can still be wrong if:

  • Fills arrived over a WebSocket connection that dropped before they were processed
  • Manual trades were placed outside the bot
  • Exchange corrections adjusted fills retroactively

Always reconcile position as the final step:

typescript
async function reconcilePosition(
  exchange: Exchange,
  state: TradingState
): Promise<void> {
  const remote = await exchange.fetchPosition(state.symbol);
  // Copy the size now: state.position is mutated below, and the audit log
  // needs the value from before the correction.
  const localSizeBefore = state.position.size;

  const driftSize = Math.abs(remote.size - localSizeBefore);
  // `|| 1` avoids dividing by zero when the local position is flat
  const driftPct = (driftSize / Math.abs(localSizeBefore || 1)) * 100;

  if (driftPct > 0.1) { // More than 0.1% drift
    console.warn(`Position drift detected: local=${localSizeBefore}, remote=${remote.size}`);

    // Update local state to match exchange reality
    state.position.size = remote.size;
    state.position.entryPrice = remote.entryPrice;

    // Log for audit
    await auditLog({
      event: 'position_reconciled',
      drift: driftSize,
      localBefore: localSizeBefore,
      remoteTruth: remote.size,
    });
  }
}

Exchange always wins. Your local state is just a cache.
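A percentage threshold has a blind spot when the local position is flat: dividing by a stand-in value of 1 turns a 0.0005 drift into 0.05%, which passes a 0.1% check. The sketch below is one way to cover that case. `hasDrift` is a hypothetical helper, and the absolute floor of 0.0001 is borrowed from the Phase 4 check in the reconciliation loop rather than from `reconcilePosition`.

```typescript
// Mirrors the percentage formula used for position drift, plus an absolute
// floor for flat positions. The floor is an added assumption.
function hasDrift(
  localSize: number,
  remoteSize: number,
  maxPct = 0.1,
  maxAbs = 0.0001
): boolean {
  const driftSize = Math.abs(remoteSize - localSize);
  if (localSize === 0) {
    // Percentage is undefined for a flat local position; use the absolute check.
    return driftSize > maxAbs;
  }
  const driftPct = (driftSize / Math.abs(localSize)) * 100;
  return driftPct > maxPct;
}
```

For example, a local size of 1.5 against a remote size of 1.503 is roughly 0.2% drift and gets flagged, while 100 against 100.05 is roughly 0.05% and does not.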

Startup Sequence: Order Matters

  1. Load local state from disk/database
  2. Run reconciliation loop (the code above)
  3. Update risk limits based on corrected position
  4. Resume WebSocket streams for live updates
  5. Enable trading only after steps 1-4 complete

typescript
async function startupSequence(config: BotConfig): Promise<void> {
  // 1. Load state
  const state = await loadState(config.stateFile);

  // 2. Create exchange connection (but don't trade yet)
  const exchange = createExchange(config, { tradingEnabled: false });

  // 3. Reconcile
  const reconciled = await reconcileOnStartup(exchange, state);
  console.log('Reconciliation complete:', reconciled);

  // 4. Verify risk is within limits post-reconciliation
  if (!isWithinRiskLimits(state.position, config.limits)) {
    throw new Error('Position exceeds risk limits after reconciliation');
  }

  // 5. Connect live data before trading, so no fill goes unobserved
  await exchange.connectWebSocket();

  // 6. Enable trading last
  exchange.enableTrading();

  console.log('Trading resumed');
}

Never skip the post-reconciliation risk check (step 4 in the code, step 3 in the list). A position that drifted during downtime might exceed your risk limits.
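The startup code relies on the exchange wrapper refusing orders until `enableTrading()` is called. If your wrapper has no such switch, a small gate object can play that role. `TradingGate` below is a hypothetical sketch, not part of any exchange library.

```typescript
// Hypothetical gate that order-placement code must pass through.
class TradingGate {
  private enabled = false;

  enable(): void {
    this.enabled = true;
  }

  disable(): void {
    this.enabled = false;
  }

  // Throws instead of silently dropping, so a premature order shows up in logs.
  assertOpen(): void {
    if (!this.enabled) {
      throw new Error("Trading is disabled: startup reconciliation has not completed");
    }
  }
}
```

Call `gate.assertOpen()` at the top of every order-placement path, `gate.enable()` as the last startup step, and `gate.disable()` whenever reconciliation fails or a WebSocket reconnect forces a fresh reconciliation.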

Shipped asset: Crash Recovery Reconciliation Kit (free download)

TypeScript reconciliation loop template and startup sequence checklist for trading bots. Detects orphan orders, ghost orders, and stale fills on every restart.

When to use this (fit check)

  • Your bot stores order state locally.
  • You run strategies where a crash would make state ambiguous.
  • Position accuracy is critical (it always is).

When NOT to use this (yet)

  • Stateless order placement (fire and forget).
  • You query exchange state on every decision anyway.
  • Development or paper trading only.

Included files:

  • reconciliation-loop-template.ts - Full TypeScript implementation
  • startup-sequence-checklist.md - Step-by-step startup verification
  • README.md - Integration guide

Error Handling: When Reconciliation Fails

Reconciliation itself can fail. Handle these cases:

typescript
async function safeReconciliation(
  exchange: Exchange,
  state: TradingState
): Promise<ReconciliationResult | null> {
  try {
    return await reconcileOnStartup(exchange, state);
  } catch (error) {
    // Caught values are `unknown` in strict TypeScript; read `code` defensively
    const code = (error as { code?: string } | null)?.code;

    if (code === 'RATE_LIMITED') {
      // Wait and retry once; a second failure also means "don't trade"
      await delay(60_000);
      try {
        return await reconcileOnStartup(exchange, state);
      } catch (retryError) {
        console.error('Reconciliation retry failed:', retryError);
        return null;
      }
    }

    if (code === 'EXCHANGE_DOWN') {
      // Cannot reconcile: refuse to start
      console.error('Cannot reconcile: exchange unreachable');
      process.exit(1);
    }

    // Unknown error: refuse to trade
    console.error('Reconciliation failed:', error);
    return null;
  }
}

If reconciliation fails, don't trade. Operating with unknown state is how you blow up accounts.
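A single fixed wait is the simplest retry policy. If you prefer several attempts with growing delays, a bounded backoff helper along these lines is one option. `retryWithBackoff` is a hypothetical utility, and the attempt count and base delay are yours to tune against your exchange's rate limits.

```typescript
// Retries `operation` up to `maxAttempts` times, doubling the wait each time.
// Rethrows the last error once attempts are exhausted, so the caller can refuse to trade.
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  maxAttempts: number,
  baseDelayMs: number
): Promise<T> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Exponential backoff: base, 2x base, 4x base, ...
        const waitMs = baseDelayMs * Math.pow(2, attempt - 1);
        await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
      }
    }
  }
  throw lastError;
}
```

The upper bound matters: retrying forever keeps the bot in limbo, while a hard limit turns a persistent failure into an explicit "do not trade" outcome.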

Scheduled Reconciliation: Not Just Startup

Run reconciliation periodically during operation too:

typescript
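// Run light reconciliation every 5 minutes
setInterval(async () => {
  try {
    const drift = await checkPositionDrift(exchange, state);
    if (drift > config.driftThreshold) {
      await reconcilePosition(exchange, state);
    }
  } catch (error) {
    // A rejection here has no caller to handle it; log it and wait for the next tick
    console.error('Scheduled reconciliation failed:', error);
  }
}, 5 * 60 * 1000);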

This catches drift from:

  • WebSocket message loss
  • Network partitions you didn't notice
  • Exchange corrections
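One thing a plain timer does not guard against is overlap: if a reconciliation run takes longer than the interval, the next tick starts a second run against the same state. A small wrapper can skip ticks while a run is in progress. `createNonOverlapping` is a hypothetical helper, sketched here under the assumption that skipping a tick is acceptable.

```typescript
// Wraps an async task so that a new call is skipped while a previous one is running.
// The returned function resolves to true if the task ran, false if it was skipped.
function createNonOverlapping(task: () => Promise<void>): () => Promise<boolean> {
  let running = false;
  return async (): Promise<boolean> => {
    if (running) {
      return false; // previous run still in progress; skip this tick
    }
    running = true;
    try {
      await task();
      return true;
    } finally {
      running = false;
    }
  };
}
```

Errors thrown by the task still propagate to the caller, so a timer callback that uses the wrapper should catch and log them.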

Checklist (copy/paste)

Idempotency setup:

  • All orders include client order ID
  • IDs are deterministic (reproducible from order params)
  • IDs are stored before order placed

Reconciliation loop:

  • Orphan order detection implemented
  • Ghost order cleanup implemented
  • Stale fill backfill implemented
  • Position reconciliation as final step

Startup sequence:

  • State loaded before exchange connection
  • Reconciliation completes before trading enabled
  • Risk limits checked after reconciliation
  • WebSocket connected after reconciliation

Periodic maintenance:

  • Position drift checked every N minutes
  • Full reconciliation on any WebSocket reconnect
  • Drift logged for monitoring

Failure handling:

  • Reconciliation timeout defined
  • Exchange-down scenario handled
  • Manual intervention trigger defined

FAQ

How long should reconciliation take?

Under 10 seconds for most accounts. With hundreds of open orders it can take longer because of rate limits, so batch your order queries and stay within the exchange's limits. If reconciliation consistently takes more than 30 seconds, you likely have too many open orders.
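The batching advice can be sketched as a helper that runs a fixed number of queries at a time and pauses between batches. `mapInBatches` is a hypothetical utility; the batch size and pause are assumptions to tune against your exchange's documented limits.

```typescript
// Applies `fn` to every item, at most `batchSize` at a time, pausing between batches.
// Results come back in the same order as the input items.
async function mapInBatches<T, R>(
  items: T[],
  batchSize: number,
  pauseMs: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  if (batchSize < 1) {
    throw new Error("batchSize must be at least 1");
  }
  const results: R[] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    const batchResults = await Promise.all(batch.map((item) => fn(item)));
    for (const r of batchResults) results.push(r);

    const isLastBatch = start + batchSize >= items.length;
    if (!isLastBatch && pauseMs > 0) {
      await new Promise<void>((resolve) => setTimeout(resolve, pauseMs));
    }
  }
  return results;
}
```

In a reconciliation loop, `items` would be the local client order IDs and `fn` the per-order status query. Where the exchange offers a bulk endpoint for open orders or order history, that is usually cheaper than per-order lookups.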

What is the source of truth: local state or the exchange?

Exchange state is the source of truth for positions and orders. Your local state might track things the exchange doesn't (strategy metadata, alerts, etc.), but for anything the exchange knows about, trust the exchange. If the exchange is wrong, that's a support ticket, not a code fix.

Should I cancel or adopt orphan orders?

Depends on your strategy's time horizon. If orders are good for seconds (scalping), cancel orphans, because the opportunity has passed. If orders are good for hours (swing), adopt them and let strategy logic decide. Default to adoption with position recalculation.
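The time-horizon rule can be written down as a small policy function. `decideOrphanAction` and its parameters are hypothetical; the sketch assumes you know when the orphan was placed (most exchanges return a creation timestamp) and how long your strategy considers an order valid.

```typescript
type OrphanAction = "cancel" | "adopt";

// Cancels orphans older than the strategy's order lifetime; adopts the rest.
function decideOrphanAction(
  orderPlacedAtMs: number,
  nowMs: number,
  maxOrderLifetimeMs: number
): OrphanAction {
  const ageMs = nowMs - orderPlacedAtMs;
  return ageMs > maxOrderLifetimeMs ? "cancel" : "adopt";
}
```

A scalping strategy might pass a lifetime of a few seconds, so nearly every orphan found after a restart is canceled. A swing strategy with a lifetime of hours would adopt most of them.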

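Do I need reconciliation if I already use WebSockets?

Yes. WebSocket connections drop, and messages sent while you were disconnected are typically not replayed. TCP preserves ordering only within a single connection; it cannot recover anything across a reconnect. Reconciliation catches what the WebSocket missed. Think of WebSocket as the fast path and REST reconciliation as the verification path.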

