# Time Sync Runbook (Trading Bots)

Goal: eliminate timestamp drift as a cause of signature / auth errors.

This runbook assumes:

- Your bot signs requests (HMAC, JWT, etc.)
- The exchange enforces a strict timestamp window (often 1–5 seconds)
- Your infrastructure might include containers, VMs, or managed platforms

## 0) When to run this

Run this whenever you see any of:

- `timestamp out of range`
- `recvWindow` / `timestamp` / `nonce` errors
- intermittent `signature invalid` that “goes away” on redeploy
- sudden spike in 401/403 across private endpoints
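
To make these symptoms actionable, it helps to bucket each failure into one error class. The following is a minimal sketch; the patterns and the `classify_auth_error` helper are hypothetical, and real error strings differ per exchange, so treat them as placeholders to adjust.

```python
import re
from typing import Optional

# Hypothetical patterns: real error strings differ per exchange, so adjust these.
_PATTERNS = [
    ("timestamp", re.compile(r"timestamp|recvwindow|nonce", re.IGNORECASE)),
    ("signature", re.compile(r"signature", re.IGNORECASE)),
]


def classify_auth_error(status_code: int, message: str) -> Optional[str]:
    """Return 'timestamp', 'signature', 'permission', or None if not auth-related."""
    # Message patterns are checked first because they are more specific
    # than the HTTP status code.
    for label, pattern in _PATTERNS:
        if pattern.search(message):
            return label
    if status_code in (401, 403):
        return "permission"
    return None
```

Timestamp patterns are checked before signature patterns because some exchanges report a stale timestamp as a signature failure with the timestamp named in the message.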

## 1) Immediate safety actions

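- Stop retrying requests that fail with auth/signature errors.
- Trip the auth circuit breaker so no further signed requests are sent (fail closed).
- If your bot can trade, engage the kill switch.

Reason: repeated invalid auth attempts look like abuse to the exchange and can escalate into longer or broader blocks (for example, IP or API-key bans).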
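
A fail-closed breaker can be very small. The sketch below assumes a manual reset (an operator calls `reset()` once the clock has been verified); the class name, `max_failures`, and the method names are illustrative, not part of any library.

```python
import time
from typing import Callable, Optional


class AuthCircuitBreaker:
    """Blocks signed requests after auth failures until an operator resets it."""

    def __init__(self, max_failures: int = 1,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self._clock = clock
        self._failures = 0
        self.tripped_at: Optional[float] = None  # monotonic time of the trip

    def record_auth_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.max_failures and self.tripped_at is None:
            self.tripped_at = self._clock()

    def record_success(self) -> None:
        # A success only clears the counter while the breaker is not tripped.
        if self.tripped_at is None:
            self._failures = 0

    def allow_request(self) -> bool:
        """Fail closed: once tripped, nothing goes out until reset() is called."""
        return self.tripped_at is None

    def reset(self) -> None:
        """Manual reset, intended to be called after the clock is verified."""
        self._failures = 0
        self.tripped_at = None
```

Call `allow_request()` before signing and `record_auth_failure()` whenever a response is classified as an auth error. There is deliberately no automatic half-open state here, so a drifting clock cannot quietly resume sending bad requests.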

## 2) Confirm the exchange’s time rules

Capture from docs:

- allowed drift window (seconds)
- whether they expose a `serverTime` endpoint
- whether they accept a configurable `recvWindow`

Rules of thumb:

- `recvWindow` is not a fix for large drift; it is a tolerance buffer.
- If the exchange provides `serverTime`, use it for calibration.
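
To see why `recvWindow` only buys tolerance, consider a toy model of a server-side check. The rule below is an assumption for illustration (reject requests older than `recv_window_ms`, and reject requests stamped more than `max_future_ms` ahead of server time); confirm the actual rule in your exchange’s docs.

```python
def is_timestamp_accepted(request_ts_ms: int,
                          server_now_ms: int,
                          recv_window_ms: int = 5000,
                          max_future_ms: int = 1000) -> bool:
    """Toy model of a server-side window check (assumed rule, not a real API)."""
    too_old = server_now_ms - request_ts_ms > recv_window_ms
    too_far_ahead = request_ts_ms - server_now_ms > max_future_ms
    return not (too_old or too_far_ahead)


server_now = 1_000_000

# Local clock 3 s behind: still inside the default window.
print(is_timestamp_accepted(server_now - 3_000, server_now))

# Local clock 8 s behind: rejected, and widening the window only hides the drift.
print(is_timestamp_accepted(server_now - 8_000, server_now))
print(is_timestamp_accepted(server_now - 8_000, server_now, recv_window_ms=10_000))

# Local clock 3 s ahead: in this model no recvWindow value helps.
print(is_timestamp_accepted(server_now + 3_000, server_now, recv_window_ms=60_000))
```

Under this model a clock that runs ahead fails regardless of the window, which is why calibration matters more than a larger `recvWindow`.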

## 3) Verify system time on the host

On Windows Server:

- Check the Windows Time service (`w32time`) and its sync status, for example with `w32tm /query /status`.

On Linux:

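- Identify which sync daemon is in use (`systemd-timesyncd`, `chrony`, or `ntpd`), then query it (for example `timedatectl status`, `chronyc tracking`, or `ntpq -p`).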

What you’re looking for:

- Is time sync enabled?
- Is the last sync recent?
- Is the offset stable?
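
On Linux hosts with systemd, the first question can be checked from code. This sketch assumes `timedatectl show` is available (reasonably recent systemd versions) and prints `KEY=VALUE` lines including `NTPSynchronized`; the helper names are hypothetical. It does not report the offset or the time of the last sync, which you would still read from your sync daemon.

```python
import subprocess


def parse_timedatectl_show(output: str) -> dict:
    """Parse the KEY=VALUE lines printed by `timedatectl show` into a dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that have no '='
            fields[key.strip()] = value.strip()
    return fields


def is_ntp_synchronized(output: str) -> bool:
    """True only if the host reports that its clock is NTP-synchronized."""
    return parse_timedatectl_show(output).get("NTPSynchronized") == "yes"


def read_timedatectl() -> str:
    """Run `timedatectl show` (Linux with systemd only)."""
    result = subprocess.run(
        ["timedatectl", "show"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Usage on a suitable host would be `is_ntp_synchronized(read_timedatectl())`. Keeping the parser separate from the subprocess call makes it testable on machines without systemd.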

## 4) Verify container + VM assumptions

Common gotchas:

- Containers share the host kernel’s clock; if the host drifts, every container on it drifts.
- Restoring a VM snapshot or resuming a paused VM can leave the guest clock stale, then make it jump once sync catches up.
- Autoscaled instances may boot with incorrect time until sync completes.

Mitigation:

- Prefer stable hosts with correct time sync.
- Delay bot startup until time sync is confirmed.
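
Delaying startup can be a simple polling gate in the bot’s entry point. This is a sketch under the assumption that you supply an `is_synced` callable (for example, one that checks your sync daemon’s status); the function name, timeout, and poll interval are illustrative.

```python
import time
from typing import Callable


def wait_for_time_sync(is_synced: Callable[[], bool],
                       timeout_s: float = 120.0,
                       poll_interval_s: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll is_synced() until it returns True or timeout_s elapses.

    Returns True if sync was confirmed, False on timeout. The deadline uses
    a monotonic clock so a wall-clock jump during sync does not distort it.
    """
    deadline = clock() + timeout_s
    while True:
        if is_synced():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

If the function returns `False`, the safer choice is to exit without starting the bot so the platform can restart or replace the instance.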

## 5) Exchange-time calibration (recommended)

Implement a lightweight calibration step:

- Call the exchange `serverTime` endpoint.
- Compute offset: `offset_ms = server_time_ms - local_time_ms`.
- Apply the offset to the timestamp of every signed request: `request_ts_ms = local_time_ms + offset_ms`.

Operational rules:

- Recalibrate periodically (e.g., every 5–15 minutes).
- Recalibrate immediately after:
  - a resume from sleep
  - container restart
  - VM migration
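
The calibration steps above can be sketched as a small clock wrapper. Assumptions: `fetch_server_time_ms` is a hypothetical callable you provide that wraps the exchange’s `serverTime` call and returns milliseconds; the local time used in the offset is the midpoint of the round trip (a refinement not stated above); and the 10-minute `max_age_ms` is just one value inside the 5–15 minute range.

```python
import time
from typing import Callable, Optional


def now_ms() -> int:
    """Local wall-clock time in milliseconds since the Unix epoch."""
    return time.time_ns() // 1_000_000


class ExchangeClock:
    """Tracks offset_ms = server_time_ms - local_time_ms and applies it."""

    def __init__(self, fetch_server_time_ms: Callable[[], int],
                 local_clock_ms: Callable[[], int] = now_ms,
                 max_age_ms: int = 10 * 60 * 1000):
        self._fetch = fetch_server_time_ms  # hypothetical serverTime wrapper
        self._local = local_clock_ms
        self._max_age_ms = max_age_ms
        self.offset_ms = 0
        self._calibrated_at_ms: Optional[int] = None

    def calibrate(self) -> int:
        sent = self._local()
        server = self._fetch()
        received = self._local()
        # Assumption: the server read its clock about halfway through the
        # round trip, so compare against the local midpoint.
        local_mid = (sent + received) // 2
        self.offset_ms = server - local_mid
        self._calibrated_at_ms = received
        return self.offset_ms

    def timestamp_ms(self) -> int:
        """Timestamp for a signed request; recalibrates if the offset is stale."""
        now = self._local()
        # abs() also catches a local clock that jumped backwards.
        if (self._calibrated_at_ms is None
                or abs(now - self._calibrated_at_ms) > self._max_age_ms):
            self.calibrate()
            now = self._local()
        return now + self.offset_ms
```

Call `calibrate()` explicitly from whatever hook fires on resume, restart, or migration; the age check in `timestamp_ms()` only covers the periodic case and large clock jumps.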

## 6) Logging requirements (non-negotiable)

Log these fields on every signed request:

- `local_ts_ms`
- `server_ts_ms` (if available)
- `applied_offset_ms`
- `recv_window_ms`
- `signing_version` (helps detect deploy-related signature changes)
- `auth_error_class` (timestamp | signature | permission)
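
One way to keep these fields consistent is to build every record through a single function and emit it as one JSON line. This is a sketch; the logger name `bot.auth` and both helper functions are assumptions, not an existing API.

```python
import json
import logging
from typing import Optional

logger = logging.getLogger("bot.auth")  # hypothetical logger name

_ERROR_CLASSES = (None, "timestamp", "signature", "permission")


def build_signed_request_log(local_ts_ms: int,
                             applied_offset_ms: int,
                             recv_window_ms: int,
                             signing_version: str,
                             server_ts_ms: Optional[int] = None,
                             auth_error_class: Optional[str] = None) -> dict:
    """Assemble the per-request fields into one flat record."""
    if auth_error_class not in _ERROR_CLASSES:
        raise ValueError(f"unknown auth_error_class: {auth_error_class!r}")
    return {
        "local_ts_ms": local_ts_ms,
        "server_ts_ms": server_ts_ms,  # None when the exchange does not return it
        "applied_offset_ms": applied_offset_ms,
        "recv_window_ms": recv_window_ms,
        "signing_version": signing_version,
        "auth_error_class": auth_error_class,  # None on success
    }


def log_signed_request(**fields) -> None:
    """Emit the record as a single JSON line so it can be filtered and graphed."""
    record = build_signed_request_log(**fields)
    logger.info(json.dumps(record, sort_keys=True))
```

Rejecting unknown `auth_error_class` values at build time keeps dashboards from fragmenting across misspelled labels.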

## 7) Fast tests

1) Force drift detection
   - Temporarily skew local clock by +5s in a safe environment.
   - Confirm your bot trips the auth breaker.

2) Confirm offset correction
   - With offset enabled, verify signed calls succeed even with minor drift.
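
If changing the system clock is not practical, the skew can be simulated in-process, provided the bot reads time through an injectable clock function (an assumption about your code structure). The `skewed_clock_ms` helper below is hypothetical.

```python
import time
from typing import Callable


def system_clock_ms() -> int:
    """Local wall-clock time in milliseconds since the Unix epoch."""
    return time.time_ns() // 1_000_000


def skewed_clock_ms(skew_ms: int,
                    base_clock_ms: Callable[[], int] = system_clock_ms
                    ) -> Callable[[], int]:
    """Return a clock that runs skew_ms ahead of (or behind) the base clock."""
    def clock() -> int:
        return base_clock_ms() + skew_ms
    return clock


# Example: a clock that reads 5 seconds ahead of the real one.
fast_clock = skewed_clock_ms(5_000)
```

Pass the skewed clock wherever the bot would normally read local time, run it against a sandbox or testnet, and confirm both tests: the breaker trips without calibration, and signed calls succeed with it.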

## 8) Definition of “fixed”

You’re done when:

- Timestamp-related errors drop to ~0.
- Offset stays within a small band (no oscillation).
- Auth errors do not trigger retry storms.
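
The "small band" criterion can be checked mechanically against recent `applied_offset_ms` values. In this sketch the 250 ms default for `band_ms` is an arbitrary assumption; choose a value well below your exchange’s allowed window.

```python
from typing import Sequence


def offset_is_stable(offsets_ms: Sequence[int], band_ms: int = 250) -> bool:
    """True if all recent offsets fit inside a band of width band_ms."""
    if len(offsets_ms) < 2:
        return False  # not enough samples to judge stability
    return max(offsets_ms) - min(offsets_ms) <= band_ms


print(offset_is_stable([120, 130, 110, 125]))    # spread of 20 ms
print(offset_is_stable([120, -900, 150, -850]))  # spread of 1050 ms, oscillating
```

A large but steady offset passes this check, which is intended: calibration compensates for a constant offset, while an oscillating one points to a host or hypervisor clock problem.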
