Error Index
Real failure patterns mapped to practical playbooks. Start with the symptom you are seeing, then jump to the canonical post and shipped resource.
Coverage by lane
WebSocket disconnects cause stale bot state
Connections recover, but position and order state drift from reality, leading to unsafe decisions. Stabilizing the bot requires reliable reconnection, gap detection, and clear replay boundaries.
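As a rough illustration of gap detection, the sketch below assumes the feed attaches a monotonically increasing sequence number to each message. The `SequenceTracker` class and its method names are hypothetical and not taken from any exchange SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SequenceTracker:
    """Tracks the last applied sequence number for one stream (hypothetical helper)."""
    last_seq: Optional[int] = None

    def check(self, seq: int) -> str:
        """Classify an incoming message as 'apply', 'duplicate', or 'gap'."""
        if self.last_seq is None or seq == self.last_seq + 1:
            self.last_seq = seq
            return "apply"
        if seq <= self.last_seq:
            return "duplicate"  # already seen, e.g. replayed after a reconnect
        return "gap"            # messages were missed; do not advance

    def reset(self, seq: int) -> None:
        """Call after loading a fresh snapshot that covers everything up to seq."""
        self.last_seq = seq
```

On a "gap" result the tracker deliberately does not advance, so the caller can stop trusting local state, fetch a snapshot, and call `reset` before applying further messages.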
Database write succeeded but event was never published
Data is committed, but downstream systems never receive the event. This is a classic transactional boundary failure, and adopting the outbox pattern closes the gap.
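A minimal sketch of the outbox idea, using SQLite from the Python standard library for illustration. The table names, the `order.created` topic, and the `publish` callback are assumptions made for this example, not part of any specific product.

```python
import json
import sqlite3


def create_schema(conn):
    with conn:
        conn.execute("CREATE TABLE IF NOT EXISTS orders (id TEXT PRIMARY KEY, amount REAL)")
        conn.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, topic TEXT, payload TEXT, "
            "published INTEGER NOT NULL DEFAULT 0)"
        )


def place_order(conn, order_id, amount):
    # One transaction: the business row and the event row commit together or not at all.
    with conn:
        conn.execute("INSERT INTO orders (id, amount) VALUES (?, ?)", (order_id, amount))
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("order.created", json.dumps({"id": order_id, "amount": amount})),
        )


def relay_once(conn, publish):
    """Publish pending outbox rows in order; returns how many were pending."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, json.loads(payload))  # if this raises, the row stays pending
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

Because a row is marked published only after `publish` returns, a crash between the two steps causes a re-send rather than a lost event. Delivery is therefore at-least-once, and consumers should tolerate duplicates.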
Bot restart causes duplicate orders or orphan state
After crash or restart, reconciliation is incomplete and live state diverges from exchange truth. Without startup guards and idempotent reconciliation loops, duplicates are likely.
Duplicate orders and emails from retried API calls
Under retry pressure, side effects execute more than once and teams lose trust in automation. Idempotency keys and contract rules are required before scaling retries.
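To make the idempotency-key idea concrete, here is a small in-memory sketch. `IdempotentExecutor` is a hypothetical name, and a real service would most likely keep keys in a durable store with a unique constraint rather than in a dictionary.

```python
import threading


class IdempotentExecutor:
    """Runs a side effect at most once per key and replays the stored result."""

    def __init__(self):
        self._results = {}
        self._lock = threading.Lock()

    def run(self, key, action):
        # The lock is held while the action runs, which serializes callers.
        # That keeps the sketch simple; it is not a throughput recommendation.
        with self._lock:
            if key in self._results:
                return self._results[key]
            result = action()  # if this raises, nothing is stored and a retry may run it again
            self._results[key] = result
            return result
```

A retried request that carries the same key gets the original result back instead of sending a second order or email.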
Trading bot 429 storm immediately after deploy
Multiple instances restart together, resync bursts hit the same budget, and retries amplify load. The path out is coordinated rate limiting, Retry-After handling, and startup ramp controls.
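One simple form of startup ramp control is to spread instance resyncs across a time window. The sketch below assumes each instance knows its own index and the total instance count; the function name and parameters are illustrative only.

```python
import random


def startup_delay(instance_index, instance_count, window_seconds, rng=random.random):
    """Return how long this instance should wait before its first resync.

    The window is split into one slot per instance, and each instance picks a
    random point inside its own slot so restarts do not line up.
    """
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    slot = window_seconds / instance_count
    return instance_index * slot + rng() * slot
```

With four instances and a 20 second window, instance 0 starts somewhere in the first 5 seconds, instance 1 in the next 5, and so on. This reduces the initial burst but does not replace a shared rate limiter.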
HttpClient 429 spikes after deploy in .NET
Deploy finishes, traffic looks normal, then 429 errors spike and retries make it worse. This usually means Retry-After is ignored, concurrency ramps too fast, or retry budgets are too loose.
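The entry above concerns .NET, but the header handling is language-neutral, so the sketch is in Python. It interprets a Retry-After value, which HTTP allows to be either a number of seconds or an HTTP date. The `default` and `cap` values are assumptions chosen for the example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, now=None, default=1.0, cap=60.0):
    """Convert a Retry-After header value into a bounded delay in seconds."""
    if header_value is None:
        return default
    value = header_value.strip()
    if value.isdigit():
        delay = float(value)
    else:
        try:
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            return default  # unparseable header: fall back to the default delay
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        now = now or datetime.now(timezone.utc)
        delay = (when - now).total_seconds()
    # Never wait a negative time, and never longer than the cap.
    return min(max(delay, 0.0), cap)
```

The caller would sleep for the returned delay before the next attempt instead of retrying immediately.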
Cannot trace a request across services and jobs
Teams cannot answer where a request failed because IDs are lost at service boundaries. Correlation discipline and propagation rules are usually the missing control.
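A minimal sketch of correlation ID propagation, assuming an `X-Correlation-ID` header (a common convention, not a standard) and plain dictionaries for headers. The function names are hypothetical.

```python
import contextvars
import uuid

HEADER = "X-Correlation-ID"  # assumed header name; match whatever your services agree on
correlation_id = contextvars.ContextVar("correlation_id", default=None)


def accept_incoming(headers):
    """Reuse the caller's ID if one is present, otherwise start a new one."""
    cid = headers.get(HEADER) or uuid.uuid4().hex
    correlation_id.set(cid)
    return cid


def outgoing_headers(extra=None):
    """Build headers for a downstream call, carrying the current ID along."""
    headers = dict(extra or {})
    cid = correlation_id.get()
    if cid:
        headers[HEADER] = cid
    return headers
```

The same ID should also be written into log records and job payloads so that a request can be followed past the HTTP boundary. Note that the dictionary lookup here is case-sensitive, while real HTTP header names are not.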
Retries making outages worse in .NET services
Service owners add retries to recover, but outage duration increases and queue pressure grows. The hidden issue is usually layered retries, missing stop rules, and no total time budget.
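The entry above concerns .NET, but the stop rules are language-neutral, so the sketch is in Python. It shows a retry loop with two of them: an attempt cap and a total time budget. The function name and default values are assumptions, and real code would retry only errors known to be transient.

```python
import time


def call_with_budget(operation, max_attempts=3, total_budget=2.0, base_delay=0.1,
                     clock=time.monotonic, sleep=time.sleep):
    """Retry operation with exponential backoff, bounded by attempts and total time."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    deadline = clock() + total_budget
    last_error = None
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:  # broad on purpose for the sketch
            last_error = exc
        delay = base_delay * (2 ** attempt)
        # Stop if attempts are used up or the next wait would exceed the budget.
        if attempt == max_attempts - 1 or clock() + delay >= deadline:
            break
        sleep(delay)
    raise last_error
```

If several layers each wrap calls this way, attempts multiply, so a loop like this is usually best kept at a single layer.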
Requests time out but CPU is normal (thread pool starvation)
Latency climbs, requests queue, and CPU looks deceptively normal. This usually points to blocked worker threads, sync-over-async hotspots, or retry and time-budget behavior that pins resources.
Requests hang forever in .NET (missing or unsafe timeouts)
Incidents repeat because calls wait too long and cancellation never propagates. The fix is usually a timeout matrix and total-budget discipline across HTTP, DB, and jobs.
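The entry above concerns .NET, but the budgeting logic is language-neutral, so the sketch is in Python. It combines a timeout matrix with a total budget by giving each call the smaller of its per-dependency limit and whatever time remains. The `Deadline` class and the values in `TIMEOUTS` are hypothetical.

```python
import time

# Hypothetical per-dependency limits in seconds; tune to your own system.
TIMEOUTS = {"http": 2.0, "db": 1.0, "job_step": 5.0}


class Deadline:
    """Tracks a total time budget shared by every call in one request."""

    def __init__(self, total_seconds, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + total_seconds

    def remaining(self):
        return max(0.0, self._expires_at - self._clock())

    def timeout_for(self, kind):
        """Return the timeout to pass to the next call of the given kind."""
        remaining = self.remaining()
        if remaining <= 0:
            raise TimeoutError("total time budget exhausted")
        return min(TIMEOUTS[kind], remaining)
```

A handler would create one `Deadline` per request and pass `deadline.timeout_for("http")` or `deadline.timeout_for("db")` into each client call, so later calls cannot outlive the request.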
Background jobs stuck but dashboards look healthy
Workers look alive but business tasks silently stop progressing. This often comes from missing progress telemetry, weak liveness checks, and unclear stop/restart rules.
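One way to separate "process is alive" from "work is progressing" is to track the time since the last completed unit of work. The sketch below assumes a worker that calls `record_progress` after each task; the class name and threshold are illustrative.

```python
import time


class ProgressMonitor:
    """Reports a stall when no work has completed within the allowed window."""

    def __init__(self, stall_after_seconds, clock=time.monotonic):
        self._clock = clock
        self._stall_after = stall_after_seconds
        self._last_progress = clock()
        self.completed = 0

    def record_progress(self, count=1):
        self.completed += count
        self._last_progress = self._clock()

    def is_stalled(self):
        return self._clock() - self._last_progress > self._stall_after
```

A health endpoint that returns unhealthy when `is_stalled()` is true catches workers that are running but no longer finishing tasks. An idle queue would need separate handling so that "nothing to do" is not reported as a stall.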
Exchange API key banned after bot deploy
Automation appears healthy, then keys are throttled or blocked with little warning. Typical causes are bursty startup behavior, weak limiter coordination, and policy drift across instances.
Signature invalid after bot worked for hours
Private endpoints suddenly fail with timestamp or signature errors even though no signing code changed. Clock drift and startup calibration gaps are usually the real cause.
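A rough sketch of clock offset calibration, assuming the exchange exposes a server-time endpoint that returns milliseconds. Both function names are hypothetical, and the estimate assumes the request and response take about the same time on the wire.

```python
def estimate_offset_ms(local_send_ms, server_time_ms, local_recv_ms):
    """Estimate server clock minus local clock, using the round-trip midpoint."""
    midpoint = (local_send_ms + local_recv_ms) / 2
    return server_time_ms - midpoint


def signed_timestamp_ms(local_now_ms, offset_ms):
    """Timestamp to place in a signed request, corrected by the estimated offset."""
    return int(local_now_ms + offset_ms)
```

For example, a request sent at local time 1000 ms and answered at 1100 ms with a server time of 1550 ms gives an offset of 500 ms. Re-estimating the offset periodically, not only at startup, helps when the local clock drifts over hours.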