---
title: Retry telemetry fields
summary: The minimum fields to log so retries are debuggable.
---

# Retry telemetry fields

Retries without telemetry look like “random flakiness”. Retries with telemetry let you answer: what failed, how often, and what did our system do about it?

## Minimum fields per attempt

Log these on every attempt (including the first attempt):

- `operation`: stable name (e.g., `payments.charge`, `email.send`, `exchange.order.place`)
- `target`: host/service identifier (e.g., `api.vendor.com`)
- `request_id`: end-to-end trace ID, identical across all attempts of the same logical request
- `attempt`: 1-based attempt number
- `max_attempts`: configured cap
- `timeout_ms`: timeout for this attempt
- `duration_ms`: actual duration
- `result`: `success` | `retry` | `fail` | `stop`
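
One way to keep these fields consistent is to put them in a single typed record and serialize that on every attempt. The sketch below assumes Python and JSON log lines; `AttemptRecord` and `to_log_line` are hypothetical names used only for illustration.

```python
import json
from dataclasses import asdict, dataclass

# Allowed values for `result`, copied from the field list above.
RESULTS = {"success", "retry", "fail", "stop"}


@dataclass
class AttemptRecord:
    """Minimum per-attempt fields (hypothetical container, not a required schema)."""

    operation: str
    target: str
    request_id: str
    attempt: int       # 1-based
    max_attempts: int
    timeout_ms: int
    duration_ms: int
    result: str

    def __post_init__(self) -> None:
        # Reject values that would make the logs hard to aggregate later.
        if self.attempt < 1:
            raise ValueError("attempt is 1-based")
        if self.result not in RESULTS:
            raise ValueError(f"unexpected result: {self.result!r}")


def to_log_line(record: AttemptRecord) -> str:
    """Serialize one attempt as a single JSON log line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)
```

Validating at construction time means a typo in `result` fails in the caller, not weeks later in a dashboard query.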

## Error classification

You need a stable classifier so you can make policy decisions:

- `error_kind`: `timeout` | `rate_limit` | `network` | `auth` | `validation` | `server_error` | `unknown`
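- `http_status`: HTTP status code, if the call is made over HTTP
- `provider_error_code`: the provider's own error code, if the response includes one
- `retry_after_ms`: parsed from the `Retry-After` header (or a provider-specific equivalent), if present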
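
A minimal classifier sketch for the HTTP case follows. The status-to-kind mapping is an assumption (for example, treating 408 and 504 as `timeout`), so adjust it to your providers. `classify_http_status` and `parse_retry_after_ms` are hypothetical helper names. The parser handles both forms of the standard `Retry-After` header: a number of seconds or an HTTP date.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def classify_http_status(status: int) -> Optional[str]:
    """Map an HTTP status to an `error_kind`; returns None for non-errors.

    The mapping is an illustrative assumption, not a standard.
    """
    if 200 <= status < 400:
        return None
    if status in (408, 504):
        return "timeout"
    if status == 429:
        return "rate_limit"
    if status in (401, 403):
        return "auth"
    if status in (400, 422):
        return "validation"
    if 500 <= status <= 599:
        return "server_error"
    return "unknown"


def parse_retry_after_ms(value: Optional[str],
                         now: Optional[datetime] = None) -> Optional[int]:
    """Convert a Retry-After header value to milliseconds, or None if unusable."""
    if value is None:
        return None
    value = value.strip()
    # Form 1: delay in whole seconds, e.g. "120".
    if value.isascii() and value.isdigit():
        return int(value) * 1000
    # Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now", so clamp at zero.
    return max(0, int((when - now).total_seconds() * 1000))
```

Keeping the classifier as one small function makes it easy to log `error_kind` identically from every call site.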

## Backoff decision

- `backoff_ms`: planned delay before next attempt
- `jitter_ms`: jitter component (or include it in `backoff_ms` and log `jitter_strategy`)
- `jitter_strategy`: `full` | `equal` | `decorrelated`
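
The sketch below shows how the three strategies might produce `backoff_ms`, using the commonly described formulas for full, equal, and decorrelated jitter on top of capped exponential backoff. The function name `plan_backoff`, the defaults `base_ms=200` and `cap_ms=10_000`, and the choice to treat `retry_after_ms` as a floor are all assumptions for illustration. Jitter is folded into `backoff_ms`, and the strategy is logged alongside it.

```python
import random
from typing import Optional


def plan_backoff(attempt: int, strategy: str, base_ms: int = 200,
                 cap_ms: int = 10_000,
                 prev_backoff_ms: Optional[float] = None,
                 retry_after_ms: Optional[int] = None,
                 rng=random) -> dict:
    """Return the backoff fields to log before sleeping (attempt is 1-based)."""
    # Capped exponential delay: base, 2*base, 4*base, ... up to cap.
    exp_ms = min(cap_ms, base_ms * 2 ** (attempt - 1))
    if strategy == "full":
        # Anywhere between zero and the exponential delay.
        backoff = rng.uniform(0, exp_ms)
    elif strategy == "equal":
        # Half fixed, half random.
        backoff = exp_ms / 2 + rng.uniform(0, exp_ms / 2)
    elif strategy == "decorrelated":
        # Grows from the previous delay instead of the attempt number.
        prev = base_ms if prev_backoff_ms is None else prev_backoff_ms
        backoff = min(cap_ms, rng.uniform(base_ms, prev * 3))
    else:
        raise ValueError(f"unknown jitter strategy: {strategy!r}")
    # Assumption: never retry sooner than the server asked us to.
    if retry_after_ms is not None:
        backoff = max(backoff, retry_after_ms)
    return {"backoff_ms": int(backoff), "jitter_strategy": strategy}
```

Returning the fields as a dict lets the caller merge them straight into the attempt's log record before sleeping.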

## Guardrails

- `retry_budget_remaining`: retries left in the budget, if you enforce a global retry budget
- `circuit_state`: `closed` | `open` | `half_open`
- `concurrency_inflight`: number of requests currently in flight
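
As a rough sketch, the guardrail fields can be captured at the moment the retry decision is made, so the log shows why a retry was or was not allowed. The `Guardrails` class and its plain-counter budget are hypothetical simplifications; a real budget is often a rate over a time window, and the circuit state would come from your circuit breaker.

```python
from dataclasses import dataclass


@dataclass
class Guardrails:
    """Hypothetical snapshot of guardrail state shared by callers of one target."""

    retry_budget_remaining: int
    circuit_state: str = "closed"  # "closed" | "open" | "half_open"
    concurrency_inflight: int = 0

    def allow_retry(self) -> bool:
        """Spend one unit of budget if a retry is currently permitted."""
        if self.circuit_state == "open" or self.retry_budget_remaining <= 0:
            return False
        self.retry_budget_remaining -= 1
        return True

    def log_fields(self) -> dict:
        """Fields to merge into the attempt's log record."""
        return {
            "retry_budget_remaining": self.retry_budget_remaining,
            "circuit_state": self.circuit_state,
            "concurrency_inflight": self.concurrency_inflight,
        }
```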

## A simple log line example

Example shape (pseudo-JSON):

```json
{
  "operation": "exchange.order.place",
  "target": "api.exchange.com",
  "request_id": "...",
  "attempt": 2,
  "max_attempts": 3,
  "timeout_ms": 10000,
  "duration_ms": 812,
  "error_kind": "rate_limit",
  "http_status": 429,
  "retry_after_ms": 1500,
  "backoff_ms": 2400,
  "jitter_strategy": "full",
  "result": "retry"
}
```

You don’t need this exact schema. You do need consistent fields.
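
To show where such a line could be emitted, here is a minimal retry loop that logs exactly one JSON line per attempt, including the first. It is a sketch under assumptions: `call_with_retries` is a hypothetical helper, errors are classified only from Python's built-in `TimeoutError` and `ConnectionError`, backoff uses full jitter with an assumed `base_ms`, and the `stop` result and guardrail fields are left out for brevity.

```python
import json
import random
import time
import uuid


def call_with_retries(fn, *, operation, target, max_attempts=3,
                      timeout_ms=10_000, base_ms=200,
                      log=print, sleep=time.sleep):
    """Call fn(timeout_seconds) and emit one JSON log line per attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    request_id = str(uuid.uuid4())  # shared by every attempt of this call
    for attempt in range(1, max_attempts + 1):
        fields = {
            "operation": operation, "target": target,
            "request_id": request_id, "attempt": attempt,
            "max_attempts": max_attempts, "timeout_ms": timeout_ms,
        }
        started = time.monotonic()
        error, retryable = None, False
        try:
            value = fn(timeout_ms / 1000)
        except TimeoutError as exc:
            error, retryable, fields["error_kind"] = exc, True, "timeout"
        except ConnectionError as exc:
            error, retryable, fields["error_kind"] = exc, True, "network"
        except Exception as exc:
            error, retryable, fields["error_kind"] = exc, False, "unknown"
        fields["duration_ms"] = int((time.monotonic() - started) * 1000)

        if error is None:
            fields["result"] = "success"
            log(json.dumps(fields))
            return value
        if retryable and attempt < max_attempts:
            # Full jitter over an uncapped exponential delay (cap omitted for brevity).
            backoff_ms = int(random.uniform(0, base_ms * 2 ** (attempt - 1)))
            fields.update(result="retry", backoff_ms=backoff_ms,
                          jitter_strategy="full")
            log(json.dumps(fields))
            sleep(backoff_ms / 1000)
            continue
        # Non-retryable error, or attempts exhausted.
        fields["result"] = "fail"
        log(json.dumps(fields))
        raise error
```

Passing `log` and `sleep` as parameters keeps the loop testable: a test can collect the lines in a list and skip the real delays.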
