Outbox pattern: reliable writes + events without the enterprise baggage

Feb 24, 2026 · 12 min read

Category: .NET

When a database write succeeds but the event never arrives, your system is lying to downstream consumers. The outbox pattern fixes this without a distributed transaction or a message broker rewrite.

The order was created. The row is in the database. The customer is waiting for confirmation. But the email never sends, the downstream system never updates, and support is now asking why the "successful" order has no record anywhere else. You check the logs and find the event publish call threw an exception three seconds after the database commit. The write succeeded; the notification failed; and now you have a ghost record that only exists in one place.

That is the dual write problem. Any time you write to a database and then publish an event as two separate operations, there is a window where one succeeds and the other fails. Retries alone do not close that window, and transactions that span the database and the broker are unavailable or impractical in most stacks. The outbox pattern is the smallest bounded fix: write the event to your own database in the same transaction as the business data, then publish it separately.

This is not a tutorial on building an event sourcing framework. It is a playbook for teams who need reliable event publishing today without adopting Kafka, Debezium, or a full CDC pipeline. The pattern is old, well understood, and works in legacy and modern .NET systems.

If you only do three things
  • Write events to an outbox table in the same transaction as your business data.
  • Run a separate publisher that polls (or listens) and delivers events, marking them as sent.
  • Make consumers idempotent so duplicate deliveries are harmless.

Why events get lost after successful writes

The failure is deterministic once you understand the ordering. Your code does this:

  1. Begin transaction
  2. Insert/update business data
  3. Commit transaction
  4. Publish event to broker/queue

If step 4 fails (network timeout, broker down, process crash), the database has the data but the event never reaches downstream systems. The next request may succeed, but the lost event is never retried because the business transaction already committed. There is no automatic recovery path.

Worse: if your code does steps 3 and 4 in the opposite order (publish first, then commit), you can send an event for data that never persists. Both orderings have a failure window. The dual write problem is structural, not accidental.

The outbox pattern eliminates this window by making the event part of the database transaction. The event row commits or rolls back with the business data. A separate process reads unpublished events (committed rows not yet marked as sent) and publishes them. If the publisher crashes, it resumes from the oldest unpublished event. No event is lost after a successful commit.
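The difference between the two orderings can be sketched with in-memory stand-ins. Everything below (FlakyBroker, FakeDatabase, DualWriteDemo) is hypothetical and exists only for illustration; a real database transaction and a real broker replace the lists.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-ins; none of these types come from a real library.
public sealed class FlakyBroker
{
    public bool IsDown { get; set; }
    public List<string> Delivered { get; } = new();

    public void Publish(string evt)
    {
        if (IsDown) throw new InvalidOperationException("broker unavailable");
        Delivered.Add(evt);
    }
}

public sealed class FakeDatabase
{
    public List<string> Orders { get; } = new();
    public List<string> Outbox { get; } = new();   // rows still waiting to be published

    // Simulates one atomic transaction: both rows are stored together.
    public void CommitOrderWithOutbox(string order, string evt)
    {
        Orders.Add(order);
        Outbox.Add(evt);
    }
}

public static class DualWriteDemo
{
    // Naive ordering: commit first, publish second. A publish failure loses the event.
    public static void NaiveWrite(FakeDatabase db, FlakyBroker broker, string order)
    {
        db.Orders.Add(order);                                  // commit
        try { broker.Publish("OrderCreated:" + order); }       // publish
        catch (InvalidOperationException) { /* nothing durable remembers the event */ }
    }

    // Outbox ordering: the event is stored with the order; publishing happens later.
    public static void OutboxWrite(FakeDatabase db, string order) =>
        db.CommitOrderWithOutbox(order, "OrderCreated:" + order);

    // Publisher pass: deliver pending rows, removing each only after delivery succeeds.
    public static void DrainOutbox(FakeDatabase db, FlakyBroker broker)
    {
        while (db.Outbox.Count > 0)
        {
            try { broker.Publish(db.Outbox[0]); }
            catch (InvalidOperationException) { return; }      // leave rows for the next pass
            db.Outbox.RemoveAt(0);
        }
    }
}
```

With the broker down, NaiveWrite leaves an order whose event exists nowhere, while OutboxWrite leaves the event pending in the outbox. Once the broker is reachable again, the next DrainOutbox call delivers it.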

The incident pattern this playbook targets

  • "Order created, but downstream system never saw it."
  • "Email confirmation never sent, but the record exists."
  • "Inventory updated, but the warehouse system missed the event."
  • "Customer complaints about missing notifications after successful transactions."
  • "Manual reconciliation needed because events were dropped."

If any of those sound familiar, the outbox pattern directly addresses the root cause.

Mini incident timeline

A payment service writes a successful charge to the database and attempts to publish a "PaymentCompleted" event to a message broker. The broker is under load and the publish call times out. The code swallows the exception (or retries once and gives up). The database commit is already done. The event is lost.

Downstream, the order fulfillment service never receives the signal. The customer sees "payment successful" but the order never ships. Support manually triggers fulfillment after the customer complains. Multiply this by 50 orders during peak traffic.

A single outbox row would have captured the event atomically with the payment record. The publisher would have retried until the broker accepted it. No manual intervention, no lost orders.

Fast triage table: symptom to likely cause to confirm to fix

Symptom | Likely cause | Confirm | Fix (minimum)
--- | --- | --- | ---
Event never arrived, but database row exists | Dual write failure (publish after commit failed) | Logs show commit success, publish error or no log | Add outbox table, write event in same transaction
Duplicate events downstream | Publisher retried, consumer not idempotent | Consumer logs show same event id processed twice | Add idempotency check in consumer (event id + dedup store)
Events arrive out of order | Parallel publishers or no ordering key | Event timestamps vs arrival timestamps differ | Single publisher per partition or accept eventual ordering
Outbox table grows forever | Publisher not marking events as sent | Query outbox for old unsent rows | Fix publisher completion, add TTL/cleanup job
Publisher stuck, events piling up | Publisher crashed or blocked | Outbox row count increasing, publisher logs silent | Restart publisher, add health check, add alerting
Partial event data | Serialization error or schema mismatch | Event payload in outbox is truncated or malformed | Validate payload before insert, add schema versioning

Common misconceptions that cause lost events

Teams reach for the wrong fixes before understanding the problem.

The first bad fix is retrying the publish call. Retries help with transient failures, but if the process crashes after the database commit and before the retry completes, the event is still lost. Retries do not survive process death.

The second bad fix is using a distributed transaction (two phase commit). Most message brokers do not support XA transactions. Even when they do, the performance and complexity cost is high. The outbox avoids this by keeping the event in the same database as the business data.

The third bad fix is assuming "at least once" delivery from the broker means you are safe. At least once delivery means the broker will retry, but only after it receives the event. If the event never reaches the broker, there is nothing to retry.

The fourth bad fix is publishing before committing. This sends events for data that may roll back. Downstream systems act on data that does not exist. The outbox ensures events only exist for committed data.

The outbox pattern in plain terms

Definition: an outbox table is a database table where you write events in the same transaction as your business data.

Definition: an outbox publisher is a background process that reads unpublished events from the outbox, publishes them to the broker, and marks them as sent.

Definition: idempotent consumer means the downstream system can receive the same event multiple times without incorrect side effects.

The pattern works because database transactions are reliable. If the business write commits, the event row commits. If the write rolls back, the event row rolls back. The publisher is decoupled and can retry independently.

Minimal outbox schema

This is the smallest schema that works. You can extend it later.

sql
CREATE TABLE outbox_events (
  id                 BIGINT IDENTITY(1,1) PRIMARY KEY,
  event_type         NVARCHAR(256)   NOT NULL,
  event_payload      NVARCHAR(MAX)   NOT NULL,
  created_at_utc     DATETIME2       NOT NULL DEFAULT SYSUTCDATETIME(),
  published_at_utc   DATETIME2       NULL,
  correlation_id     NVARCHAR(128)   NULL,
  retry_count        INT             NOT NULL DEFAULT 0
);
 
-- Filtered index keyed on id, matching a publisher that reads unpublished rows ordered by id
CREATE INDEX ix_outbox_unpublished ON outbox_events (id)
  WHERE published_at_utc IS NULL;

Field explanations:

  • event_type: discriminator so publishers and consumers know how to deserialize
  • event_payload: JSON blob containing the full event data
  • published_at_utc: NULL means unpublished; set after successful delivery
  • correlation_id: ties the event to a request for tracing
  • retry_count: tracks delivery attempts for alerting and dead-letter decisions
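The C# samples in this post use an OutboxEvent entity without showing it. A minimal sketch of how that class could map to the table above follows. The attribute-based mapping is an assumption; a fluent configuration in OnModelCreating would work just as well.

```csharp
using System;
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

// Sketch of an entity for the outbox_events table; the mapping style is an assumption.
[Table("outbox_events")]
public class OutboxEvent
{
    [Key]
    [Column("id")]
    public long Id { get; set; }                       // database-generated identity

    [Required, MaxLength(256)]
    [Column("event_type")]
    public string EventType { get; set; } = "";

    [Required]
    [Column("event_payload")]
    public string EventPayload { get; set; } = "";     // JSON text

    // Set in code here, so the column default only applies to rows inserted outside the app.
    [Column("created_at_utc")]
    public DateTime CreatedAtUtc { get; set; } = DateTime.UtcNow;

    [Column("published_at_utc")]
    public DateTime? PublishedAtUtc { get; set; }      // null means unpublished

    [MaxLength(128)]
    [Column("correlation_id")]
    public string? CorrelationId { get; set; }

    [Column("retry_count")]
    public int RetryCount { get; set; }

    // Convenience flag, not stored in the table.
    [NotMapped]
    public bool IsPublished => PublishedAtUtc.HasValue;
}
```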

Writing to the outbox in the same transaction

The key invariant: business data and event row commit or roll back together.

csharp
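using System.Diagnostics;
using System.Text.Json;
using Microsoft.EntityFrameworkCore;

public async Task CreateOrderAsync(Order order, CancellationToken ct)
{
    // Disposing the transaction without committing rolls it back.
    await using var transaction = await _db.Database.BeginTransactionAsync(ct);

    // 1) Write business data
    _db.Orders.Add(order);

    // 2) Write event to outbox (same transaction).
    // Assumes order.Id is assigned by the application (for example a Guid).
    // With a database-generated key, Id is not populated until SaveChangesAsync,
    // so save once inside this transaction before building the payload.
    var outboxEvent = new OutboxEvent
    {
        EventType = "OrderCreated",
        EventPayload = JsonSerializer.Serialize(new OrderCreatedEvent
        {
            OrderId = order.Id,
            CustomerId = order.CustomerId,
            TotalAmount = order.TotalAmount,
            CreatedAtUtc = DateTime.UtcNow
        }),
        CorrelationId = Activity.Current?.Id
    };
    _db.OutboxEvents.Add(outboxEvent);

    // 3) Commit both or roll back both
    await _db.SaveChangesAsync(ct);
    await transaction.CommitAsync(ct);
}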

If SaveChangesAsync or CommitAsync fails, both the order and the event are rolled back. There is no window where one exists without the other.

Outbox publisher: polling approach

The simplest publisher is a background service that polls the outbox table.

csharp
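using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

public class OutboxPublisher : BackgroundService
{
    private readonly IServiceScopeFactory _scopeFactory;
    private readonly IMessageBroker _broker;
    private readonly ILogger<OutboxPublisher> _logger;

    // Dependencies are supplied by the DI container.
    public OutboxPublisher(
        IServiceScopeFactory scopeFactory,
        IMessageBroker broker,
        ILogger<OutboxPublisher> logger)
    {
        _scopeFactory = scopeFactory;
        _broker = broker;
        _logger = logger;
    }

    protected override async Task ExecuteAsync(CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            try
            {
                // A fresh scope (and DbContext) per polling pass
                using var scope = _scopeFactory.CreateScope();
                var db = scope.ServiceProvider.GetRequiredService<AppDbContext>();

                // Oldest unpublished rows first
                var batch = await db.OutboxEvents
                    .Where(e => e.PublishedAtUtc == null)
                    .OrderBy(e => e.Id)
                    .Take(100)
                    .ToListAsync(ct);

                foreach (var evt in batch)
                {
                    try
                    {
                        // Publish first, then mark as sent. A crash in between causes
                        // a redelivery on restart, which is why consumers must be idempotent.
                        await _broker.PublishAsync(evt.EventType, evt.EventPayload, ct);
                        evt.PublishedAtUtc = DateTime.UtcNow;
                        await db.SaveChangesAsync(ct);
                    }
                    // Skip the retry bookkeeping when the host is shutting down.
                    catch (Exception ex) when (!ct.IsCancellationRequested)
                    {
                        evt.RetryCount++;
                        await db.SaveChangesAsync(ct);
                        _logger.LogWarning(ex,
                            "Failed to publish outbox event {EventId}, retry {RetryCount}",
                            evt.Id, evt.RetryCount);
                    }
                }
            }
            catch (Exception ex) when (!ct.IsCancellationRequested)
            {
                _logger.LogError(ex, "Outbox publisher loop error");
            }

            await Task.Delay(TimeSpan.FromSeconds(5), ct);
        }
    }
}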

Polling is not elegant, but it is reliable and easy to reason about. For higher throughput, consider CDC (Change Data Capture) or database triggers, but polling covers most workloads.
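The loop above retries every failing row on each pass, forever. One way retry_count could drive backoff and a dead-letter decision is sketched below. OutboxRetryPolicy, its limits, and the doubling schedule are all assumptions, not something the schema or the publisher above requires.

```csharp
using System;

// Hypothetical policy helper; the limits are placeholders to tune per system.
public static class OutboxRetryPolicy
{
    public const int MaxAttempts = 10;
    private static readonly TimeSpan BaseDelay = TimeSpan.FromSeconds(5);
    private static readonly TimeSpan MaxDelay = TimeSpan.FromMinutes(10);

    // After MaxAttempts failures, stop retrying and route the row to manual review.
    public static bool ShouldDeadLetter(int retryCount) => retryCount >= MaxAttempts;

    // Delay before the next attempt: 5s, 10s, 20s, ... capped at MaxDelay.
    public static TimeSpan DelayBeforeNextAttempt(int retryCount)
    {
        if (retryCount <= 0) return TimeSpan.Zero;
        int exponent = Math.Min(retryCount - 1, 20);    // keeps the multiplication in range
        long factor = 1L << exponent;
        var delay = TimeSpan.FromTicks(BaseDelay.Ticks * factor);
        return delay < MaxDelay ? delay : MaxDelay;
    }
}
```

A publisher using this would skip rows whose delay has not yet elapsed, which needs a last-attempt timestamp that the minimal schema does not include.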

Making consumers idempotent

The outbox guarantees events are not lost, but it does not guarantee exactly once delivery. Network issues or publisher retries can cause duplicates. Consumers must handle this.

The simplest approach: store the event id and skip if already processed.

csharp
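public async Task HandleOrderCreatedAsync(OrderCreatedEvent evt, CancellationToken ct)
{
    // Assumes the event carries a stable EventId (for example the outbox row id,
    // or a GUID generated when the outbox row is written). The OrderCreatedEvent
    // payload shown earlier would need this property added.

    // Check if already processed
    var existing = await _db.ProcessedEvents
        .FirstOrDefaultAsync(e => e.EventId == evt.EventId, ct);

    if (existing is not null)
    {
        _logger.LogInformation("Skipping duplicate event {EventId}", evt.EventId);
        return;
    }

    // Process the event
    await _fulfillmentService.StartFulfillmentAsync(evt.OrderId, ct);

    // Mark as processed. A unique constraint on EventId is what stops two
    // concurrent deliveries from both passing the check above.
    _db.ProcessedEvents.Add(new ProcessedEvent { EventId = evt.EventId });
    await _db.SaveChangesAsync(ct);
}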

This pairs naturally with idempotency keys at the API boundary. The outbox handles write-to-event reliability; idempotency handles retry-to-duplicate safety.
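For the consumer to deduplicate, the event id has to travel with the message. A minimal sketch of one way to do that is an envelope around the outbox payload, paired here with an in-memory dedup check. OutboxEnvelope and IdempotentDispatcher are hypothetical names, and the HashSet stands in for a ProcessedEvents table with a unique key on the event id.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical envelope: wraps the outbox payload together with the outbox row id.
public sealed record OutboxEnvelope(long EventId, string EventType, string Payload);

public sealed class IdempotentDispatcher
{
    // Stand-in for a ProcessedEvents table with a unique constraint on EventId.
    private readonly HashSet<long> _processed = new();
    private readonly object _gate = new();

    // Publisher side: serialize the envelope before handing it to the broker.
    public static string Wrap(long eventId, string eventType, string payload) =>
        JsonSerializer.Serialize(new OutboxEnvelope(eventId, eventType, payload));

    // Consumer side: returns true if the handler ran, false if the event was a duplicate.
    public bool Dispatch(string envelopeJson, Action<OutboxEnvelope> handler)
    {
        var envelope = JsonSerializer.Deserialize<OutboxEnvelope>(envelopeJson)
            ?? throw new ArgumentException("Empty envelope", nameof(envelopeJson));

        lock (_gate)
        {
            if (_processed.Contains(envelope.EventId)) return false;
            handler(envelope);
            _processed.Add(envelope.EventId);   // recorded only after the handler succeeded
            return true;
        }
    }
}
```

Recording the id only after the handler succeeds means a handler failure leads to a retry rather than a silently skipped event. In a real consumer the side effect and the processed-event row would ideally commit in one transaction.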

Tradeoffs and when the outbox is not enough

The outbox adds a table, a publisher, and operational overhead. For systems with low event volume, this is negligible. For high throughput systems, you may need to tune polling intervals, batch sizes, or switch to CDC.

The outbox does not provide ordering guarantees beyond what your publisher enforces. If you need strict ordering, use a single publisher per partition; otherwise accept that events may occasionally arrive out of order.

The outbox does not replace a proper event store if you need event sourcing semantics (replay, projections, temporal queries). It is a reliability pattern, not an architecture.

If your events cross multiple databases, you may need a saga or choreography pattern. The outbox handles single database reliability; cross-database coordination is a larger problem.

What to log so lost events are provable

If you do not log the outbox lifecycle, you cannot prove events were delivered or diagnose failures.

Required fields:

  • outbox_event_id
  • event_type
  • correlation_id
  • outbox_action (created, published, failed, skipped_duplicate)
  • retry_count
  • duration_ms
  • error_message (on failure)

Log on every outbox write and every publish attempt. This turns incident diagnosis from guesswork into evidence.
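The required fields could be captured in one small type so every log line has the same shape. This is a sketch only: OutboxLogEntry is a hypothetical name, and in a real service the same fields would more likely be passed to ILogger as structured properties than serialized by hand.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical log record carrying the required outbox lifecycle fields.
public sealed record OutboxLogEntry(
    [property: JsonPropertyName("outbox_event_id")] long OutboxEventId,
    [property: JsonPropertyName("event_type")] string EventType,
    [property: JsonPropertyName("correlation_id")] string? CorrelationId,
    [property: JsonPropertyName("outbox_action")] string OutboxAction,   // created, published, failed, skipped_duplicate
    [property: JsonPropertyName("retry_count")] int RetryCount,
    [property: JsonPropertyName("duration_ms")] long DurationMs,
    [property: JsonPropertyName("error_message")] string? ErrorMessage = null)
{
    // One JSON object per line, suitable for log shippers that parse JSON lines.
    public string ToJsonLine() => JsonSerializer.Serialize(this);
}
```

A publish attempt would then log something like new OutboxLogEntry(evt.Id, evt.EventType, evt.CorrelationId, "published", evt.RetryCount, elapsedMs).ToJsonLine(), where elapsedMs is measured around the broker call.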

Fix plan: roll out the outbox without breaking production

Phase 1: Add the table and dual write

  • Create the outbox table with the minimal schema.
  • Update write paths to insert an outbox row in the same transaction.
  • Do not deploy the publisher yet; let events accumulate to validate the write path.

Phase 2: Deploy the publisher (shadow mode)

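  • Deploy the outbox publisher.
  • Publish events from the outbox, but do not remove the existing direct publish calls.
  • Expect each event to arrive twice during this phase (once per path), so make sure consumers are idempotent before you start.
  • Compare the two streams: are outbox events arriving, and does each one match a direct publish?
  • Validate with logs and metrics.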

Phase 3: Remove direct publish, rely on outbox

  • Remove direct publish calls from the write path.
  • The outbox is now the only source of events.
  • Monitor for delays, backlogs, and missed events.

Phase 4: Add cleanup and alerting

  • Add a job to delete old published events (TTL based).
  • Add alerts for unpublished event counts and publisher health.

Shipped asset

Download
Free

Outbox pattern checklist + schema for .NET

Minimal table schema, publisher template, and rollout checklist (free, email delivery)

When to use this (fit check)

  • You write to a database and publish events as separate operations.
  • You have seen (or cannot rule out) lost events after successful database commits.
  • You need to prove events cannot be lost without adopting a full CDC pipeline.

When NOT to use this (yet)

  • All your events are fire and forget with no downstream dependencies.
  • You already have a CDC pipeline (Debezium, etc.) handling event capture.
  • You need strict ordering guarantees the polling publisher cannot provide (consider CDC or Kafka).

What you get (4 files):

  • outbox-table-schema.sql: Minimal SQL schema for SQL Server (adapt for Postgres/MySQL)
  • outbox-rollout-checklist.md: Phase by phase rollout plan with validation steps
  • outbox-publisher-template.cs: Starter BackgroundService for polling
  • README.md: Setup instructions and integration guidance

Axiom Pack ($79)

Idempotency Implementation Playbook

Dealing with lost events and duplicate writes across multiple services? Get schemas, outbox patterns, and verification steps that prove events cannot be lost or duplicated.

  • Extended outbox schema with dead letter handling
  • Consumer idempotency patterns for multi-step workflows
  • Verification harness to prove events survive crashes and retries

FAQ

Why do events get lost after a successful database write?
The database commit and the event publish are two separate operations. If the publish fails after the commit, the event is lost. The outbox pattern writes the event to the database in the same transaction, so both succeed or fail together.

Is the outbox pattern the same as event sourcing?
No. Event sourcing stores all state changes as events and reconstructs state from them. The outbox pattern is simpler: it stores events temporarily until they are published, then deletes them. It is a reliability pattern, not an architectural style.

How do I handle duplicate events?
Make consumers idempotent. Store the event id when you process it and skip if you see the same id again. This pairs with idempotency keys at the API boundary.

What happens if the publisher crashes?
When the publisher restarts, it reads unpublished events from the outbox and resumes. Events are not lost because they are persisted in the database. Add health checks and alerting so you know when the publisher is down.

Should I use polling or CDC?
Polling is simpler and works for most workloads. CDC (Debezium, etc.) is better for high throughput or when you need lower latency. Start with polling; switch to CDC if you hit limits.

How long should I keep events in the outbox?
Keep unpublished events until they are published. For published events, a TTL of 24 to 72 hours is common for debugging. Add a cleanup job to prevent unbounded growth.

Can I use the outbox pattern with Entity Framework?
Yes. Use BeginTransactionAsync, add both your entity and the outbox event, then call SaveChangesAsync and CommitAsync. Both succeed or fail together.

Checklist (copy/paste)

  • Outbox table exists with: id, event_type, event_payload, created_at_utc, published_at_utc, correlation_id, retry_count.
  • Business data and outbox event are written in the same database transaction.
  • Outbox publisher runs as a background service (polling or CDC).
  • Publisher marks events as published (sets published_at_utc) after successful delivery.
  • Publisher logs: outbox_event_id, event_type, correlation_id, outbox_action, retry_count.
  • Consumers are idempotent (skip duplicate event ids).
  • Alerting exists for unpublished event backlog and publisher health.
  • Cleanup job removes old published events (TTL: 24-72 hours).
  • Rollout plan: dual write first, shadow publish, then cutover.
  • You can prove in a test:
    • Killing the publisher does not lose events
    • Restarting the publisher delivers pending events
    • Duplicate delivery does not cause duplicate side effects
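
The three proofs at the end of the checklist can be rehearsed against an in-memory model before wiring up a real database and broker. The sketch below is hypothetical throughout (OutboxRow, CrashAfterPublishException, OutboxCrashSimulation). It simulates a publisher that dies after the broker accepted an event but before the row was marked as sent.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory outbox row.
public sealed class OutboxRow
{
    public long Id { get; init; }
    public string Payload { get; init; } = "";
    public bool Published { get; set; }
}

// Stands in for the publisher process being killed.
public sealed class CrashAfterPublishException : Exception { }

public static class OutboxCrashSimulation
{
    // One publisher pass. If crashAfterPublishOfId matches a row, the "process" dies
    // after delivery to the broker but before the row is marked as sent.
    public static void RunPublisherPass(
        List<OutboxRow> outbox, List<long> brokerLog, long? crashAfterPublishOfId = null)
    {
        foreach (var row in outbox.Where(r => !r.Published).OrderBy(r => r.Id).ToList())
        {
            brokerLog.Add(row.Id);                          // broker accepted the event
            if (row.Id == crashAfterPublishOfId) throw new CrashAfterPublishException();
            row.Published = true;                           // mark as sent
        }
    }

    // Idempotent consumer model: one side effect per distinct event id.
    public static int CountSideEffects(IEnumerable<long> brokerLog) =>
        new HashSet<long>(brokerLog).Count;
}
```

Running a pass that crashes on event 2 and then a clean pass over events 1, 2, 3 delivers event 2 twice and the others once. No event is lost, the restart picks up the pending rows, and the deduplicating consumer still records three side effects.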

Key takeaways

  • The dual write problem causes lost events after successful database commits.
  • The outbox pattern writes events atomically with business data.
  • A separate publisher reads and delivers events, retrying on failure.
  • Consumers must be idempotent to handle duplicate deliveries.
  • Start with polling for simplicity; consider CDC for high throughput.
  • Log the outbox lifecycle so incidents are diagnosable.
