
Outbox pattern: reliable writes + events without the enterprise baggage
When a database write succeeds but the event never arrives, your system is lying to downstream consumers. The outbox pattern fixes this without a distributed transaction or a message broker rewrite.
Free download: Outbox pattern checklist + schema (.NET). Jump to the download section.
Paid pack available. Jump to the Axiom pack.
The order was created. The row is in the database. The customer is waiting for confirmation. But the email never sends, the downstream system never updates, and support is now asking why the "successful" order has no record anywhere else. You check the logs and find the event publish call threw an exception three seconds after the database commit. The write succeeded; the notification failed; and now you have a ghost record that only exists in one place.
That is the dual write problem. Any time you write to a database and then publish an event as two separate operations, you have a window where one can succeed and the other can fail. Retries do not fix this. Transactions across separate systems do not exist in most stacks. The outbox pattern is the smallest bounded fix: write the event to your own database in the same transaction as the business data, then publish it separately.
This is not a tutorial on building an event sourcing framework. It is a playbook for teams who need reliable event publishing today without adopting Kafka, Debezium, or a full CDC pipeline. The pattern is old, well understood, and works in legacy and modern .NET systems.
- Write events to an outbox table in the same transaction as your business data.
- Run a separate publisher that polls (or listens) and delivers events, marking them as sent.
- Make consumers idempotent so duplicate deliveries are harmless.
Why events get lost after successful writes
The failure is deterministic once you understand the ordering. Your code does this:
- Begin transaction
- Insert/update business data
- Commit transaction
- Publish event to broker/queue
If step 4 fails (network timeout, broker down, process crash), the database has the data but the event never reaches downstream systems. The next request may succeed, but the lost event is never retried because the business transaction already committed. There is no automatic recovery path.
Worse: if your code does steps 3 and 4 in the opposite order (publish first, then commit), you can send an event for data that never persists. Both orderings have a failure window. The dual write problem is structural, not accidental.
The outbox pattern eliminates this window by making the event part of the database transaction. The event row commits or rolls back with the business data. A separate process reads unpublished events and publishes them. If the publisher crashes, it resumes from the first unpublished event. No event is lost after a successful commit.
The incident pattern this playbook targets
- "Order created, but downstream system never saw it."
- "Email confirmation never sent, but the record exists."
- "Inventory updated, but the warehouse system missed the event."
- "Customer complaints about missing notifications after successful transactions."
- "Manual reconciliation needed because events were dropped."
If any of those sound familiar, the outbox pattern directly addresses the root cause.
Mini incident timeline
A payment service writes a successful charge to the database and attempts to publish a "PaymentCompleted" event to a message broker. The broker is under load and the publish call times out. The code swallows the exception (or retries once and gives up). The database commit is already done. The event is lost.
Downstream, the order fulfillment service never receives the signal. The customer sees "payment successful" but the order never ships. Support manually triggers fulfillment after the customer complains. Multiply this by 50 orders during peak traffic.
A single outbox row would have captured the event atomically with the payment record. The publisher would have retried until the broker accepted it. No manual intervention, no lost orders.
Fast triage table: symptom to likely cause to confirm to fix
| Symptom | Likely cause | Confirm | Fix (minimum) |
|---|---|---|---|
| Event never arrived, but database row exists | Dual write failure (publish after commit failed) | Logs show commit success, publish error or no log | Add outbox table, write event in same transaction |
| Duplicate events downstream | Publisher retried, consumer not idempotent | Consumer logs show same event id processed twice | Add idempotency check in consumer (event id + dedup store) |
| Events arrive out of order | Parallel publishers or no ordering key | Event timestamps vs arrival timestamps differ | Single publisher per partition or accept eventual ordering |
| Outbox table grows forever | Publisher not marking events as sent | Query outbox for old unsent rows | Fix publisher completion, add TTL/cleanup job |
| Publisher stuck, events piling up | Publisher crashed or blocked | Outbox row count increasing, publisher logs silent | Restart publisher, add health check, add alerting |
| Partial event data | Serialization error or schema mismatch | Event payload in outbox is truncated or malformed | Validate payload before insert, add schema versioning |
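Several of these symptoms can be confirmed with one query against the outbox itself. A minimal health check, assuming the outbox_events schema defined later in this post:

```sql
-- Backlog size and age of the oldest unpublished event.
SELECT COUNT(*)            AS unpublished_count,
       MIN(created_at_utc) AS oldest_unpublished_utc,
       MAX(retry_count)    AS max_retry_count
FROM outbox_events
WHERE published_at_utc IS NULL;
```

An unpublished_count that grows between runs, or an oldest_unpublished_utc more than a few polling intervals old, points at a stuck or crashed publisher rather than a dual write bug.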
Common misconceptions that cause lost events
Teams reach for the wrong fixes before understanding the problem.
The first bad fix is retrying the publish call. Retries help with transient failures, but if the process crashes after the database commit and before the retry completes, the event is still lost. Retries do not survive process death.
The second bad fix is using a distributed transaction (two phase commit). Most message brokers do not support XA transactions. Even when they do, the performance and complexity cost is high. The outbox avoids this by keeping the event in the same database as the business data.
The third bad fix is assuming "at least once" delivery from the broker means you are safe. At least once delivery means the broker will retry, but only after it receives the event. If the event never reaches the broker, there is nothing to retry.
The fourth bad fix is publishing before committing. This sends events for data that may roll back. Downstream systems act on data that does not exist. The outbox ensures events only exist for committed data.
The outbox pattern in plain terms
Definition: an outbox table is a database table where you write events in the same transaction as your business data.
Definition: an outbox publisher is a background process that reads unpublished events from the outbox, publishes them to the broker, and marks them as sent.
Definition: idempotent consumer means the downstream system can receive the same event multiple times without incorrect side effects.
The pattern works because database transactions are reliable. If the business write commits, the event row commits. If the write rolls back, the event row rolls back. The publisher is decoupled and can retry independently.
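Stripped of ORM details, the write side of the pattern is one transaction with two inserts. A sketch against the schema below, with a hypothetical orders table and example values:

```sql
BEGIN TRANSACTION;

-- 1) Business write (hypothetical orders table and values)
INSERT INTO orders (id, customer_id, total_amount)
VALUES (42, 7, 99.90);

-- 2) Event write in the same transaction
INSERT INTO outbox_events (event_type, event_payload, correlation_id)
VALUES ('OrderCreated',
        N'{"orderId":42,"customerId":7,"totalAmount":99.90}',
        'req-42-example');

COMMIT TRANSACTION;
```

If anything between BEGIN and COMMIT fails, both inserts roll back together; that is the entire trick.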
Minimal outbox schema
This is the smallest schema that works. You can extend it later.
```sql
CREATE TABLE outbox_events (
    id               BIGINT IDENTITY(1,1) PRIMARY KEY,
    event_type       NVARCHAR(256) NOT NULL,
    event_payload    NVARCHAR(MAX) NOT NULL,
    created_at_utc   DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
    published_at_utc DATETIME2 NULL,
    correlation_id   NVARCHAR(128) NULL,
    retry_count      INT NOT NULL DEFAULT 0
);

CREATE INDEX ix_outbox_unpublished ON outbox_events (created_at_utc)
    WHERE published_at_utc IS NULL;
```

Field explanations:
- event_type: discriminator so publishers and consumers know how to deserialize
- event_payload: JSON blob containing the full event data
- published_at_utc: NULL means unpublished; set after successful delivery
- correlation_id: ties the event to a request for tracing
- retry_count: tracks delivery attempts for alerting and dead-letter decisions
Writing to the outbox in the same transaction
The key invariant: business data and event row commit or roll back together.
```csharp
public async Task CreateOrderAsync(Order order, CancellationToken ct)
{
    await using var transaction = await _db.Database.BeginTransactionAsync(ct);

    // 1) Write business data
    _db.Orders.Add(order);

    // 2) Write event to outbox (same transaction)
    var outboxEvent = new OutboxEvent
    {
        EventType = "OrderCreated",
        EventPayload = JsonSerializer.Serialize(new OrderCreatedEvent
        {
            OrderId = order.Id,
            CustomerId = order.CustomerId,
            TotalAmount = order.TotalAmount,
            CreatedAtUtc = DateTime.UtcNow
        }),
        CorrelationId = Activity.Current?.Id
    };
    _db.OutboxEvents.Add(outboxEvent);

    // 3) Commit both or roll back both
    await _db.SaveChangesAsync(ct);
    await transaction.CommitAsync(ct);
}
```

If SaveChangesAsync or CommitAsync fails, both the order and the event are rolled back. There is no window where one exists without the other.
Outbox publisher: polling approach
The simplest publisher is a background service that polls the outbox table.
```csharp
public class OutboxPublisher : BackgroundService
{
    private readonly IServiceScopeFactory _scopeFactory;
    private readonly IMessageBroker _broker;
    private readonly ILogger<OutboxPublisher> _logger;

    public OutboxPublisher(
        IServiceScopeFactory scopeFactory,
        IMessageBroker broker,
        ILogger<OutboxPublisher> logger)
    {
        _scopeFactory = scopeFactory;
        _broker = broker;
        _logger = logger;
    }

    protected override async Task ExecuteAsync(CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            try
            {
                // New scope per iteration so the DbContext stays short-lived
                using var scope = _scopeFactory.CreateScope();
                var db = scope.ServiceProvider.GetRequiredService<AppDbContext>();

                var batch = await db.OutboxEvents
                    .Where(e => e.PublishedAtUtc == null)
                    .OrderBy(e => e.Id)
                    .Take(100)
                    .ToListAsync(ct);

                foreach (var evt in batch)
                {
                    try
                    {
                        await _broker.PublishAsync(evt.EventType, evt.EventPayload, ct);
                        evt.PublishedAtUtc = DateTime.UtcNow;
                        await db.SaveChangesAsync(ct);
                    }
                    catch (Exception ex)
                    {
                        evt.RetryCount++;
                        await db.SaveChangesAsync(ct);
                        _logger.LogWarning(ex,
                            "Failed to publish outbox event {EventId}, retry {RetryCount}",
                            evt.Id, evt.RetryCount);
                    }
                }
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "Outbox publisher loop error");
            }

            await Task.Delay(TimeSpan.FromSeconds(5), ct);
        }
    }
}
```

Polling is not elegant, but it is reliable and easy to reason about. For higher throughput, consider CDC (Change Data Capture) or database triggers, but polling covers most workloads.
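One caveat before scaling out: if you ever run more than one publisher instance against the same outbox table, the plain SELECT above can hand the same rows to both instances and produce duplicate publishes. On SQL Server, a common mitigation is to claim rows with locking hints; a sketch, not load tested here:

```sql
BEGIN TRANSACTION;

-- Claim up to 100 unpublished rows; READPAST skips rows
-- already locked by another publisher instance.
SELECT TOP (100) id, event_type, event_payload
FROM outbox_events WITH (UPDLOCK, READPAST, ROWLOCK)
WHERE published_at_utc IS NULL
ORDER BY id;

-- Publish each claimed row to the broker, then mark it as sent:
-- UPDATE outbox_events SET published_at_utc = SYSUTCDATETIME() WHERE id = @id;

COMMIT TRANSACTION;
```

Holding the claim transaction open while calling the broker ties lock duration to broker latency, so keep batches small. A single publisher instance avoids the problem entirely and is the better default; consumer idempotency (next section) covers the duplicates either way.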
Making consumers idempotent
The outbox guarantees events are not lost, but it does not guarantee exactly once delivery. Network issues or publisher retries can cause duplicates. Consumers must handle this.
The simplest approach: store the event id and skip if already processed.
```csharp
public async Task HandleOrderCreatedAsync(OrderCreatedEvent evt, CancellationToken ct)
{
    // Check if already processed
    var existing = await _db.ProcessedEvents
        .FirstOrDefaultAsync(e => e.EventId == evt.EventId, ct);
    if (existing is not null)
    {
        _logger.LogInformation("Skipping duplicate event {EventId}", evt.EventId);
        return;
    }

    // Process the event
    await _fulfillmentService.StartFulfillmentAsync(evt.OrderId, ct);

    // Mark as processed
    _db.ProcessedEvents.Add(new ProcessedEvent { EventId = evt.EventId });
    await _db.SaveChangesAsync(ct);
}
```

This pairs naturally with idempotency keys at the API boundary. The outbox handles write-to-event reliability; idempotency handles retry-to-duplicate safety.
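Note that the check-then-insert above has a race under concurrency: two consumers handling the same event can both pass the existence check before either inserts. Backing the dedup store with a primary key closes the race, because the second insert fails at commit. A sketch of the table, assuming string event ids:

```sql
-- Hypothetical dedup store for the ProcessedEvents entity.
-- The primary key makes concurrent duplicate inserts fail loudly.
CREATE TABLE processed_events (
    event_id         NVARCHAR(128) NOT NULL,
    processed_at_utc DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
    CONSTRAINT pk_processed_events PRIMARY KEY (event_id)
);
```

With this constraint, the losing consumer sees a unique key violation on SaveChangesAsync; catch it and treat the event as already processed rather than as a failure.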
Tradeoffs and when the outbox is not enough
The outbox adds a table, a publisher, and operational overhead. For systems with low event volume, this is negligible. For high throughput systems, you may need to tune polling intervals, batch sizes, or switch to CDC.
The outbox does not provide ordering guarantees beyond what your publisher enforces. If you need strict ordering, use a single publisher per partition or accept eventual consistency.
The outbox does not replace a proper event store if you need event sourcing semantics (replay, projections, temporal queries). It is a reliability pattern, not an architecture.
If your events cross multiple databases, you may need a saga or choreography pattern. The outbox handles single database reliability; cross-database coordination is a larger problem.
What to log so lost events are provable
If you do not log the outbox lifecycle, you cannot prove events were delivered or diagnose failures.
Required fields:
- outbox_event_id
- event_type
- correlation_id
- outbox_action (created, published, failed, skipped_duplicate)
- retry_count
- duration_ms
- error_message (on failure)
Log on every outbox write and every publish attempt. This turns incident diagnosis from guesswork into evidence.
Fix plan: roll out the outbox without breaking production
Phase 1: Add the table and dual write
- Create the outbox table with the minimal schema.
- Update write paths to insert an outbox row in the same transaction.
- Do not deploy the publisher yet; let events accumulate to validate the write path.
Phase 2: Deploy the publisher (shadow mode)
- Deploy the outbox publisher.
- Publish events but do not remove the existing direct publish calls.
- Compare: are outbox events arriving? Are they duplicates of direct publishes?
- Validate with logs and metrics.
Phase 3: Remove direct publish, rely on outbox
- Remove direct publish calls from the write path.
- The outbox is now the only source of events.
- Monitor for delays, backlogs, and missed events.
Phase 4: Add cleanup and alerting
- Add a job to delete old published events (TTL based).
- Add alerts for unpublished event counts and publisher health.
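Phase 4 amounts to two small pieces of SQL run on a schedule. A sketch for SQL Server, assuming the schema from this post and a 72 hour TTL:

```sql
-- Cleanup: delete published events older than 72 hours,
-- in batches so the delete does not hold long locks.
DELETE TOP (1000) FROM outbox_events
WHERE published_at_utc IS NOT NULL
  AND published_at_utc < DATEADD(HOUR, -72, SYSUTCDATETIME());

-- Alerting input: age in seconds of the oldest unpublished event.
SELECT DATEDIFF(SECOND, MIN(created_at_utc), SYSUTCDATETIME())
       AS oldest_unpublished_age_seconds
FROM outbox_events
WHERE published_at_utc IS NULL;
```

Loop the delete until it affects zero rows, and alert when oldest_unpublished_age_seconds exceeds a few polling intervals, since that is the signature of a stuck publisher.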
Shipped asset
Outbox pattern checklist + schema for .NET
Minimal table schema, publisher template, and rollout checklist (free, email delivery)
Use this if:
- You write to a database and publish events as separate operations.
- You have seen (or cannot rule out) lost events after successful database commits.
- You need to prove events cannot be lost without adopting a full CDC pipeline.

Skip it if:
- All your events are fire and forget with no downstream dependencies.
- You already have a CDC pipeline (Debezium, etc.) handling event capture.
- You need strict ordering guarantees the polling publisher cannot provide (consider CDC or Kafka).
What you get (4 files):
- outbox-table-schema.sql: Minimal SQL schema for SQL Server (adapt for Postgres/MySQL)
- outbox-rollout-checklist.md: Phase-by-phase rollout plan with validation steps
- outbox-publisher-template.cs: Starter BackgroundService for polling
- README.md: Setup instructions and integration guidance
Idempotency Implementation Playbook
Dealing with lost events and duplicate writes across multiple services? Get schemas, outbox patterns, and verification steps that prove events cannot be lost or duplicated.
- ✓ Extended outbox schema with dead letter handling
- ✓ Consumer idempotency patterns for multi-step workflows
- ✓ Verification harness to prove events survive crashes and retries
Resources
Internal:
- The .NET Production Rescue hub
- The .NET category
- Idempotency keys for APIs
- Background jobs that hang forever
- Structured logging that actually helps
External:
- Microsoft outbox pattern guidance
- Chris Richardson on the transactional outbox
- Debezium outbox event router
- EF Core transactions
Troubleshooting Questions Engineers Search
Why do events get lost even though the database write succeeded?
The database commit and the event publish are two separate operations. If the publish fails after the commit, the event is lost. The outbox pattern writes the event to the database in the same transaction, so both succeed or fail together.

Is the outbox pattern the same as event sourcing?
No. Event sourcing stores all state changes as events and reconstructs state from them. The outbox pattern is simpler: it stores events temporarily until they are published, then deletes them. It is a reliability pattern, not an architectural style.

How do I handle duplicate events downstream?
Make consumers idempotent. Store the event id when you process it and skip if you see the same id again. This pairs with idempotency keys at the API boundary.

What happens if the outbox publisher crashes?
When the publisher restarts, it reads unpublished events from the outbox and resumes. Events are not lost because they are persisted in the database. Add health checks and alerting so you know when the publisher is down.

Should the publisher poll or use CDC?
Polling is simpler and works for most workloads. CDC (Debezium, etc.) is better for high throughput or when you need lower latency. Start with polling; switch to CDC if you hit limits.

How long should events stay in the outbox table?
Keep unpublished events until they are published. For published events, a TTL of 24 to 72 hours is common for debugging. Add a cleanup job to prevent unbounded growth.

Can I implement the outbox with EF Core?
Yes. Use BeginTransactionAsync, add both your entity and the outbox event, then call SaveChangesAsync and CommitAsync. Both succeed or fail together.
Coming soon
If you are dealing with lost events across multiple services or need advanced patterns like dead letter handling and saga coordination, the idempotency playbook includes extended schemas and verification harnesses.
Axiom .NET Rescue (Coming Soon)
Get notified when we ship outbox templates, idempotency patterns, and production runbooks for .NET services.
Checklist (copy/paste)
- Outbox table exists with: id, event_type, event_payload, created_at_utc, published_at_utc, correlation_id, retry_count.
- Business data and outbox event are written in the same database transaction.
- Outbox publisher runs as a background service (polling or CDC).
- Publisher marks events as published (sets published_at_utc) after successful delivery.
- Publisher logs: outbox_event_id, event_type, correlation_id, outbox_action, retry_count.
- Consumers are idempotent (skip duplicate event ids).
- Alerting exists for unpublished event backlog and publisher health.
- Cleanup job removes old published events (TTL: 24-72 hours).
- Rollout plan: dual write first, shadow publish, then cutover.
- You can prove in a test:
  - Killing the publisher does not lose events
  - Restarting the publisher delivers pending events
  - Duplicate delivery does not cause duplicate side effects
Key takeaways
- The dual write problem causes lost events after successful database commits.
- The outbox pattern writes events atomically with business data.
- A separate publisher reads and delivers events, retrying on failure.
- Consumers must be idempotent to handle duplicate deliveries.
- Start with polling for simplicity; consider CDC for high throughput.
- Log the outbox lifecycle so incidents are diagnosable.
Recommended resources
Download the shipped checklist/templates for this post.
A minimal schema, polling publisher template, and rollout checklist for reliable event publishing in .NET.
Related posts

Idempotency keys for APIs: stop duplicate orders, emails, and writes
When retries create duplicate side effects, idempotency keys are the only safe fix. This playbook shows how to design keys, store results, and prove duplicates cannot recur.

Structured logging that actually helps: Serilog fields that matter in .NET incidents
When logs are noisy but useless: why incidents stay unsolved, which fields actually explain failures, and the minimal schema that makes .NET outages diagnosable.

OpenTelemetry for .NET: minimum viable tracing for production debugging
When incidents span multiple services and logs cannot explain latency: the smallest OpenTelemetry setup that makes production debugging possible without a full rewrite.