Eventual Consistency Is Not a Bug

There is a moment in almost every distributed systems incident where someone opens a timeline and discovers that two events happened "at the same time." A payment was processed on one node before a balance check on another node had finished replicating. An inventory count was decremented on the west coast while a user on the east coast was reading the pre-decrement value.

The engineers involved will almost always describe this as a bug. It is not a bug. It is physics.

The CAP Theorem Is Not An Excuse

Eric Brewer's CAP theorem states that a distributed system can guarantee at most two of three properties: Consistency, Availability, and Partition tolerance. Since network partitions happen in any real distributed deployment, the practical choice is between consistency and availability under partition.

This is often cited as the justification for eventual consistency: "we had to trade consistency for availability." That framing is incomplete.

The CAP theorem describes a binary (consistent or available) that is useful for theoretical analysis but maps poorly to production systems. Martin Kleppmann's PACELC extension is more practically useful: even in the absence of partitions, there is a fundamental tradeoff between latency and consistency. The real engineering question is not "consistent or available" but "how stale is stale enough, and for which operations?"

Staleness tolerance is domain-specific. A social media feed that shows a post 200ms late is fine. A bank account balance that shows a stale value at the moment of a spending decision is not fine. Treating all consistency requirements as identical is the first design mistake.

Clocks Are Lying To You

Network Time Protocol synchronizes clocks across distributed nodes, but NTP synchronization has a bounded error, typically 1ms to 100ms in well-configured environments, but higher under load, across regions, or during NTP re-synchronization events.

This matters in several concrete ways:

Log correlation: When you aggregate logs from multiple services and sort by timestamp, events that happened in the wrong order from the perspective of causality can appear in the right order from the perspective of wall-clock time, and vice versa. An event that "caused" another can appear to happen after it in your log aggregator.

Conflict resolution based on "last write wins": If two nodes both write to the same key and you resolve the conflict by keeping the write with the later timestamp, you may keep the wrong write. Timestamps are not reliable arbiters of ordering when clocks drift.

Distributed locks that depend on lease expiry: If your lock service and your application server have clocks that drift by 50ms, a lease that expires at T=1000 on the lock service may expire at T=1050 on the client. That gap is a window where both the new holder and the old holder believe they hold the lock. This is the scenario that produced the infamous Redlock controversy.

Vector clocks and hybrid logical clocks (HLCs) exist to handle this. They encode causal ordering rather than wall-clock time, providing a reliable "happened before" relationship even when physical clocks disagree. If your system needs to reason about ordering of concurrent events, physical timestamps are the wrong primitive.

Designing for Convergence

Once you accept that consistency is not binary, the design question becomes: what properties must your system converge toward, and over what timeframe?

Idempotency is non-negotiable. In a system with retries, and every system with network calls has retries, operations will execute more than once. An operation is idempotent if executing it multiple times produces the same result as executing it once.

For writes, this typically means:

Include a client-generated idempotency key with every mutation
Check on the server side whether this key has been processed before executing
Return the cached result for duplicate requests

sql.snippet

If the RETURNING clause returns nothing, the operation was a duplicate. The caller gets the original result from the application layer. The database never double-processes.

Conflict-free replicated data types (CRDTs) encode merge semantics directly into the data structure. An operation on a CRDT can be applied in any order across replicas and will produce the same result. Counters, sets, and maps have well-known CRDT implementations. For any piece of state that multiple nodes write concurrently, asking "is there a CRDT that models this?" is a productive first question.

A grow-only counter, for instance, can be implemented as a map of node ID to local count. The global count is the sum. Merging two replicas is always a max-merge per node entry: no coordination required, no conflicts possible.

Monotonic reads prevent a specific and disorienting inconsistency: a user reads a value, the read is served by node A. They immediately read again; the read is served by node B, which hasn't replicated the latest write. The second read returns a value earlier in time than the first read. Session consistency, meaning routing reads for a given user to the same replica, addresses this at the cost of stickiness overhead.

The Human Experience of Inconsistency

Most discussions of eventual consistency are purely technical. The operational and user experience dimensions deserve equal attention.

When your system is eventually consistent, users will experience it. They will submit a form, see a success message, and then refresh to find their data hasn't appeared. They will delete an item and immediately try to create another with the same name, and the creation will fail because the deletion hasn't replicated.

These experiences can be designed around:

Optimistic UI updates: Reflect the user's intent immediately in the interface without waiting for replication confirmation. If the operation fails, roll back with an error. This is the "assume success" pattern and it is standard in modern frontends.

Explicit transitional states: Rather than pretending the data is immediately consistent, represent the intermediate state explicitly. "Your payment is being processed" is a correct and honest UX for a system that hasn't yet replicated. "Balance updated" when the balance hasn't replicated is a lie that erodes trust when the user's next action depends on the stale value.

Read-your-writes consistency at the session level: Ensure that after a write, subsequent reads in the same session return at least the value you just wrote. This is usually achievable with session affinity to a single replica, a brief directed read to the primary, or a short sleep before redirecting to a read path, each with different tradeoffs.

Observability Over Eventually Consistent Systems

Eventually consistent systems fail silently. A write that doesn't replicate doesn't throw an error; it just arrives late, or not at all. A conflict that gets resolved incorrectly doesn't cause a crash; it silently persists the wrong data.

The observability requirements are higher than for strongly consistent systems:

Replication lag metrics: You should know, continuously, what the lag is between your primary and each replica. Alert when it exceeds your defined SLO. Replication lag is not a passive health indicator; it is an active signal about the staleness your users are currently experiencing.

Conflict rate metrics: If you are using last-write-wins or any other conflict resolution strategy, track how often conflicts are occurring. A rising conflict rate often precedes data correctness problems and is a leading indicator of write contention.

Tombstone tracking: In systems with distributed deletes, tombstones must be propagated before the garbage collection window closes, or the deleted data can be "resurrected" from replicas that never received the delete. Cassandra's repair and gc_grace_seconds exist for this reason.

Anti-entropy job success rates: Systems like Cassandra and DynamoDB run background repair processes to detect and correct replica divergence. Track whether these jobs are completing successfully and on schedule. A failing repair job will eventually produce visible data inconsistency.

When to Reach for Strong Consistency

Eventual consistency is not the right model for every operation. There are specific scenarios where strong consistency is worth the latency and availability tradeoff:

Financial transactions: Balance debits, payment processing, double-spend prevention. The user's financial reality must be consistent.
Inventory management at low stock: The inconsistency window between a final unit being reserved and the reservation replicating can cause overselling. Below a configurable threshold, use a strongly consistent path.
Access control changes: When you revoke a permission, the revocation must be immediately consistent. An eventually consistent permission system can allow access after revocation, which is a security hole, not just a user experience problem.
Unique constraint enforcement: Username and email uniqueness constraints enforced only at the replica level will produce duplicate accounts during replication lag windows.

For these operations, use a strongly consistent system: a single-primary database write, a consensus protocol like Raft, or a service that provides linearizable guarantees explicitly.

The architecture that scales: strong consistency for state that requires it, eventual consistency for everything that tolerates it. The design work is in correctly classifying which category each operation belongs to.

The Mental Model That Scales

Eventual consistency is not a compromise or a fallback. It is an explicit acknowledgment that the network is unreliable, clocks are imprecise, and synchronization has a latency cost. The systems that handle this well are not the ones that minimize their consistency guarantees; they are the ones that know exactly which consistency guarantees each operation requires, and apply the appropriate model to each.

That clarity is the design work. It is harder than choosing "strongly consistent everywhere," and it is harder than choosing "eventually consistent everywhere." But it produces systems that are correct where they need to be and available where they can afford to be. At scale, that distinction is the difference between a system that survives and one that does not.