~ / track F / concurrency models

CRDTs

Advanced

A CRDT (Conflict-free Replicated Data Type) is a data structure designed to be updated independently on many nodes and merged deterministically, without coordination. Every node converges to the same value as long as it eventually sees the same set of updates. CRDTs are what let collaborative apps (Google Docs, Figma, distributed databases) keep going during a network partition and reconcile cleanly afterward.

The convergence property

The defining law: for any two replica states a and b, merge(a, b) gives the same result on every node, regardless of the order in which the nodes see updates. Merge must be:

  • Commutativemerge(a, b) = merge(b, a).
  • Associativemerge(a, merge(b, c)) = merge(merge(a, b), c).
  • Idempotentmerge(a, a) = a.

These are exactly the laws of a semilattice (see the algebraic hierarchy topic). CRDTs are semilattices for distributed state.

Example 1: G-Counter (grow-only counter)

A counter that only goes up. Each node has its own slot; the total is the sum across slots; merging takes the per-node max.

loading sci
press ⌘/Ctrl-↵ or click ▶ run to evaluate

Each node is the sole writer of its own slot, so max is the right merge — the higher count is always the truer one. The total is the sum.

Example 2: G-Set (grow-only set)

Add-only set. Merge is union. Order-independent, duplicate-safe, idempotent.

loading sci
press ⌘/Ctrl-↵ or click ▶ run to evaluate

The G-set is the simplest CRDT — set union is a textbook semilattice operation.

Example 3: LWW-Register (last-write-wins)

A single value with a per-write timestamp. Merge takes whichever side has the larger timestamp. Simple, but requires (roughly) synchronized clocks or a logical clock.

loading sci
press ⌘/Ctrl-↵ or click ▶ run to evaluate

LWW-registers are widely used in distributed key-value stores because of their simplicity, but "last write" is sometimes the wrong policy — you may prefer a multi-value register (MVR) that retains both branches as a conflict.

State-based vs operation-based

Two flavors exist:

  • State-based (CvRDT): nodes exchange the full state; merge is the semilattice join (the examples above).
  • Operation-based (CmRDT): nodes broadcast operations; if delivery is exactly-once and causally ordered, merging by replaying operations converges.

State-based is simpler to reason about; op-based often uses less bandwidth.

Where CRDTs are used

  • Collaborative editing. Yjs, Automerge — text and JSON CRDTs.
  • Distributed databases. Riak, AntidoteDB, Redis (CRDTs flavor).
  • Offline-first apps. Sync once the network is back; merges Just Work.

The big idea is the same: trade strict ordering for availability and partition tolerance, and rely on the mathematics of the merge to make the result coherent.

Check yourself

? quiz

Why is *idempotent merge* essential for a CRDT?

Exercise

Implement a two-phase set (2P-Set) CRDT: an add-set and a remove-set. An element is "present" if it's in the add-set and not in the remove-set. Merge is element-wise union of both sets.

loading sci
press ⌘/Ctrl-↵ or click ▶ run to evaluate
 status: new