~ / track F / concurrency models
CRDTs
AdvancedA CRDT (Conflict-free Replicated Data Type) is a data structure designed to be updated independently on many nodes and merged deterministically, without coordination. Every node converges to the same value as long as it eventually sees the same set of updates. CRDTs are what let collaborative apps (Google Docs, Figma, distributed databases) keep going during a network partition and reconcile cleanly afterward.
The convergence property
The defining law: for any two replica states a and b, merge(a, b)
gives the same result on every node, regardless of the order in which the
nodes see updates. Merge must be:
- Commutative —
merge(a, b) = merge(b, a). - Associative —
merge(a, merge(b, c)) = merge(merge(a, b), c). - Idempotent —
merge(a, a) = a.
These are exactly the laws of a semilattice (see the algebraic hierarchy topic). CRDTs are semilattices for distributed state.
Example 1: G-Counter (grow-only counter)
A counter that only goes up. Each node has its own slot; the total is the sum across slots; merging takes the per-node max.
Each node is the sole writer of its own slot, so max is the right merge —
the higher count is always the truer one. The total is the sum.
Example 2: G-Set (grow-only set)
Add-only set. Merge is union. Order-independent, duplicate-safe, idempotent.
The G-set is the simplest CRDT — set union is a textbook semilattice operation.
Example 3: LWW-Register (last-write-wins)
A single value with a per-write timestamp. Merge takes whichever side has the larger timestamp. Simple, but requires (roughly) synchronized clocks or a logical clock.
LWW-registers are widely used in distributed key-value stores because of their simplicity, but "last write" is sometimes the wrong policy — you may prefer a multi-value register (MVR) that retains both branches as a conflict.
State-based vs operation-based
Two flavors exist:
- State-based (CvRDT): nodes exchange the full state; merge is the semilattice join (the examples above).
- Operation-based (CmRDT): nodes broadcast operations; if delivery is exactly-once and causally ordered, merging by replaying operations converges.
State-based is simpler to reason about; op-based often uses less bandwidth.
Where CRDTs are used
- Collaborative editing. Yjs, Automerge — text and JSON CRDTs.
- Distributed databases. Riak, AntidoteDB, Redis (CRDTs flavor).
- Offline-first apps. Sync once the network is back; merges Just Work.
The big idea is the same: trade strict ordering for availability and partition tolerance, and rely on the mathematics of the merge to make the result coherent.
Check yourself
? quiz
Why is *idempotent merge* essential for a CRDT?
Exercise
Implement a two-phase set (2P-Set) CRDT: an add-set and a remove-set. An element is "present" if it's in the add-set and not in the remove-set. Merge is element-wise union of both sets.