
Make illegal states unrepresentable

Intermediate

A slogan from Yaron Minsky (Jane Street, OCaml). The principle: design your data so that invalid combinations cannot be expressed in the type system at all. If the compiler refuses to build code that constructs an invalid value, that bug never reaches production.

The classic example: nullable fields that must be set together

// Bad: two nullable fields, but only two of their four null/non-null combinations are valid
type User = {
  email: string;
  emailVerified: boolean;
  resetToken: string | null;     // null until the user requests a reset
  resetExpires: Date  | null;
};

What does resetToken = "abc" with resetExpires = null mean? Probably a bug. An expiry with no token is just as meaningless. Both null? Probably "no reset in progress." Both non-null? A reset in progress. The type permits four states, two of which are nonsense.
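To see the problem concretely, here is a minimal sketch showing that the compiler happily accepts a token with no expiry. The helper canUseResetToken and the sample values are hypothetical; the point is that every consumer has to re-check the pairing by hand, and nothing forces it to.

```typescript
// The "bad" User: two independently nullable fields.
type User = {
  email: string;
  emailVerified: boolean;
  resetToken: string | null;
  resetExpires: Date | null;
};

// This type-checks, yet it is one of the nonsense states:
// a reset token with no expiry.
const broken: User = {
  email: "ada@example.com",
  emailVerified: true,
  resetToken: "abc",
  resetExpires: null,
};

// Hypothetical helper: the "both or neither" rule lives only in code
// like this, repeated wherever the fields are read.
function canUseResetToken(u: User, now: Date): boolean {
  if (u.resetToken === null || u.resetExpires === null) return false;
  return u.resetExpires.getTime() > now.getTime();
}

console.log(canUseResetToken(broken, new Date())); // false
```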

// Good: a sum type that mirrors the actual states
type ResetState =
  | { kind: "none" }
  | { kind: "pending"; token: string; expires: Date };

type User = {
  email: string;
  emailVerified: boolean;
  reset: ResetState;
};

Now a pending reset must carry both a token and an expiry. A value like { kind: "pending", token: "abc" } with no expires does not type-check, and neither does token: undefined under strictNullChecks.
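Consuming the sum type shows the payoff. In this minimal sketch (requestReset and describeReset are hypothetical helpers), token and expires are only reachable after checking kind, so there is nothing to null-check.

```typescript
type ResetState =
  | { kind: "none" }
  | { kind: "pending"; token: string; expires: Date };

// Hypothetical constructor: the only way to build a pending state
// is to supply the token and the expiry together.
function requestReset(token: string, ttlMs: number, now: Date): ResetState {
  return { kind: "pending", token, expires: new Date(now.getTime() + ttlMs) };
}

// Hypothetical consumer: inside each case, TypeScript narrows `reset`
// to the matching variant, so `reset.token` exists only for "pending".
function describeReset(reset: ResetState, now: Date): string {
  switch (reset.kind) {
    case "none":
      return "no reset in progress";
    case "pending":
      return reset.expires.getTime() > now.getTime()
        ? `reset pending with token ${reset.token}`
        : "reset token expired";
  }
}

// const bad: ResetState = { kind: "pending", token: "abc" };
// ^ rejected by tsc because `expires` is missing.
```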

The general technique

| Smell | Fix |
| --- | --- |
| Two booleans, only 3 of 4 combinations valid | A single 3-variant sum type |
| Maybe X + Maybe Y that always come together | Maybe (X, Y) |
| String with "magic" values like "pending" / "done" | Enum / sum type |
| Optional field plus a flag saying "look at the optional" | Lift the flag into a sum |
| Nested Maybe (Maybe T) | Collapse to Maybe T, or to a 3-variant sum |
| Many similar optional fields, e.g. address1?, address2?, address3? | A list, or a sum of "no address," "one," "many" |

The pattern: wherever the validity of one field depends on another, make a sum type.
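As an illustration of the table's first row, here is a sketch with hypothetical isLoading / hasFailed flags, where "loading and failed at the same time" is the one combination with no meaning. The fromFlags helper is also hypothetical: it shows that the nonsense case gets confronted once, at the conversion, instead of in every consumer.

```typescript
// Smell: two booleans, but isLoading && hasFailed is meaningless.
type FetchFlags = { isLoading: boolean; hasFailed: boolean };

// Fix: a single sum type listing exactly the three real states.
type FetchState = "idle" | "loading" | "failed";

// Hypothetical migration helper from the old shape to the new one.
function fromFlags(f: FetchFlags): FetchState {
  if (f.isLoading && f.hasFailed) {
    throw new Error("invalid flags: loading and failed at once");
  }
  if (f.isLoading) return "loading";
  if (f.hasFailed) return "failed";
  return "idle";
}
```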

Clojure: spec / malli to the rescue

Clojure is dynamically typed, but clojure.spec and malli let you describe the same constraints as runtime predicates that also drive generative tests.


You don't get compile-time rejection, but you get runtime rejection at the boundary plus property-test generation that explores the state space for you. With :pre / :post conditions on functions, you can keep illegal states from propagating.
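For comparison with the TypeScript examples on this page: TypeScript types are erased at runtime, so the same boundary check has to be written by hand (or generated by a schema library, not shown here). A minimal hand-rolled sketch, assuming the ResetState shape from earlier and a hypothetical JSON wire format in which expires arrives as an ISO-8601 string; parseResetState is an illustrative name, not spec or malli.

```typescript
type ResetState =
  | { kind: "none" }
  | { kind: "pending"; token: string; expires: Date };

// Hypothetical boundary parser: takes `unknown` (e.g. the result of
// JSON.parse) and returns either a well-formed ResetState or null.
function parseResetState(raw: unknown): ResetState | null {
  if (typeof raw !== "object" || raw === null) return null;
  const { kind, token, expires } = raw as {
    kind?: unknown;
    token?: unknown;
    expires?: unknown;
  };
  if (kind === "none") return { kind: "none" };
  if (kind === "pending" && typeof token === "string" && typeof expires === "string") {
    const date = new Date(expires); // assumes an ISO-8601 timestamp on the wire
    if (isNaN(date.getTime())) return null; // unparseable date
    return { kind: "pending", token, expires: date };
  }
  return null; // any other shape is an illegal state: reject it here
}

const wire = '{"kind":"pending","token":"abc","expires":"2030-01-01T00:00:00Z"}';
console.log(parseResetState(JSON.parse(wire))?.kind); // pending
console.log(parseResetState({ kind: "pending", token: "abc" })); // null
```

Only the parser ever sees raw data; everything past it can trust the ResetState type.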

Examples in real systems

  • Stripe API: subscriptions move through a state machine with statuses such as incomplete, active, past_due, canceled, and unpaid. Fields like current_period_end only matter in certain states, so a well-typed client models the subscription as a sum.
  • Network sockets: separate types for Listening, Connected, Closed. You can't call read on a Listening socket — it's a type error.
  • Form validation: rather than one User with everything optional, use separate types DraftUser, ValidatedUser, PersistedUser. The type system enforces "you can't email a draft."
  • HTTP responses: response types parameterized by status, so that, for example, a 204 No Content response cannot carry a body and a redirect (3xx) type can require a Location header.
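The socket example can be sketched as a typestate in TypeScript. This is a hypothetical in-memory model for illustration only, not Node's net API: each state is its own type, and read accepts only a Connected socket.

```typescript
// Hypothetical socket states; the `state` field keeps them distinct.
type Listening = { readonly state: "listening"; readonly port: number };
type Connected = { readonly state: "connected"; readonly peer: string };
type Closed = { readonly state: "closed" };

function listen(port: number): Listening {
  return { state: "listening", port };
}

// Accepting takes a Listening socket and yields a Connected one.
function accept(_server: Listening, peer: string): Connected {
  return { state: "connected", peer };
}

// read only accepts Connected; passing a Listening socket is a compile error.
function read(sock: Connected): string {
  return `data from ${sock.peer}`;
}

function close(_sock: Listening | Connected): Closed {
  return { state: "closed" };
}

const server = listen(8080);
const conn = accept(server, "10.0.0.2");
console.log(read(conn)); // data from 10.0.0.2
// read(server); // rejected by tsc: a Listening is not a Connected
```

TypeScript compares object types structurally, so the state discriminant matters: without it, Closed would be {} and almost anything would be assignable to it. One limitation: TypeScript cannot stop you from reusing server after passing it to close, which languages with move semantics, such as Rust, can.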

The dependent-types extreme

In Idris/Agda/Lean you can encode invariants directly in the type:

data NonEmpty : List a -> Type where
  IsNonEmpty : NonEmpty (x :: xs)

head : (xs : List a) -> NonEmpty xs -> a
head (x :: _) _ = x
-- No [] clause is needed: NonEmpty [] has no values, so a call like head [] _ cannot type-check

head [] becomes a type error, not a runtime error. Same idea, taken to its logical conclusion.
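TypeScript has no dependent types, but a tuple type with a rest element gets partway there for this one invariant. A sketch, where NonEmpty and toNonEmpty are illustrative names of our own, not standard library types:

```typescript
// A non-empty array: one T, followed by any number more.
type NonEmpty<T> = [T, ...T[]];

// xs[0] is typed as T, not T | undefined, because index 0 always exists.
function head<T>(xs: NonEmpty<T>): T {
  return xs[0];
}

// Hypothetical boundary check: turn an ordinary array into a NonEmpty one.
function toNonEmpty<T>(xs: T[]): NonEmpty<T> | null {
  return xs.length > 0 ? (xs as NonEmpty<T>) : null;
}

console.log(head([1, 2, 3])); // 1
// head([]); // rejected by tsc: an empty tuple has no first element
```

The guarantee is weaker than in Idris: a cast like the one inside toNonEmpty is trusted, not proved.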

What you give up

  • Some flexibility at the boundary. External APIs return things you can't control. You parse, don't validate (the next topic in this track): convert at the boundary into your tighter types and reject what you can't represent.
  • Refactor pain. If you discover a new state, you add a variant — and every match on that sum type now needs to handle it. The compiler tells you exactly where. (This is a feature, not a bug.)
  • Sometimes more noise for simple cases. Don't sum-type a boolean.
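A common TypeScript idiom for making the compiler point at every match that needs updating is an assertNever helper in the default branch. A sketch, assuming a hypothetical new "expired" variant has just been added to the ResetState from earlier:

```typescript
// ResetState after discovering a new state.
type ResetState =
  | { kind: "none" }
  | { kind: "pending"; token: string; expires: Date }
  | { kind: "expired"; expiredAt: Date };

// Reachable only if a variant is unhandled. In that case `x` is no
// longer `never`, so the call below fails to type-check.
function assertNever(x: never): never {
  throw new Error(`unhandled variant: ${JSON.stringify(x)}`);
}

function label(reset: ResetState): string {
  switch (reset.kind) {
    case "none":
      return "none";
    case "pending":
      return `pending (${reset.token})`;
    case "expired":
      return "expired";
    default:
      // Delete the "expired" case above and tsc flags this line.
      return assertNever(reset);
  }
}
```

With strictNullChecks on, TypeScript already rejects a non-exhaustive switch when the return type excludes undefined; assertNever makes the check explicit and also covers functions that return void.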

Connection to other ideas

  • Curry-Howard: a sum type corresponds to a logical disjunction. Typing a field as a sum asserts "the value is in one of these states," and the compiler checks every construction against that claim.
  • Parse, don't validate: the parser's output type is the post-validation shape — illegal inputs can't escape parsing.
  • Domain-driven design: aggregates and value objects often should be sums, even if mainstream OO docs use classes.

Check yourself


Why is `type User = { resetToken: string | null; resetExpires: Date | null }` worse than a 2-variant sum type?

Exercise

You have a Reservation with fields confirmedAt: Date | null, cancelledAt: Date | null, and paid: boolean. Enumerate which combinations of these three fields are real states and which are nonsense. Redesign the type as a sum so that only real states are representable.
