~ / track G / neighboring paradigms

Parser combinators

Intermediate

A parser combinator is just a function that consumes some input and returns either a parsed value plus the leftover input, or a failure. The magic is that you can combine small parsers into bigger ones — sequence, choice, repetition — and get a full grammar without a parser generator and without code generation. It's the functional answer to YACC.

The shape

A parser is a function String → {:ok value rest} | {:err msg}. Tiny example:

loading sci
press ⌘/Ctrl-↵ or click ▶ run to evaluate

lit is a parser factory: give it a string, get a parser that recognizes that string at the start of the input. The whole library is small functions like this.

Sequence and choice

Two combinators dominate every parser library:

  • pseq — run parsers in order, both must succeed.
  • palt — try one; if it fails, try the next.
loading sci
press ⌘/Ctrl-↵ or click ▶ run to evaluate

palt reads as "yes or no." Add pseq and you can describe arbitrary grammars.

Why combinators are powerful

  • Composition is the API. You don't learn a grammar DSL; you stick small functions together with normal function composition.
  • First-class grammars. Build a parser at runtime, return one from a function, store one in a data structure. A grammar generator is just a function.
  • Error messages come from where they happened. Failures carry the position, and you can wrap parsers with with-label to produce domain-specific errors.
  • Same shape works for streams. Switch the "input" abstraction from String to a byte stream or token stream — the combinators don't care.

A taste of a real grammar

A toy expression grammar: digits | (expr + expr). With lit, palt, and a pseq you didn't see, you'd write:

(def expr
  (palt digits
        (pseq (lit "(") expr (lit "+") expr (lit ")")
              (fn [_ a _ b _] {:sum [a b]}))))

That's the entire grammar. No external file, no make parse. The grammar is just code that runs.

In the wild

  • Clojure: instaparse (uses combinator-style declarative grammars) and hand-rolled combinator libraries.
  • Haskell: parsec, megaparsec, attoparsec.
  • Scala / Rust / OCaml: all have flagship combinator libraries.
  • JSON / Markdown / DSL: most modern parsers are combinator-based; the uniform shape pays for itself.

Limitations to be honest about

  • Combinators with naive backtracking can be exponential on pathological inputs; production libraries add committed choice (PEG style).
  • Left-recursive grammars need special handling (chainl1, packrat, etc.).
  • For very large grammars where ambiguity must be resolved by the parser generator (e.g. GLR), generator-based tools still win.

Check yourself

? quiz

What's the practical difference between using parser combinators and using a parser generator like ANTLR or YACC?

Exercise

Using the lit and palt combinators above, build a parser that recognizes "red", "green", or "blue" at the start of a string, and returns the parsed color as a keyword.

loading sci
press ⌘/Ctrl-↵ or click ▶ run to evaluate
 status: new