~ / track G / neighboring paradigms
Parser combinators
IntermediateA parser combinator is just a function that consumes some input and returns either a parsed value plus the leftover input, or a failure. The magic is that you can combine small parsers into bigger ones — sequence, choice, repetition — and get a full grammar without a parser generator and without code generation. It's the functional answer to YACC.
The shape
A parser is a function String → {:ok value rest} | {:err msg}. Tiny
example:
lit is a parser factory: give it a string, get a parser that recognizes
that string at the start of the input. The whole library is small functions
like this.
Sequence and choice
Two combinators dominate every parser library:
pseq— run parsers in order, both must succeed.palt— try one; if it fails, try the next.
palt reads as "yes or no." Add pseq and you can describe arbitrary
grammars.
Why combinators are powerful
- Composition is the API. You don't learn a grammar DSL; you stick small functions together with normal function composition.
- First-class grammars. Build a parser at runtime, return one from a function, store one in a data structure. A grammar generator is just a function.
- Error messages come from where they happened. Failures carry the
position, and you can wrap parsers with
with-labelto produce domain-specific errors. - Same shape works for streams. Switch the "input" abstraction from String to a byte stream or token stream — the combinators don't care.
A taste of a real grammar
A toy expression grammar: digits | (expr + expr). With lit, palt, and
a pseq you didn't see, you'd write:
(def expr
(palt digits
(pseq (lit "(") expr (lit "+") expr (lit ")")
(fn [_ a _ b _] {:sum [a b]}))))
That's the entire grammar. No external file, no make parse. The grammar is
just code that runs.
In the wild
- Clojure:
instaparse(uses combinator-style declarative grammars) and hand-rolled combinator libraries. - Haskell:
parsec,megaparsec,attoparsec. - Scala / Rust / OCaml: all have flagship combinator libraries.
- JSON / Markdown / DSL: most modern parsers are combinator-based; the uniform shape pays for itself.
Limitations to be honest about
- Combinators with naive backtracking can be exponential on pathological inputs; production libraries add committed choice (PEG style).
- Left-recursive grammars need special handling (
chainl1, packrat, etc.). - For very large grammars where ambiguity must be resolved by the parser generator (e.g. GLR), generator-based tools still win.
Check yourself
? quiz
What's the practical difference between using parser combinators and using a parser generator like ANTLR or YACC?
Exercise
Using the lit and palt combinators above, build a parser that recognizes
"red", "green", or "blue" at the start of a string, and returns the parsed
color as a keyword.