
Probabilistic programming

Advanced

A probabilistic programming language (PPL) lets you write generative models — programs whose outputs are random variables — and then invert them: given an observation, infer the posterior distribution over the inputs that could have produced it. You describe the model; the runtime handles Bayesian inference. Anglican, Stan, Pyro, Gen, and Turing.jl are canonical examples.

What "inverting a program" means

A standard program takes inputs and produces an output. A probabilistic program treats some of its variables as random (drawn from distributions) and lets you ask:

  • Forward (generation): sample from the joint distribution of variables → "what does the model say usually happens?"
  • Backward (inference): given an observation of one or more variables, what's the posterior over the rest? → "given what I saw, what were the hidden causes?"

A toy generative story: a coin's bias θ comes from a Beta prior; we flip it n times and observe k heads.

θ ~ Beta(1, 1)
flips ~ Binomial(n, θ)
observe flips = k
infer θ

Plain Bayesian statistics — but the program is the model. Change one line of code, change the model.
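To make the two directions concrete, here is a minimal Python sketch of the coin story using only the standard library. Forward, it draws θ from the Beta(1, 1) prior and simulates n flips. Backward, it uses naive rejection sampling, a technique swapped in for clarity rather than what a PPL would normally run: keep only the prior draws whose simulated head count equals the observed k. The names forward and posterior_by_rejection are hypothetical. Because a Beta(1, 1) prior is conjugate to the binomial, the exact posterior is Beta(1 + k, 1 + n - k), with mean (k + 1)/(n + 2), which gives a check on the estimate.

```python
import random


def forward(n, rng):
    """Forward (generation): draw theta from the Beta(1, 1) prior, then flip the coin n times."""
    theta = rng.betavariate(1, 1)
    heads = sum(rng.random() < theta for _ in range(n))
    return theta, heads


def posterior_by_rejection(k, n, num_draws, rng):
    """Backward (inference): keep only the prior draws whose simulated head count equals k."""
    accepted = []
    for _ in range(num_draws):
        theta, heads = forward(n, rng)
        if heads == k:
            accepted.append(theta)
    return accepted


rng = random.Random(0)
samples = posterior_by_rejection(k=7, n=10, num_draws=100_000, rng=rng)
estimate = sum(samples) / len(samples)
exact = (7 + 1) / (10 + 2)  # mean of the conjugate posterior Beta(1 + k, 1 + n - k)
print(f"accepted {len(samples)} draws; posterior mean ≈ {estimate:.3f} (exact {exact:.3f})")
```

Rejection works here only because the observation is discrete and not too improbable; with continuous data almost every draw would be rejected, which is one reason PPLs ship more capable inference algorithms.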

Why express it as a program

  • Models are code. Anything your language can express can be a model: loops, recursion, control flow, calling functions. You're not limited to a fixed-shape graphical model.
  • Composition. A model can call another model the same way a function calls another function.
  • Reuse. Inference algorithms are implemented once, for the language as a whole; every model written in the language can then be run through them.
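As a rough illustration of models-as-code, the sketch below is plain Python, with the standard random module standing in for a PPL's sample primitive (an assumption; no particular PPL's API is shown). The names cluster_center and mixture_prior are hypothetical. The number of clusters is itself random, so the set of random variables changes from run to run, which a fixed-shape graphical model cannot express directly, and the outer model calls the sub-model like any other function.

```python
import random


def cluster_center(rng):
    """Sub-model: one cluster's center, drawn from a broad Gaussian prior."""
    return rng.gauss(0.0, 10.0)


def mixture_prior(rng):
    """Composite model whose shape is random: the number of clusters is drawn first,
    so the number of random choices differs from run to run."""
    num_clusters = 1
    while rng.random() < 0.5:  # geometric number of clusters, always >= 1
        num_clusters += 1
    # Composition: call the sub-model like an ordinary function, once per cluster.
    return [cluster_center(rng) for _ in range(num_clusters)]


rng = random.Random(1)
for _ in range(3):
    print(mixture_prior(rng))
```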

In Anglican-style syntax (Gen and Pyro express the same idea as Julia and Python programs, respectively):

;; pseudo-anglican
(defquery coin-bias [k n]
  (let [theta (sample (beta 1 1))]
    (observe (binomial n theta) k)
    theta))

(doquery :smc coin-bias [7 10])
;; → posterior samples for theta after observing 7/10 heads

sample declares a random choice; observe conditions the model on data by scoring each execution with the likelihood of the observed value; the inference algorithm (Sequential Monte Carlo, Hamiltonian Monte Carlo, variational inference, etc., depending on what the PPL supports) handles the rest.
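For readers without Clojure, here is roughly the same query as a self-contained Python sketch. It assumes a hypothetical runtime object, ImportanceRuntime, whose sample method draws from the prior and whose observe method adds a log-likelihood to the current run's weight. This is not Anglican's API, and the inference shown is likelihood-weighted importance sampling rather than the SMC requested by :smc. The helper names binomial_log_prob and importance_sampling are likewise hypothetical.

```python
import math
import random


class ImportanceRuntime:
    """Hypothetical minimal runtime for one execution of a model.
    sample: draw a value from its prior.
    observe: add the log-likelihood of an observed value to this run's log-weight."""

    def __init__(self, rng):
        self.rng = rng
        self.log_weight = 0.0

    def sample(self, draw):
        return draw(self.rng)

    def observe(self, log_prob, value):
        self.log_weight += log_prob(value)


def binomial_log_prob(n, theta):
    """Return a function k -> log P(k heads in n flips | theta)."""
    def log_prob(k):
        if not 0.0 < theta < 1.0:
            return -math.inf
        return math.log(math.comb(n, k)) + k * math.log(theta) + (n - k) * math.log1p(-theta)
    return log_prob


def coin_bias(rt, k, n):
    """The coin-bias query, written against the hypothetical runtime."""
    theta = rt.sample(lambda rng: rng.betavariate(1, 1))  # theta ~ Beta(1, 1)
    rt.observe(binomial_log_prob(n, theta), k)             # condition on k heads out of n
    return theta


def importance_sampling(model, args, num_particles, rng):
    """Run the model forward many times; return results with self-normalized weights."""
    results, log_weights = [], []
    for _ in range(num_particles):
        rt = ImportanceRuntime(rng)
        results.append(model(rt, *args))
        log_weights.append(rt.log_weight)
    top = max(log_weights)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(lw - top) for lw in log_weights]
    total = sum(weights)
    return results, [w / total for w in weights]


rng = random.Random(0)
thetas, weights = importance_sampling(coin_bias, (7, 10), 20_000, rng)
posterior_mean = sum(t * w for t, w in zip(thetas, weights))
print(f"posterior mean of theta ≈ {posterior_mean:.3f}")  # exact value is 8/12 ≈ 0.667
```

Note that coin_bias knows nothing about weighting: the runtime passed in decides what sample and observe mean, which is how a single model can be handed to different inference algorithms.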

The inference toolkit

A PPL provides a menu of algorithms:

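  • Likelihood-weighted importance sampling. Run the model forward many times; weight each run by the likelihood of the observations.
  • MCMC (Metropolis-Hastings, HMC). Walk the parameter space with a Markov chain whose stationary distribution is the posterior; Metropolis-Hastings accepts or rejects each proposed move based on the ratio of posterior densities, and HMC uses gradients of the log-density to make long, informed moves.
  • Variational inference. Fit a parameterized approximating distribution by optimization, typically by maximizing the evidence lower bound (ELBO).
  • Sequential Monte Carlo. Propagate a population of weighted particles through the model, resampling as observations arrive; particle filters are the classic instance and a natural fit for state-space models.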

You pick the algorithm to suit the model. Same model, different inference strategy: the separation is what makes PPLs powerful.
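Here is the coin posterior again, computed with random-walk Metropolis-Hastings as a minimal Python sketch. A PPL would derive the log-density from the program itself; here it is written by hand as log_joint (a hypothetical name), and the step size of 0.1, the 50,000 steps, and the 5,000-step burn-in are arbitrary illustrative choices, not tuned values.

```python
import math
import random


def log_joint(theta, k, n):
    """Unnormalized log posterior for the coin model: log p(theta) + log p(k | theta).
    The Beta(1, 1) prior is flat on (0, 1); the binomial coefficient is constant in theta and dropped."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return k * math.log(theta) + (n - k) * math.log1p(-theta)


def metropolis_hastings(log_density, init, steps, step_size, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    current, current_lp = init, log_density(init)
    chain = []
    for _ in range(steps):
        proposal = current + rng.gauss(0.0, step_size)
        proposal_lp = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(current)); out-of-range proposals have probability 0.
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        chain.append(current)
    return chain


rng = random.Random(0)
chain = metropolis_hastings(lambda t: log_joint(t, 7, 10), init=0.5, steps=50_000, step_size=0.1, rng=rng)
kept = chain[5_000:]  # discard burn-in
mcmc_mean = sum(kept) / len(kept)
print(f"posterior mean of theta ≈ {mcmc_mean:.3f}")  # exact value is 8/12 ≈ 0.667
```

Only metropolis_hastings is specific to MCMC; the model term, log_joint, could be handed to any other density-based sampler unchanged.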

Why this is functional

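  • Generative model = function whose only effects are random choices (sample) and conditioning (observe). Once those choices are fixed, it is a pure value transformation.
  • Inference algorithm = higher-order function. Takes a model, returns a representation of its posterior (samples, weighted samples, or a fitted approximation).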
  • Composition. Sub-models nest like normal function calls; the runtime threads the randomness/tracing through.
  • Immutability. Each sample is an immutable trace of choices; no mutable state to corrupt across runs.
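A minimal sketch of the trace view, loosely in the spirit of trace-based PPLs such as Gen but not using any real PPL's API: the model makes every random choice through a hypothetical choose(address, draw) callback. simulate is a higher-order function that runs a model forward and records each choice under its address in a read-only mapping; replay re-runs the model with those choices fixed, at which point the model is an ordinary deterministic function of the trace. The addresses "theta" and ("flip", i) are illustrative.

```python
import random
from types import MappingProxyType


def coin_model(choose):
    """Model written against a choose(address, draw) callback; its only effects are random choices."""
    theta = choose("theta", lambda rng: rng.betavariate(1, 1))
    flips = [choose(("flip", i), lambda rng: rng.random() < theta) for i in range(3)]
    return theta, flips


def simulate(model, rng):
    """Run the model forward, recording every choice by address; return the result and a read-only trace."""
    trace = {}

    def choose(address, draw):
        trace[address] = draw(rng)
        return trace[address]

    result = model(choose)
    return result, MappingProxyType(dict(trace))


def replay(model, trace):
    """Re-run the model with every choice taken from the trace: a pure, deterministic function."""
    return model(lambda address, draw: trace[address])


rng = random.Random(42)
result, trace = simulate(coin_model, rng)
assert replay(coin_model, trace) == result  # same trace, same output, every time
print(dict(trace))
```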

Where PPLs help

  • Modeling uncertainty explicitly in scientific or business workflows (epidemiological models, demand forecasting, A/B testing).
  • Hierarchical models — easy to encode "groups within groups within groups" structures.
  • Domain-specific generative models that don't fit a standard library (e.g. specialized clinical models, generative grammars over images).
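Generatively, a hierarchical model is just nested loops. The sketch below forward-samples a hypothetical three-level sales model: a global mean, region means around it, store means around each region, and daily sales around each store. The function name and all numeric values are illustrative assumptions. In a PPL the daily values would be observed, and inference would return joint posteriors for the global, region, and store means.

```python
import random


def hierarchical_sales(rng, num_regions=3, stores_per_region=4, days=5):
    """Forward-sample a three-level model: global mean -> region means -> store means -> daily sales."""
    global_mean = rng.gauss(100.0, 20.0)
    data = {}
    for r in range(num_regions):
        region_mean = rng.gauss(global_mean, 10.0)     # regions vary around the global mean
        for s in range(stores_per_region):
            store_mean = rng.gauss(region_mean, 5.0)   # stores vary around their region
            data[(r, s)] = [rng.gauss(store_mean, 2.0) for _ in range(days)]  # daily noise
    return global_mean, data


global_mean, data = hierarchical_sales(random.Random(3))
print(f"global mean {global_mean:.1f}; store (0, 0) sales {[round(x, 1) for x in data[(0, 0)]]}")
```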

The price: inference can be expensive, and PPLs come with a learning curve around prior choice, model checking, and convergence diagnostics.

Check yourself


A PPL separates the *model* from the *inference algorithm*. Why is that separation valuable?

Exercise

Sketch (in prose or pseudocode) a generative model for the following scenario: a customer becomes a paying user with some unknown probability p; if they do, the amount they pay is drawn from a log-normal distribution with parameters μ and σ. Given a dataset of N customers (0 for non-payers, the payment amount for payers), describe what you would infer, i.e. which variables you want posterior distributions over.
