Probabilistic programming
Advanced
A probabilistic programming language (PPL) lets you write generative models (programs whose outputs are random variables) and then invert them: given an observation, infer the posterior distribution over the inputs that could have produced it. You describe the model; the runtime handles Bayesian inference. Anglican, Stan, Pyro, Gen, and Turing.jl are canonical examples.
What "inverting a program" means
A standard program takes inputs and produces an output. A probabilistic program treats some of its variables as random (drawn from distributions) and lets you ask:
- Forward (generation): sample from the joint distribution of variables → "what does the model say usually happens?"
- Backward (inference): given an observation of one or more variables, what's the posterior over the rest? → "given what I saw, what were the hidden causes?"
A toy generative story: a coin's bias θ comes from a Beta prior; we flip
it n times and observe k heads.
θ ~ Beta(1, 1)
flips ~ Binomial(n, θ)
observe flips = k
infer θ
Plain Bayesian statistics — but the program is the model. Change one line of code, change the model.
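To make the forward and backward directions concrete, here is a minimal Python sketch of the coin story that uses only the standard library. It is not how a real PPL performs inference: it uses plain rejection sampling (run the model forward many times, keep the runs that reproduce the observation), and the function names `forward` and `infer_by_rejection` are made up for illustration. For this particular model the exact posterior is known in closed form, Beta(1 + k, 1 + n - k), which gives something to compare against.

```python
import random

def forward(n, rng):
    """Run the generative story once: draw a bias, then flip n coins."""
    theta = rng.betavariate(1, 1)  # theta ~ Beta(1, 1), i.e. uniform on [0, 1]
    flips = sum(rng.random() < theta for _ in range(n))  # flips ~ Binomial(n, theta)
    return theta, flips

def infer_by_rejection(n, k, num_runs, rng):
    """Backward direction: keep theta only from runs whose flips equal k."""
    runs = (forward(n, rng) for _ in range(num_runs))
    return [theta for theta, flips in runs if flips == k]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
posterior = infer_by_rejection(n=10, k=7, num_runs=50_000, rng=rng)
posterior_mean = sum(posterior) / len(posterior)

# Conjugacy gives the exact answer: Beta(1 + 7, 1 + 3) has mean 8 / 12.
exact_mean = (1 + 7) / (2 + 10)
print(f"kept {len(posterior)} runs; sampled mean {posterior_mean:.3f}, exact {exact_mean:.3f}")
```

Rejection sampling wastes most runs (here roughly ten out of eleven), which is one reason PPLs ship smarter algorithms.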
Why express it as a program
- Models are code. Anything your language can express can be a model: loops, recursion, control flow, calling functions. You're not limited to a fixed-shape graphical model.
- Composition. A model can call another model the same way a function calls another function.
- Reuse. Inference algorithms are written once against the language, so any model written in it can use them, subject to each algorithm's own requirements (HMC, for example, needs continuous, differentiable parameters).
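The "models are code" and "composition" points can be sketched in a few lines of Python. This is an illustrative toy, not any particular PPL's API: the names `noisy_coin`, `flips_until_heads`, and `model` are hypothetical, and so is the Beta(2, 2) prior. The point is that the number of random choices differs from run to run, which a fixed-shape graphical model cannot express directly.

```python
import random

def noisy_coin(rng, bias):
    """Sub-model: one flip of a coin that lands heads with probability `bias`."""
    return rng.random() < bias

def flips_until_heads(rng, bias, max_flips=1000):
    """Data-dependent control flow: keep calling the sub-model until it returns heads.

    `max_flips` is only a safety cap for this sketch.
    """
    count = 1
    while not noisy_coin(rng, bias) and count < max_flips:
        count += 1
    return count

def model(rng):
    """Top-level model: draws a bias, then calls another model like a normal function."""
    bias = rng.betavariate(2, 2)
    return bias, flips_until_heads(rng, bias)

rng = random.Random(42)
samples = [model(rng) for _ in range(5)]
for bias, count in samples:
    print(f"bias={bias:.2f} -> first heads on flip {count}")
```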
In Anglican-style pseudocode (Gen and Pyro provide analogous constructs for sampling and conditioning):
;; pseudo-anglican
(defquery coin-bias [k n]
  (let [theta (sample (beta 1 1))]    ; prior: theta ~ Beta(1, 1)
    (observe (binomial n theta) k)    ; condition on k heads in n flips
    theta))                           ; the query returns theta

(doquery :smc coin-bias [7 10])
;; → posterior samples for theta after observing 7/10 heads
sample declares a random choice; observe constrains a random variable to
an observed value; the inference algorithm (Sequential Monte Carlo,
Hamiltonian Monte Carlo, Variational Inference, etc.) handles the rest.
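One way to picture what sample and observe do at runtime is a toy tracing runtime in Python. This is a sketch under stated assumptions, not Anglican's implementation: the `Trace` class, its `sample` and `observe` methods, and `importance_sampling` are all hypothetical names. Here sample draws from the prior and records the choice, while observe adds the log-probability of the observed value to a running weight, which is the likelihood-weighting strategy.

```python
import math
import random

class Trace:
    """Records random choices and accumulates the log-weight from observations."""
    def __init__(self, rng):
        self.rng = rng
        self.choices = {}
        self.log_weight = 0.0

    def sample(self, name, draw):
        value = draw(self.rng)       # make a random choice from the prior
        self.choices[name] = value   # remember it under its name
        return value

    def observe(self, log_prob):
        self.log_weight += log_prob  # score how well this run explains the data

def binomial_logpmf(k, n, p):
    if p <= 0.0 or p >= 1.0:         # boundary cases, where log(p) or log(1-p) is undefined
        certain = (p <= 0.0 and k == 0) or (p >= 1.0 and k == n)
        return 0.0 if certain else -math.inf
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log1p(-p)

def coin_bias(trace, k, n):
    theta = trace.sample("theta", lambda rng: rng.betavariate(1, 1))
    trace.observe(binomial_logpmf(k, n, theta))
    return theta

def importance_sampling(model, args, num_particles, rng):
    """Run the model forward repeatedly; return (value, weight) pairs."""
    results = []
    for _ in range(num_particles):
        trace = Trace(rng)
        value = model(trace, *args)
        results.append((value, math.exp(trace.log_weight)))
    return results

rng = random.Random(1)
weighted = importance_sampling(coin_bias, (7, 10), 20_000, rng)
total = sum(w for _, w in weighted)
posterior_mean = sum(v * w for v, w in weighted) / total
print(f"weighted posterior mean of theta: {posterior_mean:.3f}")
```

The model function never mentions an inference algorithm; swapping `importance_sampling` for another driver would leave `coin_bias` unchanged.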
The inference toolkit
A PPL provides a menu of algorithms:
- Likelihood-weighted importance sampling. Run the model forward; reweight samples by how well they match observations.
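- MCMC (Metropolis-Hastings, HMC). Walk the parameter space, accepting or rejecting each proposed move according to how it changes the posterior density, so that the visited points are distributed according to the posterior.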
- Variational inference. Fit a parameterised approximating distribution by optimization.
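- Sequential Monte Carlo. Propagate a population of weighted particles through the model, resampling as observations arrive (particle filters are the classic instance); natural for state-space models.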
You pick the algorithm to suit the model. Same model, different inference strategy: the separation is what makes PPLs powerful.
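As a rough illustration of "same model, different inference strategy", the Python sketch below defines the coin model once (as an unnormalised log density) and estimates the posterior mean of θ two ways. All names (`log_joint`, `likelihood_weighting`, `metropolis_hastings`) and the tuning values (`step`, `burn_in`) are assumptions chosen for this example, and the Metropolis-Hastings variant is the simple random-walk form rather than anything a specific PPL ships.

```python
import math
import random

def log_joint(theta, k, n):
    """The model: flat Beta(1, 1) prior plus Binomial log-likelihood (up to a constant)."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return k * math.log(theta) + (n - k) * math.log1p(-theta)

def likelihood_weighting(k, n, num_samples, rng):
    """Strategy 1: draw theta from the prior, weight each draw by its likelihood."""
    draws = [rng.random() for _ in range(num_samples)]
    weights = [math.exp(log_joint(t, k, n)) for t in draws]
    return sum(t * w for t, w in zip(draws, weights)) / sum(weights)

def metropolis_hastings(k, n, num_steps, rng, step=0.2, burn_in=1000):
    """Strategy 2: random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    theta = 0.5
    current = log_joint(theta, k, n)
    kept = []
    for i in range(num_steps):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_joint(proposal, k, n)
        # Symmetric proposal, so the acceptance probability is min(1, density ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if i >= burn_in:
            kept.append(theta)
    return sum(kept) / len(kept)

rng = random.Random(2)
mean_lw = likelihood_weighting(7, 10, 20_000, rng)
mean_mh = metropolis_hastings(7, 10, 50_000, rng)
print(f"likelihood weighting: {mean_lw:.3f}   Metropolis-Hastings: {mean_mh:.3f}")
```

Both estimates should land near the exact posterior mean 8/12; which one gets there more cheaply depends on the model, which is why the choice is left to the user.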
Why this is functional
- Generative model = pure function with random choices. Once the source of randomness is made explicit, the same choices always produce the same output; the whole thing is a value transformation.
- Inference algorithm = higher-order function. Takes a model, returns a distribution.
- Composition. Sub-models nest like normal function calls; the runtime threads the randomness/tracing through.
- Immutability. Each sample is an immutable trace of choices; no mutable state to corrupt across runs.
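These four points can be seen together in a small Python sketch. It is a hypothetical illustration, not a real PPL: the wet-grass model and its probabilities are made up, `Trace`, `flip`, `wet_grass`, and `infer` are names invented here, and the inference is plain rejection sampling. The model takes its random source as an explicit argument, a sub-model is called like any function, each run yields an immutable trace, and `infer` is a higher-order function from a model to a distribution.

```python
import random
from collections import Counter
from typing import NamedTuple

class Trace(NamedTuple):
    """An immutable record of every choice made in one run of the model."""
    rain: bool
    sprinkler: bool
    wet: bool

def flip(rng, p):
    """Sub-model: a single biased coin."""
    return rng.random() < p

def wet_grass(rng):
    """Model: calls the sub-model twice; the 0.2 and 0.1 are illustrative values."""
    rain = flip(rng, 0.2)
    sprinkler = flip(rng, 0.1)
    return Trace(rain, sprinkler, wet=rain or sprinkler)

def infer(model, condition, query, runs, seed):
    """Higher-order inference: takes a model, returns an empirical distribution."""
    rng = random.Random(seed)
    traces = (model(rng) for _ in range(runs))
    counts = Counter(query(t) for t in traces if condition(t))
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

dist = infer(
    wet_grass,
    condition=lambda t: t.wet,   # what we observed
    query=lambda t: t.rain,      # what we want to know
    runs=20_000,
    seed=0,
)
print(dist)  # estimate of P(rain | wet); the exact value is 0.2 / 0.28
```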
Where PPLs help
- Modeling uncertainty explicitly in scientific or business workflows (epidemiological models, demand forecasting, A/B testing).
- Hierarchical models — easy to encode "groups within groups within groups" structures.
- Domain-specific generative models that don't fit a standard library (e.g. specialized clinical models, generative grammars over images).
The price: inference can be expensive, and PPLs introduce a learning curve about prior choice, model checking, and convergence diagnostics.
Check yourself
A PPL separates the *model* from the *inference algorithm*. Why is that separation valuable?
Exercise
Sketch (in prose or pseudocode) a generative model for the following
scenario: a customer becomes a paying user with some unknown
probability p, and if they do, the amount they pay is drawn from a
log-normal distribution with parameters μ and σ. Given a dataset of N
customers (with 0 for non-payers and the payment amount for payers),
describe what you would infer, i.e. which variables you want
posterior distributions over.