The robot explorer
You want to sample from a complex distribution, but you can't invert the CDF or use any standard method. What if you sent a "robot" to wander the target density, spending more time in high-probability regions?
The algorithm
To sample from a target distribution $\pi(x) \propto f(x)$, where $f$ is known but the normalizing constant is not:
- Start at some $x_0$.
- Propose a move: $x^* \sim q(x^* \mid x_t)$.
- Accept with probability $\alpha = \min\left(1, \dfrac{f(x^*)\, q(x_t \mid x^*)}{f(x_t)\, q(x^* \mid x_t)}\right)$.
- If accepted, $x_{t+1} = x^*$. Otherwise, $x_{t+1} = x_t$.
- Repeat.
You don't need to know the normalizing constant. If $f(x) = c\,\pi(x)$, the constant $c$ cancels in the ratio $f(x^*)/f(x_t)$, which is what makes MCMC practical for Bayesian inference.
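The steps above can be sketched in a few lines of Python. This is a minimal 1-D sketch, assuming a Gaussian random-walk proposal for $q$ (the names `metropolis_hastings`, `q_sample`, and `q_density` are illustrative, not from any library); the target is a standard normal deliberately left unnormalized:

```python
import math
import random

def metropolis_hastings(f, q_sample, q_density, x0, n_samples):
    """Metropolis-Hastings for a target f known only up to a constant."""
    x = x0
    samples = []
    for _ in range(n_samples):
        x_star = q_sample(x)                     # propose x* ~ q(. | x_t)
        # Acceptance ratio: f(x*) q(x_t | x*) / (f(x_t) q(x* | x_t)).
        ratio = (f(x_star) * q_density(x, x_star)) / (f(x) * q_density(x_star, x))
        if random.random() < min(1.0, ratio):    # accept with probability alpha
            x = x_star                           # move; otherwise stay put
        samples.append(x)                        # rejected moves repeat x_t
    return samples

# Target: a standard normal, deliberately missing its 1/sqrt(2 pi) constant.
f = lambda x: math.exp(-0.5 * x * x)
# Gaussian random-walk proposal; q_density(y, x) is the density of proposing y
# from x, itself also only needed up to a constant.
q_sample = lambda x: random.gauss(x, 1.0)
q_density = lambda y, x: math.exp(-0.5 * (y - x) ** 2)

random.seed(0)
samples = metropolis_hastings(f, q_sample, q_density, x0=0.0, n_samples=20000)
kept = samples[5000:]                            # discard burn-in
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
print(mean, var)                                 # should be close to 0 and 1
```

Note that the chain's samples are correlated, so 15,000 kept draws carry fewer than 15,000 independent samples' worth of information.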
Watch the robot explore
The robot wanders the target density. Its trail of dots forms the target distribution. Notice how it spends more time on peaks and less time in valleys:
Why does it work?
The acceptance probability is carefully designed so that the resulting Markov chain satisfies detailed balance with respect to $\pi$:

$$\pi(x)\, P(x \to x') = \pi(x')\, P(x' \to x)$$
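One way to see this concretely: on a small discrete state space, the Metropolis kernel can be written as a transition matrix and detailed balance checked directly. A toy sketch (the five weights are arbitrary, chosen only for illustration):

```python
import numpy as np

# Unnormalized target weights on 5 states; pi is the normalized version.
f = np.array([1.0, 4.0, 2.0, 8.0, 1.0])
pi = f / f.sum()
n = len(f)

# Symmetric proposal: jump to one of the other states uniformly at random.
q = (np.ones((n, n)) - np.eye(n)) / (n - 1)

# Metropolis kernel: P[i, j] = q[i, j] * min(1, f[j] / f[i]) for j != i,
# with all rejected probability mass kept on the diagonal.
P = q * np.minimum(1.0, f[None, :] / f[:, None])
np.fill_diagonal(P, 1.0 - P.sum(axis=1))

# Detailed balance: pi_i P_ij == pi_j P_ji for every pair of states.
flow = pi[:, None] * P
assert np.allclose(flow, flow.T)

# Detailed balance implies stationarity: pi P == pi.
assert np.allclose(pi @ P, pi)
print("detailed balance holds")
```

Summing the detailed-balance identity over $x$ gives $\sum_x \pi(x) P(x \to x') = \pi(x')$, which is exactly the statement that $\pi$ is stationary.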
Symmetric proposals
When the proposal distribution is symmetric ($q(x^* \mid x_t) = q(x_t \mid x^*)$, e.g., a random walk), the acceptance probability simplifies to:

$$\alpha = \min\left(1, \frac{f(x^*)}{f(x_t)}\right)$$
This is the original Metropolis algorithm (1953).
Burn-in: The first several hundred (or thousand) samples are biased by the starting point. Discard them before using the samples for inference.
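A quick illustration of why burn-in matters. This sketch assumes an unnormalized standard-normal target; the start at $x_0 = 20$ is deliberately far from the mode:

```python
import math
import random

def metropolis(f, x0, n_samples, step=1.0):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    x, samples = x0, []
    for _ in range(n_samples):
        x_star = x + random.gauss(0.0, step)
        if random.random() < min(1.0, f(x_star) / f(x)):
            x = x_star
        samples.append(x)
    return samples

random.seed(0)
f = lambda x: math.exp(-0.5 * x * x)              # unnormalized N(0, 1)
chain = metropolis(f, x0=20.0, n_samples=10000)   # deliberately bad start

early = sum(chain[:100]) / 100                    # still drifting in from x0
late = sum(chain[2000:]) / len(chain[2000:])
print(early, late)   # early mean is biased upward; late mean is near 0
```

The first samples trace the chain's walk from $x_0 = 20$ down to the mode, so their average says more about the starting point than about the target.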
Tuning the proposal
| Proposal Step Size | Acceptance Rate | Exploration |
|---|---|---|
| Too small | Very high (~95%) | Slow — takes forever to explore |
| Too large | Very low (~5%) | Gets stuck — most proposals rejected |
| Just right | ~25-50% | Efficient exploration |
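The tradeoffs in the table are easy to reproduce. A sketch comparing three step sizes on an unnormalized standard-normal target (the particular values 0.05, 2.5, and 25 are arbitrary picks for "too small", "about right", and "too large"):

```python
import math
import random

def acceptance_rate(step, n=20000):
    """Fraction of accepted proposals for random-walk Metropolis on N(0, 1)."""
    f = lambda x: math.exp(-0.5 * x * x)   # target, up to a constant
    x, accepted = 0.0, 0
    for _ in range(n):
        x_star = x + random.gauss(0.0, step)
        if random.random() < min(1.0, f(x_star) / f(x)):
            x, accepted = x_star, accepted + 1
    return accepted / n

random.seed(0)
rates = {step: acceptance_rate(step) for step in (0.05, 2.5, 25.0)}
for step, rate in rates.items():
    print(f"step={step:>5}: acceptance ~ {rate:.0%}")
```

Tiny steps are almost always accepted but barely move the chain; huge steps usually land in low-density regions and get rejected, so the chain sits still.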
Summary
| Concept | Key Idea |
|---|---|
| Goal | Sample from $\pi(x) \propto f(x)$ without knowing the normalizing constant |
| Proposal | Suggest $x^* \sim q(x^* \mid x_t)$ |
| Accept/reject | $\alpha = \min\left(1, \frac{f(x^*)\, q(x_t \mid x^*)}{f(x_t)\, q(x^* \mid x_t)}\right)$ |
| Why it works | Detailed balance → $\pi$ is the stationary distribution |
| Burn-in | Discard initial samples |
MCMC turns a sampling problem into a simulation problem. You don't need to solve any equations — just run the chain long enough and collect the dots.
What's next
Metropolis-Hastings updates all coordinates at once. Gibbs sampling updates one coordinate at a time — often much more efficient for multivariate distributions.