The robot explorer

You want to sample from a complex distribution, but you can't invert the CDF or use any standard method. What if you sent a "robot" to wander the target density, spending more time in high-probability regions?

What do you think?
A robot explores a 1D density surface where height = probability density. It proposes random steps. Should it always accept steps to lower ground?

The algorithm

Metropolis-Hastings Algorithm

To sample from a target distribution $f(x)$ (known only up to a normalizing constant):

  1. Start at some $x_0$.
  2. Propose a move: $x^* \sim q(x^* \mid x_t)$.
  3. Accept with probability $\alpha = \min\left(1, \frac{f(x^*) \, q(x_t \mid x^*)}{f(x_t) \, q(x^* \mid x_t)}\right)$.
  4. If accepted, set $x_{t+1} = x^*$; otherwise, $x_{t+1} = x_t$.
  5. Repeat.
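The loop above can be sketched in Python. This is a minimal illustration, not a library implementation: it assumes a Gaussian random-walk proposal (which is symmetric, so the $q$ terms cancel) and a user-supplied unnormalized density `f` that is strictly positive.

```python
import math
import random

def metropolis_hastings(f, x0, n_samples, step=1.0):
    """Draw n_samples from the unnormalized density f via a random-walk proposal."""
    x = x0
    samples = []
    for _ in range(n_samples):
        x_star = x + random.gauss(0.0, step)   # step 2: propose a move
        alpha = min(1.0, f(x_star) / f(x))     # step 3: symmetric q, so q cancels
        if random.random() < alpha:            # step 4: accept, or stay put
            x = x_star
        samples.append(x)
    return samples

# example: sample from an unnormalized standard normal (constant 1/sqrt(2*pi) omitted)
g = lambda x: math.exp(-0.5 * x * x)
chain = metropolis_hastings(g, x0=0.0, n_samples=5000)
```

Note that the rejected iterations still append the current state; repeated values are part of the sample, not an error.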

You don't need to know the normalizing constant. If $f(x) = c \cdot g(x)$, the constant $c$ cancels in the ratio $f(x^*)/f(x_t)$, which is what makes MCMC practical for Bayesian inference.
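To see the cancellation concretely, here is a tiny check (the scale factor 7.3 and the two evaluation points are arbitrary choices for illustration):

```python
import math

g = lambda x: math.exp(-0.5 * x * x)   # unnormalized density
f = lambda x: 7.3 * g(x)               # same shape, scaled by an unknown constant c

x_t, x_star = 0.5, 1.2
# the acceptance ratio is identical whether or not we know c
assert abs(f(x_star) / f(x_t) - g(x_star) / g(x_t)) < 1e-12
```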

Watch the robot explore

The robot wanders the target density. Over time, the histogram of its visited positions converges to the target distribution. Notice how it spends more time on peaks and less time in valleys:

[Interactive demo: Metropolis-Hastings Explorer, plotting the target π(x) against the accumulated samples and tracking the sample count, acceptance rate, and current x.]

Why does it work?

The acceptance probability is carefully designed so that the resulting Markov chain satisfies detailed balance with respect to ff:

Detailed balance proof sketch

We want the probability flow from $x$ to $y$ to equal the flow from $y$ to $x$:

$f(x) \, q(y \mid x) \, \alpha(x \to y) = f(y) \, q(x \mid y) \, \alpha(y \to x)$

Assume without loss of generality that $f(y) \, q(x \mid y) \le f(x) \, q(y \mid x)$. Then $\alpha(x \to y) = \frac{f(y) \, q(x \mid y)}{f(x) \, q(y \mid x)}$ while $\alpha(y \to x) = 1$, and substituting both into the equation above makes the two sides identical. Summing detailed balance over all $x$ shows that $f$ is the stationary distribution of the chain.
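Detailed balance can also be spot-checked numerically. The target below (an unnormalized mixture of two bumps), the Gaussian proposal, and the points $x, y$ are arbitrary choices for illustration:

```python
import math

f = lambda x: math.exp(-0.5 * x * x) + 0.5 * math.exp(-0.5 * (x - 3.0) ** 2)
q = lambda y, x: math.exp(-0.5 * (y - x) ** 2) / math.sqrt(2.0 * math.pi)  # N(x, 1)
alpha = lambda x, y: min(1.0, (f(y) * q(x, y)) / (f(x) * q(y, x)))

x, y = 0.3, 1.7
flow_xy = f(x) * q(y, x) * alpha(x, y)   # probability flow x -> y
flow_yx = f(y) * q(x, y) * alpha(y, x)   # probability flow y -> x
assert abs(flow_xy - flow_yx) < 1e-12
```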

Symmetric proposals

When the proposal distribution is symmetric ($q(y \mid x) = q(x \mid y)$, e.g., a random walk), the acceptance probability simplifies to:

$\alpha = \min\left(1, \frac{f(x^*)}{f(x_t)}\right)$

This is the original Metropolis algorithm (1953).
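With an asymmetric proposal the Hastings correction matters. A sketch of my own (not from the text above): a multiplicative lognormal proposal for a positive-valued target, here an unnormalized Exponential(1) density. Working in log space avoids underflow in the density ratios.

```python
import math
import random

def log_q(x_to, x_from, sigma):
    """Log-density of proposing x_to from x_from via x_to = x_from * exp(N(0, sigma))."""
    z = (math.log(x_to) - math.log(x_from)) / sigma
    return -0.5 * z * z - math.log(x_to * sigma * math.sqrt(2.0 * math.pi))

def mh_lognormal(f, x0, n, sigma=0.5):
    x, out = x0, []
    for _ in range(n):
        x_star = x * math.exp(random.gauss(0.0, sigma))  # asymmetric proposal
        # full Metropolis-Hastings log-ratio, including the q correction
        log_ratio = (math.log(f(x_star)) + log_q(x, x_star, sigma)
                     - math.log(f(x)) - log_q(x_star, x, sigma))
        if random.random() < math.exp(min(0.0, log_ratio)):
            x = x_star
        out.append(x)
    return out

# target: unnormalized Exponential(1) density on x > 0
samples = mh_lognormal(lambda x: math.exp(-x), x0=1.0, n=20000)
```

Dropping the `log_q` terms here would bias the chain, because the lognormal proposal prefers moves toward smaller x.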

Burn-in: The first several hundred (or thousand) samples are biased by the starting point. Discard them before using the samples for inference.
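A quick illustration of why burn-in matters: a random-walk chain deliberately started far from the mode of a standard normal target (the starting point 10.0 and burn-in length 1000 are arbitrary choices for this sketch):

```python
import math
import random

random.seed(0)
f = lambda x: math.exp(-0.5 * x * x)   # unnormalized standard normal
x, chain = 10.0, []                    # start far out in the tail
for _ in range(5000):
    x_star = x + random.gauss(0.0, 1.0)
    if random.random() < min(1.0, f(x_star) / f(x)):
        x = x_star
    chain.append(x)

burn_in = 1000
kept = chain[burn_in:]                 # discard the transient drift toward the mode
```

The early samples trace the walk from 10 down to the high-density region near 0; averaging them in would bias any estimate upward.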

Tuning the proposal

| Proposal step size | Acceptance rate | Exploration |
| --- | --- | --- |
| Too small | Very high (~95%) | Slow: tiny steps take a long time to traverse the density |
| Too large | Very low (~5%) | Stuck: most proposals land in low-density regions and are rejected |
| Just right | ~25-50% | Efficient exploration |
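The trade-off is easy to measure empirically. A sketch using a standard normal target; the step sizes 0.05, 2.4, and 50 are arbitrary picks for "too small", "about right", and "too large":

```python
import math
import random

def acceptance_rate(step, n=20000, seed=1):
    """Fraction of proposals accepted by a random-walk chain on a standard normal."""
    random.seed(seed)
    f = lambda x: math.exp(-0.5 * x * x)
    x, accepted = 0.0, 0
    for _ in range(n):
        x_star = x + random.gauss(0.0, step)
        if random.random() < min(1.0, f(x_star) / f(x)):
            x, accepted = x_star, accepted + 1
    return accepted / n

for step in (0.05, 2.4, 50.0):
    print(f"step={step:g}: acceptance rate = {acceptance_rate(step):.0%}")
```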

Practice problems

  1. In Metropolis-Hastings, if f(proposed) > f(current), what is the acceptance probability? (whole number)
  2. If f(proposed)/f(current) = 0.3, you accept the move with probability ___. (decimal to 1 place, e.g. 0.8)

Summary

| Concept | Key idea |
| --- | --- |
| Goal | Sample from $f(x)$ without knowing the normalizing constant |
| Proposal | Suggest $x^*$ from $q(x^* \mid x)$ |
| Accept/reject | $\alpha = \min\left(1, \frac{f(x^*) \, q(x_t \mid x^*)}{f(x_t) \, q(x^* \mid x_t)}\right)$ |
| Why it works | Detailed balance → $f$ is the stationary distribution |
| Burn-in | Discard the initial samples |

MCMC turns a sampling problem into a simulation problem. You don't need to solve any equations — just run the chain long enough and collect the dots.

What's next

Metropolis-Hastings updates all coordinates at once. Gibbs sampling updates one coordinate at a time — often much more efficient for multivariate distributions.