One variable at a time

Metropolis-Hastings proposes moves in all dimensions at once. In high dimensions, finding a good proposal is hard. What if you updated just one variable at a time, holding the others fixed?

What do you think?
You want to sample from a 2D distribution f(x, y). Gibbs sampling alternates between updating x given y, and y given x. What does each update require?

The algorithm

Gibbs Sampling

To sample from f(x_1, x_2, \ldots, x_p):

  1. Start at (x_1^{(0)}, x_2^{(0)}, \ldots, x_p^{(0)})
  2. For each iteration, update each variable in turn:
    • Draw x_1^{(t+1)} \sim f(x_1 \mid x_2^{(t)}, x_3^{(t)}, \ldots, x_p^{(t)})
    • Draw x_2^{(t+1)} \sim f(x_2 \mid x_1^{(t+1)}, x_3^{(t)}, \ldots, x_p^{(t)})
    • \vdots
    • Draw x_p^{(t+1)} \sim f(x_p \mid x_1^{(t+1)}, \ldots, x_{p-1}^{(t+1)})
  3. Repeat.

Notice the "Manhattan" movement: each step changes only one coordinate, like navigating a grid city.
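The algorithm above can be sketched concretely for a bivariate Normal target, where each full conditional is itself Normal. This is a minimal illustration, not a library implementation: the function name, defaults, and the assumption of unit marginal variances are all choices made for the example.

```python
import math
import random

def gibbs_bivariate_normal(n_steps, rho=0.6, mu=(3.0, 3.0), start=(1.0, 1.0), seed=0):
    """Gibbs sampler for a bivariate Normal with unit variances and correlation rho.

    Each full conditional is itself Normal:
        x1 | x2 ~ Normal(mu1 + rho * (x2 - mu2), 1 - rho^2)
    and symmetrically for x2 | x1.
    """
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)  # conditional standard deviation
    x1, x2 = start
    samples = []
    for _ in range(n_steps):
        # Update x1 holding x2 fixed, then x2 holding the *new* x1 fixed
        x1 = rng.gauss(mu[0] + rho * (x2 - mu[1]), sd)
        x2 = rng.gauss(mu[1] + rho * (x1 - mu[0]), sd)
        samples.append((x1, x2))
    return samples

samples = gibbs_bivariate_normal(20000)
kept = samples[2000:]  # discard early, non-stationary draws as burn-in
mean1 = sum(s[0] for s in kept) / len(kept)
mean2 = sum(s[1] for s in kept) / len(kept)
print(f"sample mean ≈ ({mean1:.2f}, {mean2:.2f})")  # should land near the target mean (3, 3)
```

Note that each draw uses the most recent value of the other coordinate — that is what makes the updates "in turn" rather than simultaneous.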

Watch it explore

See how Gibbs sampling explores a 2D distribution using axis-aligned moves — alternating horizontal and vertical steps:

[Interactive visualizer: Gibbs sampling on a bivariate Normal target with ρ = 0.6 and mean μ = (3, 3); the chain starts at (1.0, 1.0) and moves in alternating axis-aligned steps.]

Gibbs is a special case of Metropolis-Hastings

Gibbs as MH

Gibbs sampling is Metropolis-Hastings where the proposal is the full conditional distribution. The acceptance probability is always 1 — every proposal is accepted.

Why acceptance = 1

In MH, the acceptance probability for updating x_1 with proposal q is

\alpha = \min\left(1, \frac{f(x_1^*, x_2) \, q(x_1 \mid x_1^*, x_2)}{f(x_1, x_2) \, q(x_1^* \mid x_1, x_2)}\right)

The Gibbs proposal is the full conditional, q(x_1^* \mid x_1, x_2) = f(x_1^* \mid x_2), which gives

\alpha = \min\left(1, \frac{f(x_1^*, x_2) \cdot f(x_1 \mid x_2)}{f(x_1, x_2) \cdot f(x_1^* \mid x_2)}\right)

Factoring each joint as f(x_1, x_2) = f(x_1 \mid x_2) \, f(x_2), the f(x_2) terms and the conditionals cancel, so the ratio is exactly 1 and \alpha = 1.
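The cancellation can also be checked numerically. This is a sketch using a standard (zero-mean, unit-variance) bivariate Normal with ρ = 0.6; the helper names and the particular state values are illustrative.

```python
import math

def joint(x1, x2, rho):
    # Standard bivariate Normal density (zero means, unit variances, correlation rho)
    q = (x1 * x1 - 2 * rho * x1 * x2 + x2 * x2) / (1 - rho * rho)
    return math.exp(-q / 2) / (2 * math.pi * math.sqrt(1 - rho * rho))

def conditional(x1, x2, rho):
    # Full conditional f(x1 | x2) = Normal(rho * x2, 1 - rho^2)
    var = 1 - rho * rho
    return math.exp(-(x1 - rho * x2) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

rho = 0.6
x1, x2, x1_star = 0.4, -1.2, 2.0  # arbitrary current state and proposed value
ratio = (joint(x1_star, x2, rho) * conditional(x1, x2, rho)) / (
    joint(x1, x2, rho) * conditional(x1_star, x2, rho)
)
print(ratio)  # 1.0 up to floating-point rounding
```

Any choice of current state and proposal gives the same ratio, which is the point: the algebraic cancellation holds pointwise, not just on average.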

When to use Gibbs vs. Metropolis-Hastings

|           | Gibbs                                    | Metropolis-Hastings      |
|-----------|------------------------------------------|--------------------------|
| Requires  | Closed-form conditionals                 | Only unnormalized target |
| Proposals | Always accepted                          | Often rejected           |
| Movement  | Axis-aligned only                        | Any direction            |
| Best for  | Conjugate Bayesian models                | Arbitrary distributions  |
| Weakness  | Slow if variables are highly correlated  | Tuning the proposal      |

When x and y are strongly correlated, Gibbs sampling slows down. The sampler must zigzag along a narrow diagonal ridge because it can only make axis-aligned moves.
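This slowdown shows up in the chain's autocorrelation: for a bivariate Normal target, the x₁-trace of the Gibbs chain is an AR(1) process with coefficient ρ², so stronger correlation means stickier samples. A rough demonstration (helper names are illustrative):

```python
import math
import random

def gibbs_x1_trace(n_steps, rho, seed=0):
    """x1-trace of a Gibbs sampler on a zero-mean bivariate Normal (unit variances)."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)
    x1, x2 = 0.0, 0.0
    trace = []
    for _ in range(n_steps):
        x1 = rng.gauss(rho * x2, sd)
        x2 = rng.gauss(rho * x1, sd)
        trace.append(x1)
    return trace

def lag1_autocorr(xs):
    # Sample lag-1 autocorrelation of a sequence
    n = len(xs)
    m = sum(xs) / n
    num = sum((xs[i] - m) * (xs[i + 1] - m) for i in range(n - 1))
    den = sum((x - m) ** 2 for x in xs)
    return num / den

results = {}
for rho in (0.6, 0.99):
    results[rho] = lag1_autocorr(gibbs_x1_trace(50000, rho))
    print(f"rho = {rho}: lag-1 autocorrelation ≈ {results[rho]:.2f} (theory: rho^2 = {rho**2:.2f})")
```

At ρ = 0.99 successive draws are nearly identical (autocorrelation ≈ 0.98), so the chain needs many more iterations per effectively independent sample than at ρ = 0.6.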

Applications in Bayesian inference

Gibbs sampling powers many Bayesian methods:

  • Bayesian linear regression: sample \beta given \sigma^2, then \sigma^2 given \beta
  • Latent Dirichlet Allocation (LDA): topic models for text
  • Mixture models: sample cluster assignments given parameters, then parameters given assignments
  • Image segmentation: sample pixel labels given neighbors
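The regression case can be sketched for a one-predictor model with flat priors on \beta and \log \sigma^2. This is an illustrative toy, not a production implementation: the data-generating values, function names, and prior choice are all assumptions of the example.

```python
import math
import random

rng = random.Random(42)

# Simulated data: y = beta_true * x + Normal(0, sigma_true) noise
beta_true, sigma_true, n = 2.0, 0.5, 200
xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
ys = [beta_true * x + rng.gauss(0.0, sigma_true) for x in xs]

sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))

def gibbs_regression(n_iter=5000, seed=1):
    """Alternate between the two full conditionals (flat priors assumed):
        beta    | sigma^2 ~ Normal(sxy / sxx, sigma^2 / sxx)
        sigma^2 | beta    ~ Inv-Gamma(n / 2, RSS(beta) / 2)
    """
    g = random.Random(seed)
    beta, sigma2 = 0.0, 1.0
    draws = []
    for _ in range(n_iter):
        beta = g.gauss(sxy / sxx, math.sqrt(sigma2 / sxx))
        rss = sum((y - beta * x) ** 2 for x, y in zip(xs, ys))
        # Inv-Gamma(a, b) draw via 1 / Gamma(shape=a, scale=1/b)
        sigma2 = 1.0 / g.gammavariate(n / 2.0, 2.0 / rss)
        draws.append((beta, sigma2))
    return draws

draws = gibbs_regression()
post_beta = sum(b for b, _ in draws[500:]) / (len(draws) - 500)
print(f"posterior mean of beta ≈ {post_beta:.2f}")  # near the true value 2.0
```

Both conditionals are standard distributions, so no Metropolis step or proposal tuning is needed — exactly the conjugate situation where Gibbs shines.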

Practice problems

Gibbs sampling requires what kind of distributions to be tractable? (one word)
What is the acceptance probability for a Gibbs update? (whole number)

Summary

| Concept        | Key Idea                                                |
|----------------|---------------------------------------------------------|
| Gibbs sampling | Update one variable at a time from its full conditional |
| Movement       | Axis-aligned ("Manhattan" steps)                        |
| Acceptance     | Always 1 (special case of MH)                           |
| Requirement    | Closed-form conditional distributions                   |
| Weakness       | Slow when variables are highly correlated               |

Whenever the conditionals are conjugate (e.g., Normal-Normal, Beta-Binomial), Gibbs sampling gives you MCMC without any proposal tuning.

What's next

We'll shift from discrete-time to continuous-time processes, introducing Poisson processes and how they model random events on a timeline.