One variable at a time

Metropolis-Hastings proposes moves in all dimensions at once. In high dimensions, finding a good proposal is hard. What if you updated just one variable at a time, holding the others fixed?

What do you think?
You want to sample from a 2D distribution f(x, y). Gibbs sampling alternates between updating x given y, and y given x. What does each update require?

The algorithm

Gibbs Sampling

To sample from f(x_1, x_2, \ldots, x_p):

  1. Start at (x_1^{(0)}, x_2^{(0)}, \ldots, x_p^{(0)})
  2. For each iteration, update each variable in turn:
    • Draw x_1^{(t+1)} \sim f(x_1 \mid x_2^{(t)}, x_3^{(t)}, \ldots, x_p^{(t)})
    • Draw x_2^{(t+1)} \sim f(x_2 \mid x_1^{(t+1)}, x_3^{(t)}, \ldots, x_p^{(t)})
    • \vdots
    • Draw x_p^{(t+1)} \sim f(x_p \mid x_1^{(t+1)}, \ldots, x_{p-1}^{(t+1)})
  3. Repeat.

Notice the "Manhattan" movement: each step changes only one coordinate, like navigating a grid city.
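The algorithm above can be sketched concretely for a bivariate Normal target, where each full conditional is itself Normal. This is a minimal illustration, not a library implementation: the function name, defaults, and the assumption of unit marginal variances are all choices made for the example.

```python
import math
import random

def gibbs_bivariate_normal(n_steps, rho=0.6, mu=(3.0, 3.0), start=(1.0, 1.0), seed=0):
    """Gibbs sampler for a bivariate Normal with unit variances and correlation rho.

    Each full conditional is itself Normal:
        x1 | x2 ~ Normal(mu1 + rho * (x2 - mu2), 1 - rho^2)
    and symmetrically for x2 | x1.
    """
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)  # conditional standard deviation
    x1, x2 = start
    samples = []
    for _ in range(n_steps):
        # Update x1 holding x2 fixed, then x2 holding the *new* x1 fixed
        x1 = rng.gauss(mu[0] + rho * (x2 - mu[1]), sd)
        x2 = rng.gauss(mu[1] + rho * (x1 - mu[0]), sd)
        samples.append((x1, x2))
    return samples

samples = gibbs_bivariate_normal(20000)
kept = samples[2000:]  # discard early, non-stationary draws as burn-in
mean1 = sum(s[0] for s in kept) / len(kept)
mean2 = sum(s[1] for s in kept) / len(kept)
print(f"sample mean ≈ ({mean1:.2f}, {mean2:.2f})")  # should land near the target mean (3, 3)
```

Note that each draw uses the most recent value of the other coordinate — that is what makes the updates "in turn" rather than simultaneous.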

Watch it explore

See how Gibbs sampling explores a 2D distribution using axis-aligned moves — alternating horizontal and vertical steps:

[Interactive visualizer: Gibbs sampling on a bivariate Normal target with ρ = 0.6 and mean μ = (3, 3); the chain starts at (1.0, 1.0) and moves in alternating axis-aligned steps.]

Gibbs is a special case of Metropolis-Hastings

Gibbs as MH

Gibbs sampling is Metropolis-Hastings where the proposal is the full conditional distribution. The acceptance probability is always 1 — every proposal is accepted.

Why acceptance = 1

In MH, the acceptance probability for updating x_1 with proposal q is

\alpha = \min\left(1, \frac{f(x_1^*, x_2) \, q(x_1 \mid x_1^*, x_2)}{f(x_1, x_2) \, q(x_1^* \mid x_1, x_2)}\right)

The Gibbs proposal is the full conditional, q(x_1^* \mid x_1, x_2) = f(x_1^* \mid x_2), which gives

\alpha = \min\left(1, \frac{f(x_1^*, x_2) \cdot f(x_1 \mid x_2)}{f(x_1, x_2) \cdot f(x_1^* \mid x_2)}\right)

Factoring each joint as f(x_1, x_2) = f(x_1 \mid x_2) \, f(x_2), the f(x_2) terms and the conditionals cancel, so the ratio is exactly 1 and \alpha = 1.
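The cancellation can also be checked numerically. This is a sketch using a standard (zero-mean, unit-variance) bivariate Normal with ρ = 0.6; the helper names and the particular state values are illustrative.

```python
import math

def joint(x1, x2, rho):
    # Standard bivariate Normal density (zero means, unit variances, correlation rho)
    q = (x1 * x1 - 2 * rho * x1 * x2 + x2 * x2) / (1 - rho * rho)
    return math.exp(-q / 2) / (2 * math.pi * math.sqrt(1 - rho * rho))

def conditional(x1, x2, rho):
    # Full conditional f(x1 | x2) = Normal(rho * x2, 1 - rho^2)
    var = 1 - rho * rho
    return math.exp(-(x1 - rho * x2) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

rho = 0.6
x1, x2, x1_star = 0.4, -1.2, 2.0  # arbitrary current state and proposed value
ratio = (joint(x1_star, x2, rho) * conditional(x1, x2, rho)) / (
    joint(x1, x2, rho) * conditional(x1_star, x2, rho)
)
print(ratio)  # 1.0 up to floating-point rounding
```

Any choice of current state and proposal gives the same ratio, which is the point: the algebraic cancellation holds pointwise, not just on average.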

When to use Gibbs vs. Metropolis-Hastings

|           | Gibbs                                    | Metropolis-Hastings      |
|-----------|------------------------------------------|--------------------------|
| Requires  | Closed-form conditionals                 | Only unnormalized target |
| Proposals | Always accepted                          | Often rejected           |
| Movement  | Axis-aligned only                        | Any direction            |
| Best for  | Conjugate Bayesian models                | Arbitrary distributions  |
| Weakness  | Slow if variables are highly correlated  | Tuning the proposal      |

When x and y are strongly correlated, Gibbs sampling slows down. The sampler must zigzag along a narrow diagonal ridge because it can only make axis-aligned moves.
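This slowdown shows up in the chain's autocorrelation: for a bivariate Normal target, the x₁-trace of the Gibbs chain is an AR(1) process with coefficient ρ², so stronger correlation means stickier samples. A rough demonstration (helper names are illustrative):

```python
import math
import random

def gibbs_x1_trace(n_steps, rho, seed=0):
    """x1-trace of a Gibbs sampler on a zero-mean bivariate Normal (unit variances)."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)
    x1, x2 = 0.0, 0.0
    trace = []
    for _ in range(n_steps):
        x1 = rng.gauss(rho * x2, sd)
        x2 = rng.gauss(rho * x1, sd)
        trace.append(x1)
    return trace

def lag1_autocorr(xs):
    # Sample lag-1 autocorrelation of a sequence
    n = len(xs)
    m = sum(xs) / n
    num = sum((xs[i] - m) * (xs[i + 1] - m) for i in range(n - 1))
    den = sum((x - m) ** 2 for x in xs)
    return num / den

results = {}
for rho in (0.6, 0.99):
    results[rho] = lag1_autocorr(gibbs_x1_trace(50000, rho))
    print(f"rho = {rho}: lag-1 autocorrelation ≈ {results[rho]:.2f} (theory: rho^2 = {rho**2:.2f})")
```

At ρ = 0.99 successive draws are nearly identical (autocorrelation ≈ 0.98), so the chain needs many more iterations per effectively independent sample than at ρ = 0.6.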

Applications in Bayesian inference

Gibbs sampling powers many Bayesian methods:

  • Bayesian linear regression: sample \beta given \sigma^2, then \sigma^2 given \beta
  • Latent Dirichlet Allocation (LDA): topic models for text
  • Mixture models: sample cluster assignments given parameters, then parameters given assignments
  • Image segmentation: sample pixel labels given neighbors
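The regression case can be sketched for a one-predictor model with flat priors on \beta and \log \sigma^2. This is an illustrative toy, not a production implementation: the data-generating values, function names, and prior choice are all assumptions of the example.

```python
import math
import random

rng = random.Random(42)

# Simulated data: y = beta_true * x + Normal(0, sigma_true) noise
beta_true, sigma_true, n = 2.0, 0.5, 200
xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
ys = [beta_true * x + rng.gauss(0.0, sigma_true) for x in xs]

sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))

def gibbs_regression(n_iter=5000, seed=1):
    """Alternate between the two full conditionals (flat priors assumed):
        beta    | sigma^2 ~ Normal(sxy / sxx, sigma^2 / sxx)
        sigma^2 | beta    ~ Inv-Gamma(n / 2, RSS(beta) / 2)
    """
    g = random.Random(seed)
    beta, sigma2 = 0.0, 1.0
    draws = []
    for _ in range(n_iter):
        beta = g.gauss(sxy / sxx, math.sqrt(sigma2 / sxx))
        rss = sum((y - beta * x) ** 2 for x, y in zip(xs, ys))
        # Inv-Gamma(a, b) draw via 1 / Gamma(shape=a, scale=1/b)
        sigma2 = 1.0 / g.gammavariate(n / 2.0, 2.0 / rss)
        draws.append((beta, sigma2))
    return draws

draws = gibbs_regression()
post_beta = sum(b for b, _ in draws[500:]) / (len(draws) - 500)
print(f"posterior mean of beta ≈ {post_beta:.2f}")  # near the true value 2.0
```

Both conditionals are standard distributions, so no Metropolis step or proposal tuning is needed — exactly the conjugate situation where Gibbs shines.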

Practice problems

Gibbs sampling requires what kind of distributions to be tractable? (one word)
What is the acceptance probability for a Gibbs update? (whole number)

Summary

| Concept        | Key Idea                                                |
|----------------|---------------------------------------------------------|
| Gibbs sampling | Update one variable at a time from its full conditional |
| Movement       | Axis-aligned ("Manhattan" steps)                        |
| Acceptance     | Always 1 (special case of MH)                           |
| Requirement    | Closed-form conditional distributions                   |
| Weakness       | Slow when variables are highly correlated               |

Whenever the conditionals are conjugate (e.g., Normal-Normal, Beta-Binomial), Gibbs sampling gives you MCMC without any proposal tuning.

What's next

We'll shift from discrete-time to continuous-time processes, introducing Poisson processes and how they model random events on a timeline.