Why do averages work?

You flip a coin 10 times and get 70% heads — unusual, but not shocking. Flip 10,000 times and you'll almost certainly land near 50%. Why?

What do you think?
You flip a fair coin 10,000 times. Which range will almost certainly contain the fraction of heads?

The sample average

Given $n$ independent observations $X_1, X_2, \ldots, X_n$, each with mean $\mu$ and variance $\sigma^2$, the sample average is:

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$

Key facts about $\bar{X}_n$:

| Property | Value |
| --- | --- |
| $E[\bar{X}_n]$ | $\mu$ |
| $\text{Var}(\bar{X}_n)$ | $\sigma^2/n$ |
| $\text{SD}(\bar{X}_n)$ | $\sigma/\sqrt{n}$ |
Variance of the Sample Average

$$\text{Var}(\bar{X}_n) = \text{Var}\!\left(\frac{1}{n}\sum_{i=1}^n X_i\right) = \frac{1}{n^2}\sum_{i=1}^n \text{Var}(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$

Start with the definition of the sample average, pull the constant $1/n$ out of the variance (it comes out squared), then use independence to turn the variance of the sum into a sum of variances.

The variance of the average shrinks like $1/n$. This is the engine behind everything.
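The $1/n$ shrinkage is easy to see empirically. The sketch below (illustrative, not from the lesson) simulates many sample averages of $n$ fair-coin flips, where $\sigma^2 = 0.25$, and compares their empirical variance to $\sigma^2/n$:

```python
import random
import statistics

# Empirical check (a sketch): the variance of the average of n fair-coin
# flips (sigma^2 = 0.25) should be close to 0.25 / n.
random.seed(0)

def coin_flip_average(n):
    """Average of n fair-coin flips (heads = 1, tails = 0)."""
    return sum(random.random() < 0.5 for _ in range(n)) / n

empirical = {}
for n in (10, 100, 1000):
    averages = [coin_flip_average(n) for _ in range(5000)]
    empirical[n] = statistics.variance(averages)
    print(f"n={n:4d}  empirical Var={empirical[n]:.5f}  sigma^2/n={0.25/n:.5f}")
```

Each tenfold increase in $n$ cuts the variance of the average by a factor of ten, matching the formula.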

The theorem

Weak Law of Large Numbers

For independent, identically distributed $X_1, X_2, \ldots$ with finite mean $\mu$ and variance $\sigma^2$: for any $\epsilon > 0$,

$$P(|\bar{X}_n - \mu| \geq \epsilon) \to 0 \text{ as } n \to \infty$$

The sample average converges to the true mean in probability.

Proof by Chebyshev

Proving the Weak LLN

$$P(|\bar{X}_n - \mu| \geq \epsilon) \leq \frac{\text{Var}(\bar{X}_n)}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2} \to 0 \text{ as } n \to \infty$$

Apply [Chebyshev's inequality](/lessons/inequalities) directly to $\bar{X}_n$, substitute $\text{Var}(\bar{X}_n) = \sigma^2/n$, and let $n \to \infty$.

The entire proof of the LLN is just Chebyshev + the $1/n$ variance fact. Two ingredients, one of the most important theorems in probability.
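You can watch the Chebyshev bound in action. This sketch (illustrative numbers, not from the lesson) estimates the deviation probability for a fair coin at several values of $n$ and checks it against $\sigma^2/(n\epsilon^2)$:

```python
import random

# Sketch: estimate P(|X_bar_n - 0.5| >= eps) for a fair coin and compare
# with the Chebyshev bound sigma^2 / (n * eps^2), where sigma^2 = 0.25.
random.seed(1)
eps, trials = 0.05, 2000

probs = {}
for n in (100, 400, 1600):
    hits = sum(
        abs(sum(random.random() < 0.5 for _ in range(n)) / n - 0.5) >= eps
        for _ in range(trials)
    )
    probs[n] = hits / trials
    bound = 0.25 / (n * eps**2)
    print(f"n={n:5d}  estimated P ~ {probs[n]:.3f}  Chebyshev bound = {bound:.3f}")
```

The estimated probability sits well below the bound and falls toward zero as $n$ grows — Chebyshev is loose, but it is enough to prove the theorem.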

Watch it happen

[Interactive demo — Law of Large Numbers: choose a distribution, generate samples, and watch the running sample mean, the true mean (0.5000 for the coin), and the |error| as the sample count grows.]

Generate samples and watch the running average converge. Try different sources: a coin, a die, an exponential. The path wobbles at the start but locks onto the true mean as $n$ grows.
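If you want the same experiment in code, here is a textual stand-in (a sketch, not the lesson's demo) using a fair die, whose true mean is 3.5:

```python
import random

# Sketch: roll a fair die, track the running average, and watch the
# |error| against the true mean 3.5 shrink as n grows.
random.seed(2)
true_mean, total = 3.5, 0.0
for i in range(1, 10_001):
    total += random.randint(1, 6)
    if i in (10, 100, 1000, 10_000):
        running = total / i
        print(f"n={i:6d}  running mean={running:.4f}  |error|={abs(running - true_mean):.4f}")
final = total / 10_000
```

Rerun with different seeds: the early wobble changes every time, but the endpoint is always close to 3.5.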

What do you think?
If you switch from a fair coin (σ = 0.5) to a fair die (σ ≈ 1.71), does convergence get faster or slower?

How fast does it converge?

From the proof, for any $\epsilon$:

$$P(|\bar{X}_n - \mu| \geq \epsilon) \leq \frac{\sigma^2}{n\epsilon^2}$$

So to ensure this probability is at most $\delta$:

$$n \geq \frac{\sigma^2}{\epsilon^2 \delta}$$
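The bound turns directly into a sample-size calculator. A minimal sketch (the function name and example numbers are mine, not the lesson's):

```python
import math

# Sketch: the Chebyshev sample-size bound n >= sigma^2 / (eps^2 * delta).
# It is conservative; in practice the CLT usually justifies a smaller n.
def chebyshev_sample_size(variance, eps, delta):
    """Smallest whole n guaranteeing sigma^2 / (n * eps^2) <= delta."""
    return math.ceil(variance / (eps**2 * delta))

# Example: a fair coin (sigma^2 = 0.25), eps = 0.05, delta = 0.05.
print(chebyshev_sample_size(0.25, 0.05, 0.05))  # -> 2000
```

Note how the cost scales: halving $\epsilon$ quadruples the required $n$, while halving $\delta$ only doubles it.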

σ² = 4. How many samples to guarantee P(|X̄ₙ − μ| ≥ 0.1) ≤ 0.05? (whole number)
A fair coin: σ² = 0.25. Samples needed for P(|X̄ − 0.5| ≥ 0.01) ≤ 0.01? (whole number)

Weak vs. strong

| Version | Statement | Convergence type |
| --- | --- | --- |
| Weak LLN | $P(\|\bar{X}_n - \mu\| \geq \epsilon) \to 0$ | In probability |
| Strong LLN | $P(\bar{X}_n \to \mu) = 1$ | Almost surely |

The strong law says that with probability 1, the sample average eventually gets, and stays, arbitrarily close to $\mu$ — forever. The weak law only says the probability of a large deviation at any fixed $n$ vanishes; on its own it leaves open the possibility that large deviations recur infinitely often, which the strong law rules out.

In practice, the distinction rarely matters. Both say: average enough independent observations and you'll learn the mean.

Why it matters

The LLN is the mathematical foundation for:

  • Polling: Survey 1,000 people and the sample proportion approximates the population proportion
  • Monte Carlo simulation: Estimate integrals by averaging random samples
  • Insurance: Average claims per customer stabilize as the customer pool grows
  • Machine learning: Training loss on a large batch approximates expected loss
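The Monte Carlo bullet is worth seeing concretely. This sketch (my example integrand, not from the lesson) estimates $\int_0^\pi \sin x \, dx = 2$ by averaging the integrand at uniform random points:

```python
import math
import random

# Sketch of Monte Carlo integration: estimate the integral of sin(x) over
# [0, pi] (true value 2) by averaging the integrand at random points.
random.seed(3)

def mc_integrate(f, a, b, n):
    """(b - a) times the sample average of f at n uniform points; the LLN
    drives this toward the true integral as n grows."""
    return (b - a) * sum(f(random.uniform(a, b)) for _ in range(n)) / n

estimate = mc_integrate(math.sin, 0.0, math.pi, 100_000)
print(f"estimate ~ {estimate:.4f}  (true value 2)")
```

The estimate is just a sample average of the random variable $(b-a)f(U)$, so the LLN guarantees convergence and the $\sigma/\sqrt{n}$ scaling governs the error.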
What do you think?
A casino game has expected house edge of 2%. Why doesn't the casino worry about losing money on any given night?

Connection to the central limit theorem

The LLN says $\bar{X}_n \to \mu$. But how is the average distributed on the way there? The Central Limit Theorem answers this:

$$\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1)$$

The standardized average approaches a standard Normal — regardless of the original distribution. That's the next frontier.
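A quick numerical preview (a sketch with my own parameters): if the CLT holds, standardized averages of coin flips should have mean near 0 and standard deviation near 1:

```python
import random
import statistics

# Sketch previewing the CLT: standardized sample averages of n coin flips
# should have mean ~ 0 and standard deviation ~ 1.
random.seed(4)
n, mu, sigma = 400, 0.5, 0.5

zs = [
    (sum(random.random() < 0.5 for _ in range(n)) / n - mu) / (sigma / n**0.5)
    for _ in range(5000)
]
print(f"mean(Z) ~ {statistics.mean(zs):.3f}   sd(Z) ~ {statistics.stdev(zs):.3f}")
```

The LLN explains why the raw average collapses onto $\mu$; the CLT describes the shape of the shrinking fluctuations around it.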

Summary

| Concept | Key point |
| --- | --- |
| $\text{Var}(\bar{X}_n)$ | $\sigma^2/n$ — shrinks with sample size |
| Weak LLN | $P(\|\bar{X}_n - \mu\| \geq \epsilon) \leq \sigma^2/(n\epsilon^2) \to 0$ |
| Proof | Direct from Chebyshev's inequality |
| Sample size | $n \geq \sigma^2/(\epsilon^2\delta)$ for desired accuracy |
| Why it matters | Averages converge — the basis of statistics, simulation, and inference |

The LLN is why statistics works. It guarantees that large enough samples reveal the true mean. Every confidence interval, every hypothesis test, every machine learning algorithm relies on this convergence.