Why do averages work?

You flip a coin 10 times and get 70% heads — unusual, but not shocking. Flip 10,000 times and you'll almost certainly land near 50%. Why?

What do you think?
You flip a fair coin 10,000 times. Which range will almost certainly contain the fraction of heads?

The sample average

Given $n$ independent observations $X_1, X_2, \ldots, X_n$, each with mean $\mu$ and variance $\sigma^2$, the sample average is:

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$

Key facts about $\bar{X}_n$:

| Property | Value |
| --- | --- |
| $E[\bar{X}_n]$ | $\mu$ |
| $\text{Var}(\bar{X}_n)$ | $\sigma^2/n$ |
| $\text{SD}(\bar{X}_n)$ | $\sigma/\sqrt{n}$ |
Variance of the Sample Average

$$\text{Var}(\bar{X}_n) = \text{Var}\!\left(\frac{1}{n}\sum_{i=1}^n X_i\right) = \frac{1}{n^2}\sum_{i=1}^n \text{Var}(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$

Start with the definition of the sample average, pull the constant $1/n$ out of the variance (it comes out squared), then use independence to turn the variance of the sum into a sum of variances.

The variance of the average shrinks like $1/n$. This is the engine behind everything.
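The $1/n$ shrinkage is easy to see empirically. The sketch below (illustrative, not from the lesson) simulates many sample averages of $n$ fair-coin flips, where $\sigma^2 = 0.25$, and compares their empirical variance to $\sigma^2/n$:

```python
import random
import statistics

# Empirical check (a sketch): the variance of the average of n fair-coin
# flips (sigma^2 = 0.25) should be close to 0.25 / n.
random.seed(0)

def coin_flip_average(n):
    """Average of n fair-coin flips (heads = 1, tails = 0)."""
    return sum(random.random() < 0.5 for _ in range(n)) / n

empirical = {}
for n in (10, 100, 1000):
    averages = [coin_flip_average(n) for _ in range(5000)]
    empirical[n] = statistics.variance(averages)
    print(f"n={n:4d}  empirical Var={empirical[n]:.5f}  sigma^2/n={0.25/n:.5f}")
```

Each tenfold increase in $n$ cuts the variance of the average by a factor of ten, matching the formula.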

The theorem

Weak Law of Large Numbers

For independent, identically distributed $X_1, X_2, \ldots$ with finite mean $\mu$ and variance $\sigma^2$: for any $\epsilon > 0$,

$$P(|\bar{X}_n - \mu| \geq \epsilon) \to 0 \text{ as } n \to \infty$$

The sample average converges to the true mean in probability.

Proof by Chebyshev

Proving the Weak LLN

$$P(|\bar{X}_n - \mu| \geq \epsilon) \leq \frac{\text{Var}(\bar{X}_n)}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2} \to 0 \text{ as } n \to \infty$$

Apply [Chebyshev's inequality](/lessons/inequalities) directly to $\bar{X}_n$, substitute $\text{Var}(\bar{X}_n) = \sigma^2/n$, and let $n \to \infty$.

The entire proof of the LLN is just Chebyshev + the $1/n$ variance fact. Two ingredients, one of the most important theorems in probability.
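You can watch the Chebyshev bound in action. This sketch (illustrative numbers, not from the lesson) estimates the deviation probability for a fair coin at several values of $n$ and checks it against $\sigma^2/(n\epsilon^2)$:

```python
import random

# Sketch: estimate P(|X_bar_n - 0.5| >= eps) for a fair coin and compare
# with the Chebyshev bound sigma^2 / (n * eps^2), where sigma^2 = 0.25.
random.seed(1)
eps, trials = 0.05, 2000

probs = {}
for n in (100, 400, 1600):
    hits = sum(
        abs(sum(random.random() < 0.5 for _ in range(n)) / n - 0.5) >= eps
        for _ in range(trials)
    )
    probs[n] = hits / trials
    bound = 0.25 / (n * eps**2)
    print(f"n={n:5d}  estimated P ~ {probs[n]:.3f}  Chebyshev bound = {bound:.3f}")
```

The estimated probability sits well below the bound and falls toward zero as $n$ grows — Chebyshev is loose, but it is enough to prove the theorem.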

Watch it happen

[Interactive demo — Law of Large Numbers: choose a distribution, generate samples, and watch the running sample mean, the true mean (0.5000 for the coin), and the |error| as the sample count grows.]

Generate samples and watch the running average converge. Try different sources: a coin, a die, an exponential. The path wobbles at the start but locks onto the true mean as $n$ grows.
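If you want the same experiment in code, here is a textual stand-in (a sketch, not the lesson's demo) using a fair die, whose true mean is 3.5:

```python
import random

# Sketch: roll a fair die, track the running average, and watch the
# |error| against the true mean 3.5 shrink as n grows.
random.seed(2)
true_mean, total = 3.5, 0.0
for i in range(1, 10_001):
    total += random.randint(1, 6)
    if i in (10, 100, 1000, 10_000):
        running = total / i
        print(f"n={i:6d}  running mean={running:.4f}  |error|={abs(running - true_mean):.4f}")
final = total / 10_000
```

Rerun with different seeds: the early wobble changes every time, but the endpoint is always close to 3.5.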

What do you think?
If you switch from a fair coin (σ = 0.5) to a fair die (σ ≈ 1.71), does convergence get faster or slower?

How fast does it converge?

From the proof, for any $\epsilon$:

$$P(|\bar{X}_n - \mu| \geq \epsilon) \leq \frac{\sigma^2}{n\epsilon^2}$$

So to ensure this probability is at most $\delta$:

$$n \geq \frac{\sigma^2}{\epsilon^2 \delta}$$
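The bound turns directly into a sample-size calculator. A minimal sketch (the function name and example numbers are mine, not the lesson's):

```python
import math

# Sketch: the Chebyshev sample-size bound n >= sigma^2 / (eps^2 * delta).
# It is conservative; in practice the CLT usually justifies a smaller n.
def chebyshev_sample_size(variance, eps, delta):
    """Smallest whole n guaranteeing sigma^2 / (n * eps^2) <= delta."""
    return math.ceil(variance / (eps**2 * delta))

# Example: a fair coin (sigma^2 = 0.25), eps = 0.05, delta = 0.05.
print(chebyshev_sample_size(0.25, 0.05, 0.05))  # -> 2000
```

Note how the cost scales: halving $\epsilon$ quadruples the required $n$, while halving $\delta$ only doubles it.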

σ² = 4. How many samples to guarantee P(|X̄ₙ − μ| ≥ 0.1) ≤ 0.05? (whole number)
A fair coin: σ² = 0.25. Samples needed for P(|X̄ − 0.5| ≥ 0.01) ≤ 0.01? (whole number)

Weak vs. strong

| Version | Statement | Convergence type |
| --- | --- | --- |
| Weak LLN | $P(\|\bar{X}_n - \mu\| \geq \epsilon) \to 0$ | In probability |
| Strong LLN | $P(\bar{X}_n \to \mu) = 1$ | Almost surely |

The strong law says that with probability 1, the sample average eventually gets, and stays, arbitrarily close to $\mu$ — forever. The weak law only says the probability of a large deviation at any fixed $n$ vanishes; on its own it leaves open the possibility that large deviations recur infinitely often, which the strong law rules out.

In practice, the distinction rarely matters. Both say: average enough independent observations and you'll learn the mean.

Why it matters

The LLN is the mathematical foundation for:

  • Polling: Survey 1,000 people and the sample proportion approximates the population proportion
  • Monte Carlo simulation: Estimate integrals by averaging random samples
  • Insurance: Average claims per customer stabilize as the customer pool grows
  • Machine learning: Training loss on a large batch approximates expected loss
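The Monte Carlo bullet is worth seeing concretely. This sketch (my example integrand, not from the lesson) estimates $\int_0^\pi \sin x \, dx = 2$ by averaging the integrand at uniform random points:

```python
import math
import random

# Sketch of Monte Carlo integration: estimate the integral of sin(x) over
# [0, pi] (true value 2) by averaging the integrand at random points.
random.seed(3)

def mc_integrate(f, a, b, n):
    """(b - a) times the sample average of f at n uniform points; the LLN
    drives this toward the true integral as n grows."""
    return (b - a) * sum(f(random.uniform(a, b)) for _ in range(n)) / n

estimate = mc_integrate(math.sin, 0.0, math.pi, 100_000)
print(f"estimate ~ {estimate:.4f}  (true value 2)")
```

The estimate is just a sample average of the random variable $(b-a)f(U)$, so the LLN guarantees convergence and the $\sigma/\sqrt{n}$ scaling governs the error.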
What do you think?
A casino game has expected house edge of 2%. Why doesn't the casino worry about losing money on any given night?

Connection to the central limit theorem

The LLN says $\bar{X}_n \to \mu$. But how is the average distributed on the way there? The Central Limit Theorem answers this:

$$\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1)$$

The standardized average approaches a standard Normal — regardless of the original distribution. That's the next frontier.
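A quick numerical preview (a sketch with my own parameters): if the CLT holds, standardized averages of coin flips should have mean near 0 and standard deviation near 1:

```python
import random
import statistics

# Sketch previewing the CLT: standardized sample averages of n coin flips
# should have mean ~ 0 and standard deviation ~ 1.
random.seed(4)
n, mu, sigma = 400, 0.5, 0.5

zs = [
    (sum(random.random() < 0.5 for _ in range(n)) / n - mu) / (sigma / n**0.5)
    for _ in range(5000)
]
print(f"mean(Z) ~ {statistics.mean(zs):.3f}   sd(Z) ~ {statistics.stdev(zs):.3f}")
```

The LLN explains why the raw average collapses onto $\mu$; the CLT describes the shape of the shrinking fluctuations around it.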

Summary

| Concept | Key point |
| --- | --- |
| $\text{Var}(\bar{X}_n)$ | $\sigma^2/n$ — shrinks with sample size |
| Weak LLN | $P(\|\bar{X}_n - \mu\| \geq \epsilon) \leq \sigma^2/(n\epsilon^2) \to 0$ |
| Proof | Direct from Chebyshev's inequality |
| Sample size | $n \geq \sigma^2/(\epsilon^2\delta)$ for desired accuracy |
| Why it matters | Averages converge — the basis of statistics, simulation, and inference |

The LLN is why statistics works. It guarantees that large enough samples reveal the true mean. Every confidence interval, every hypothesis test, every machine learning algorithm relies on this convergence.