# Why do averages work?
You flip a coin 10 times and get 70% heads — unusual, but not shocking. Flip 10,000 times and you will almost certainly land near 50%. Why?
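You can check this yourself. A minimal sketch using only Python's standard library (the function name `heads_fraction` is ours, not from any package):

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def heads_fraction(n_flips: int) -> float:
    """Fraction of heads among n_flips fair-coin flips."""
    return sum(random.random() < 0.5 for _ in range(n_flips)) / n_flips

small = heads_fraction(10)      # can easily be 0.3 or 0.7
large = heads_fraction(10_000)  # reliably within a couple of percent of 0.5
print(small, large)
```

Run it a few times with different seeds: the 10-flip fraction jumps around, while the 10,000-flip fraction barely moves.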
## The sample average

Given independent observations $X_1, \dots, X_n$, each with mean $\mu$ and variance $\sigma^2$, the sample average is:

$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$$

Key facts about $\bar{X}_n$:

| Property | Value |
|---|---|
| Mean | $\mathbb{E}[\bar{X}_n] = \mu$ |
| Variance | $\operatorname{Var}(\bar{X}_n) = \sigma^2 / n$ |

The variance of the average shrinks like $1/n$. This is the engine behind everything.
## The theorem

For independent, identically distributed $X_1, X_2, \dots$ with finite mean $\mu$ and variance $\sigma^2$: for any $\varepsilon > 0$,

$$\lim_{n \to \infty} \mathbb{P}\left( \lvert \bar{X}_n - \mu \rvert \ge \varepsilon \right) = 0$$

The sample average converges to the true mean in probability.
## Proof by Chebyshev
The entire proof of the LLN is just Chebyshev + the variance fact. Two ingredients, one of the most important theorems in probability.
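Written out, the argument is one line — Chebyshev's inequality applied to $\bar{X}_n$, followed by the variance fact:

```latex
\mathbb{P}\left( \lvert \bar{X}_n - \mu \rvert \ge \varepsilon \right)
  \le \frac{\operatorname{Var}(\bar{X}_n)}{\varepsilon^2}
  = \frac{\sigma^2}{n \varepsilon^2}
  \xrightarrow[n \to \infty]{} 0
```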
## Watch it happen

Generate samples and watch the running average converge. Try different sources: a coin, a die, an exponential. The path wobbles at the start but locks onto the true mean as $n$ grows.
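A minimal sketch of that experiment in Python, using an exponential source with rate 1 (so the true mean is 1.0); the helper `running_averages` is ours:

```python
import random

random.seed(42)  # reproducible run

def running_averages(samples):
    """Yield the running average after each new sample arrives."""
    total = 0.0
    for i, x in enumerate(samples, start=1):
        total += x
        yield total / i

# Exponential(rate=1) observations: true mean is 1.0.
samples = [random.expovariate(1.0) for _ in range(10_000)]
path = list(running_averages(samples))
print(path[9], path[-1])  # wobbly after 10 samples, close to 1.0 at the end
```

Swapping `random.expovariate(1.0)` for a coin (`random.random() < 0.5`) or a die (`random.randint(1, 6)`) shows the same behavior with different true means.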
## How fast does it converge?

From the proof, for any $\varepsilon > 0$:

$$\mathbb{P}\left( \lvert \bar{X}_n - \mu \rvert \ge \varepsilon \right) \le \frac{\sigma^2}{n \varepsilon^2}$$

So to ensure this probability is at most $\delta$, it suffices to take:

$$n \ge \frac{\sigma^2}{\delta \varepsilon^2}$$
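The Chebyshev bound turns directly into a sample-size calculator. A sketch (the function `required_n` is our name for it):

```python
import math

def required_n(sigma2: float, eps: float, delta: float) -> int:
    """Smallest n with sigma^2 / (n * eps^2) <= delta, per the Chebyshev bound."""
    return math.ceil(sigma2 / (delta * eps ** 2))

# Fair coin: variance sigma^2 = 0.25.
# Want P(|average - 0.5| >= 0.01) <= 0.05.
n = required_n(0.25, eps=0.01, delta=0.05)
print(n)  # 50000
```

Note that Chebyshev is loose: sharper tools (e.g. Hoeffding's inequality) give much smaller sample sizes for bounded variables, but the Chebyshev answer needs nothing beyond finite variance.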
## Weak vs. strong

| Version | Statement | Convergence type |
|---|---|---|
| Weak LLN | $\mathbb{P}(\lvert \bar{X}_n - \mu \rvert \ge \varepsilon) \to 0$ for every $\varepsilon > 0$ | In probability |
| Strong LLN | $\mathbb{P}(\lim_{n \to \infty} \bar{X}_n = \mu) = 1$ | Almost surely |
The strong law says: with probability 1, the sample average eventually gets arbitrarily close to $\mu$ and stays there forever. The weak law only says the probability of a large deviation vanishes at each fixed $n$; in principle, the average could occasionally wander (though this doesn't actually happen).
In practice, the distinction rarely matters. Both say: average enough independent observations and you'll learn the mean.
## Why it matters
The LLN is the mathematical foundation for:
- Polling: Survey 1,000 people and the sample proportion approximates the population proportion
- Monte Carlo simulation: Estimate integrals by averaging random samples
- Insurance: Average claims per customer stabilize as the customer pool grows
- Machine learning: Training loss on a large batch approximates expected loss
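The Monte Carlo use case fits in a few lines. A sketch estimating $\int_0^1 x^2 \, dx = 1/3$ as an average of $f(U_i)$ for uniform $U_i$ (the helper `mc_integral` is ours):

```python
import random

random.seed(1)  # reproducible run

def mc_integral(f, n: int) -> float:
    """Estimate the integral of f over [0, 1] by averaging f at n uniform points."""
    return sum(f(random.random()) for _ in range(n)) / n

# True value of the integral of x^2 on [0, 1] is 1/3 ≈ 0.3333.
est = mc_integral(lambda x: x * x, 100_000)
print(est)
```

The LLN guarantees the estimate converges to the true integral; the $\sigma^2/n$ variance fact says the typical error shrinks like $1/\sqrt{n}$.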
## Connection to the central limit theorem
The LLN says $\bar{X}_n \to \mu$. But how is the average distributed on the way there? The Central Limit Theorem answers this:

$$\sqrt{n} \, \frac{\bar{X}_n - \mu}{\sigma} \xrightarrow{d} \mathcal{N}(0, 1)$$

The standardized average approaches a standard Normal, regardless of the original distribution. That's the next frontier.
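A quick simulation makes this concrete: standardize many independent sample averages and check that roughly 95% of them fall within 1.96 of zero, as a standard Normal predicts. A sketch with Uniform(0, 1) observations, so $\mu = 1/2$ and $\sigma = \sqrt{1/12}$:

```python
import math
import random
import statistics

random.seed(7)  # reproducible run

n, trials = 100, 2_000
mu, sigma = 0.5, math.sqrt(1 / 12)  # mean and std of Uniform(0, 1)

# One standardized average per trial: sqrt(n) * (x_bar - mu) / sigma.
z = [
    math.sqrt(n) * (statistics.fmean(random.random() for _ in range(n)) - mu) / sigma
    for _ in range(trials)
]

# A standard Normal puts about 95% of its mass within 1.96 of 0.
frac = sum(abs(v) < 1.96 for v in z) / trials
print(frac)
```

The observations are uniform, yet the standardized averages behave like a bell curve — the distribution-free character of the CLT.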
## Summary

| Concept | Key point |
|---|---|
| Variance of the average | $\operatorname{Var}(\bar{X}_n) = \sigma^2 / n$ — shrinks with sample size |
| Weak LLN | $\bar{X}_n \to \mu$ in probability |
| Proof | Direct from Chebyshev's inequality |
| Sample size | $n \ge \sigma^2 / (\delta \varepsilon^2)$ for desired accuracy |
| Why it matters | Averages converge — the basis of statistics, simulation, and inference |
The LLN is why statistics works. It guarantees that large enough samples reveal the true mean. Every confidence interval, every hypothesis test, every machine learning algorithm relies on this convergence.