Beyond the mean
The mean tells you where a distribution is centered. The variance tells you how spread out it is. But two distributions can share the same mean and variance yet look completely different. What else do we need?
The moment hierarchy
The $k$-th moment of a random variable $X$ is $E[X^k]$. The $k$-th central moment is $E[(X - \mu)^k]$, where $\mu = E[X]$.
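As a concrete check, both kinds of moment can be computed directly from the definition. A minimal sketch using a fair six-sided die (the die example is an illustration, not part of the text above); exact fractions keep the arithmetic transparent:

```python
from fractions import Fraction

# Fair six-sided die: each face has probability 1/6.
faces = [1, 2, 3, 4, 5, 6]

def raw_moment(k):
    """k-th raw moment E[X^k]."""
    return sum(Fraction(v ** k, 6) for v in faces)

mu = raw_moment(1)  # mean

def central_moment(k):
    """k-th central moment E[(X - mu)^k]."""
    return sum((Fraction(v) - mu) ** k / 6 for v in faces)

print(raw_moment(1))      # 7/2
print(central_moment(2))  # variance: 35/12
print(central_moment(3))  # 0, since the die is symmetric about its mean
```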
Each moment tells you something different:
| Moment | Formula | What it measures |
|---|---|---|
| 1st | $E[X]$ | Center (mean) |
| 2nd central | $E[(X - \mu)^2]$ | Spread (variance) |
| 3rd standardized | $E\left[\left(\frac{X - \mu}{\sigma}\right)^3\right]$ | Asymmetry (skewness) |
| 4th standardized | $E\left[\left(\frac{X - \mu}{\sigma}\right)^4\right]$ | Tail heaviness (kurtosis) |
Comparing distributions shows how the four moments differ. The Normal has skewness 0 and kurtosis 3 (the baseline); the Exponential is right-skewed with heavier tails.
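These quantities are easy to estimate from samples. A Monte Carlo sketch using only the standard library (the sample size and seed are arbitrary choices):

```python
import math
import random

def standardized_moment(xs, k):
    """Sample estimate of the k-th standardized moment E[((X - mu)/sigma)^k]."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return sum(((x - mu) / sigma) ** k for x in xs) / n

rng = random.Random(42)
normal = [rng.gauss(0, 1) for _ in range(200_000)]
expo = [rng.expovariate(1.0) for _ in range(200_000)]

print(standardized_moment(normal, 3))  # skewness: near 0
print(standardized_moment(normal, 4))  # kurtosis: near 3
print(standardized_moment(expo, 3))    # skewness: near 2 (right-skewed)
print(standardized_moment(expo, 4))    # kurtosis: near 9 (heavier tails)
```

The exact values for the Exponential are skewness 2 and kurtosis 9; the estimates will be close but not exact, since the fourth-power sums are themselves noisy.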
Skewness
- Skew = 0: symmetric (Normal, Uniform)
- Skew > 0: right tail is longer (Exponential, Poisson)
- Skew < 0: left tail is longer
Kurtosis
The Normal distribution has kurtosis = 3. Excess kurtosis = Kurt − 3 measures deviation from normality.
Kurtosis is often misunderstood as "peakedness." It's really about tail weight. High kurtosis means more probability in the extreme tails — more outliers, not necessarily a sharper peak.
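One way to see this: compare the Normal with a Laplace (double-exponential) variable scaled to the same variance, and count how often each lands beyond three standard deviations. A sketch; building a Laplace sample as the difference of two independent Exponentials is a standard construction, not something introduced above:

```python
import math
import random

rng = random.Random(7)
n = 200_000

normal = [rng.gauss(0, 1) for _ in range(n)]
# Difference of two independent Exponential(1) variables is Laplace(0, 1),
# which has variance 2; divide by sqrt(2) to standardize to variance 1.
laplace = [(rng.expovariate(1.0) - rng.expovariate(1.0)) / math.sqrt(2)
           for _ in range(n)]

def tail_fraction(xs, c=3.0):
    """Fraction of samples with |x| beyond c standard deviations."""
    return sum(abs(x) > c for x in xs) / len(xs)

print(tail_fraction(normal))   # roughly 0.003 for the Normal
print(tail_fraction(laplace))  # several times larger: Laplace has kurtosis 6
```

Both densities have the same mean and variance; the extra kurtosis shows up as several times more mass beyond three standard deviations.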
The moment generating function
Computing moments one at a time is tedious. The MGF encodes all of them in a single function.
The MGF of $X$ is $M_X(t) = E[e^{tX}]$, defined for all $t$ in a neighborhood of 0. Then $M_X^{(k)}(0) = E[X^k]$: the $k$-th moment is the $k$-th derivative of $M_X$ evaluated at $t = 0$.
Why does this work? Expand $e^{tX}$ as a Taylor series:

$$M_X(t) = E[e^{tX}] = E\!\left[\sum_{k=0}^{\infty} \frac{(tX)^k}{k!}\right] = \sum_{k=0}^{\infty} \frac{E[X^k]}{k!}\, t^k.$$

Differentiating $k$ times and setting $t = 0$ kills every term except the one carrying $E[X^k]$.
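The series can be checked numerically. For Exponential($\lambda$) the moments are $E[X^k] = k!/\lambda^k$, so summing the Taylor series should reproduce the closed-form MGF $\lambda/(\lambda - t)$. A sketch; the choice $\lambda = 2$, $t = 0.3$ is arbitrary, subject to the requirement $t < \lambda$:

```python
import math

lam, t = 2.0, 0.3  # any t < lam works

# Partial Taylor sum: sum over k of E[X^k] * t^k / k!,
# with E[X^k] = k! / lam^k for the Exponential distribution.
series = sum((math.factorial(k) / lam ** k) * t ** k / math.factorial(k)
             for k in range(60))

closed_form = lam / (lam - t)
print(series, closed_form)  # the two agree to high precision
```

The factorials cancel term by term, leaving a geometric series in $t/\lambda$, which is why the sum converges only for $t < \lambda$.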
MGFs of common distributions
| Distribution | MGF | Valid for |
|---|---|---|
| Bernoulli($p$) | $1 - p + pe^t$ | all $t$ |
| Binomial($n, p$) | $(1 - p + pe^t)^n$ | all $t$ |
| Poisson($\lambda$) | $e^{\lambda(e^t - 1)}$ | all $t$ |
| Normal($\mu, \sigma^2$) | $e^{\mu t + \sigma^2 t^2 / 2}$ | all $t$ |
| Exponential($\lambda$) | $\frac{\lambda}{\lambda - t}$ | $t < \lambda$ |
The key MGF property
If $X$ and $Y$ are independent: $M_{X+Y}(t) = M_X(t)\,M_Y(t)$. MGFs convert sums into products — they're probability's version of Fourier transforms.
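For instance, multiplying two Poisson MGFs gives the MGF of a Poisson with the summed rate, which is how one shows that a sum of independent Poissons is again Poisson. A numerical sketch; the rates 1.5 and 2.5 and the test points are arbitrary:

```python
import math

def poisson_mgf(lam, t):
    """MGF of Poisson(lam): exp(lam * (e^t - 1))."""
    return math.exp(lam * (math.exp(t) - 1.0))

lam1, lam2 = 1.5, 2.5
for t in (-1.0, -0.5, 0.5, 1.0):
    product = poisson_mgf(lam1, t) * poisson_mgf(lam2, t)
    combined = poisson_mgf(lam1 + lam2, t)
    print(t, product, combined)  # the two columns match
```

Since the two functions agree on a neighborhood of 0 and MGFs determine distributions uniquely, the sum must be Poisson($\lambda_1 + \lambda_2$).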
Not all distributions have MGFs. For example, the Cauchy distribution's $E[e^{tX}]$ is infinite for every $t \neq 0$. When an MGF exists in a neighborhood of 0, it uniquely determines the distribution.
Summary
| Concept | Key Idea |
|---|---|
| $k$-th moment | $E[X^k]$ — raw information about the distribution's shape |
| Skewness | 3rd standardized moment — measures asymmetry |
| Kurtosis | 4th standardized moment — measures tail weight |
| MGF | $M_X(t) = E[e^{tX}]$ encodes all moments |
| MGF derivatives | $M_X^{(k)}(0) = E[X^k]$ |
| Independent sums | $M_{X+Y}(t) = M_X(t)\,M_Y(t)$ |
Moments summarize shape; the MGF packages them all. When you need to identify a sum's distribution, reaching for MGFs is often the fastest route.
What's next
We know the mean and variance. We have the MGF. But how tightly does the mean constrain where values actually fall? Enter Markov's and Chebyshev's inequalities — the first tools for bounding tail probabilities.