The independence question

You're at a casino. The roulette wheel has landed on red five times in a row.

What do you think?
Is black more likely on the next spin?

Our brains are wired to see patterns, even where none exist. Independence is the mathematical way to say "knowing one thing tells you nothing about another."

Defining independence

We've seen independence for events: A and B are independent if P(A ∩ B) = P(A)P(B).

For random variables, we extend this idea:

Independent Random Variables

Random variables X and Y are independent if, for all values x and y: P(X = x, Y = y) = P(X = x) · P(Y = y)

Equivalently, knowing X doesn't change the distribution of Y.

In words: the joint probability factors into the product of individual probabilities.

What independence means

When X and Y are independent:

  1. No information transfer: Observing X tells you nothing about Y
  2. Conditional equals unconditional: P(Y = y | X = x) = P(Y = y)
  3. The joint PMF factors: p_{X,Y}(x, y) = p_X(x) · p_Y(y)

Independence means you can analyze X and Y separately. Their stories don't interact.

Example: two dice

Roll two fair dice. Let X = first die, Y = second die.

The physical separation of the dice makes them independent. Verifying mathematically:

What do you think?
Check: P(X=3, Y=5) should equal P(X=3) × P(Y=5). Does it?

Indeed, P(X=3, Y=5) = 1/36 = (1/6)(1/6) = P(X=3) · P(Y=5). This factorization works for every pair of values. That's what makes them independent.
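To make "every pair of values" concrete, here's a quick sketch in plain Python using exact fractions: build the joint PMF of two fair dice, derive the marginals, and check the factorization for all 36 pairs.

```python
from fractions import Fraction

# Joint PMF of two fair dice: each of the 36 outcomes has probability 1/36.
joint = {(x, y): Fraction(1, 36) for x in range(1, 7) for y in range(1, 7)}

# Marginal PMFs: sum the joint over the other variable.
p_x = {x: sum(p for (a, _), p in joint.items() if a == x) for x in range(1, 7)}
p_y = {y: sum(p for (_, b), p in joint.items() if b == y) for y in range(1, 7)}

# Independence requires the joint to factor for EVERY pair of values.
assert all(joint[(x, y)] == p_x[x] * p_y[y]
           for x in range(1, 7) for y in range(1, 7))

print(joint[(3, 5)], "=", p_x[3], "*", p_y[5])  # prints: 1/36 = 1/6 * 1/6
```

Using `Fraction` keeps the check exact, so a passing assertion really does verify the factorization rather than floating-point near-equality.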

Calculation shortcuts

Independence gives us shortcuts that simplify calculations.

Rule 1: expectations multiply

Product of Independent RVs

If X and Y are independent: E[XY] = E[X] · E[Y]

For dependent variables, E[XY]E[XY] would require knowing the joint distribution. For independent variables, we just multiply the means.

If X and Y are independent with E[X]=3 and E[Y]=4, what is E[XY]? (whole number)
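The rule can be checked directly on the two-dice example, where the joint distribution is small enough to enumerate: E[X] = E[Y] = 3.5, and E[XY] computed from the joint matches 3.5 × 3.5.

```python
from itertools import product

# Two independent fair dice: compute E[XY] straight from the joint
# distribution and compare with the product of the means.
values = range(1, 7)
e_x = sum(values) / 6                                    # 3.5
e_xy = sum(x * y for x, y in product(values, values)) / 36

assert e_xy == e_x * e_x
print(e_xy)  # prints: 12.25
```

For dependent variables this shortcut fails, and the full sum over the joint PMF is unavoidable.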

Rule 2: variances add

Variance of Sum

If X and Y are independent: Var(X + Y) = Var(X) + Var(Y)

This is why independent errors add in quadrature: variances add, so standard deviations combine as √(σ_X² + σ_Y²) rather than σ_X + σ_Y. Uncertainties don't compound as badly as you might fear.

[Interactive: Variance of Sums — with Var(X) = 4, Var(Y) = 9, and ρ = 0, the covariance term 2·Cov(X, Y) is 0, so Var(X + Y) = 4 + 9 = 13: variances simply add for independent variables]
If Var(X)=9 and Var(Y)=16, and they're independent, what is Var(X+Y)? (whole number)

For dependent variables, we'd need a covariance term: Var(X + Y) = Var(X) + Var(Y) + 2·Cov(X, Y).

Independence means Cov(X, Y) = 0, so the extra term vanishes.
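The same dice make a clean test case here too: each die has variance 35/12, so the sum should have variance 35/6. A short exact computation confirms it.

```python
from fractions import Fraction
from itertools import product

values = range(1, 7)
mean = Fraction(sum(values), 6)                       # 7/2
var_x = sum((Fraction(v) - mean) ** 2 for v in values) / 6   # 35/12

# Distribution of the sum of two independent dice.
mean_sum = 2 * mean                                   # 7
var_sum = sum((Fraction(x + y) - mean_sum) ** 2
              for x, y in product(values, values)) / 36

assert var_sum == 2 * var_x   # Var(X+Y) = Var(X) + Var(Y) = 35/6
print(var_sum)                # prints: 35/6
```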

IID: the gold standard

In statistics, we often assume data points are "IID":

IID Random Variables

Random variables are IID (independent and identically distributed) if:

  1. They are mutually independent
  2. They all have the same distribution

Examples:

  • Coin flips are IID Bernoulli
  • Repeated measurements (done carefully) are IID
  • Random samples from a large population are approximately IID
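As a minimal sketch of the first example, here is how one might simulate IID Bernoulli(0.5) coin flips with the standard library: each flip is drawn independently from the same distribution, and the sample mean settles near 0.5.

```python
import random

random.seed(42)  # arbitrary seed, chosen for reproducibility

# IID Bernoulli(0.5): every flip is independent and identically distributed.
flips = [random.random() < 0.5 for _ in range(10_000)]

print(sum(flips) / len(flips))  # sample mean, close to 0.5
```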

Most statistical procedures assume IID data. When this assumption fails, results can be misleading.

Independence vs. correlation

Uncorrelated

X and Y are uncorrelated if Cov(X, Y) = 0.

Key relationships:

  • Independent ⟹ Uncorrelated (always true)
  • Uncorrelated ⟹ Independent (NOT always true!)
[Scatterplot: Independence vs Dependence — no pattern in the X–Y cloud when X and Y are independent]

Counterexample: Let X ~ Uniform(−1, 1) and Y = X².

They're uncorrelated: Cov(X, Y) = E[XY] − E[X]E[Y] = E[X³] − 0 = 0 by symmetry. Yet knowing X completely determines Y!
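A quick simulation sketch makes the counterexample tangible: the sample covariance hovers near zero even though Y is a deterministic function of X.

```python
import random

random.seed(0)  # arbitrary seed, chosen for reproducibility

# X ~ Uniform(-1, 1), Y = X^2: perfectly dependent, yet uncorrelated.
xs = [random.uniform(-1, 1) for _ in range(200_000)]
ys = [x * x for x in xs]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
cov = sum((x - mean_x) * (y - mean_y)
          for x, y in zip(xs, ys)) / len(xs)

print(cov)  # close to zero, despite Y being determined by X
```

Covariance only detects *linear* association; the parabola Y = X² is invisible to it.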

Why independence matters

Simplifies calculations

  • Joint distributions factor
  • Expectations multiply
  • Variances add

Enables statistical inference

Sampling distributions, confidence intervals, and hypothesis tests are all derived under independence assumptions.

Real-world consequences

Assuming independence when it's false leads to underestimating risk. The 2008 financial crisis was partly due to assuming mortgage defaults were independent (they weren't).

Summary

For Independent RVs | Formula
Joint PMF           | p(x, y) = p_X(x) · p_Y(y)
Expected product    | E[XY] = E[X] · E[Y]
Variance of sum     | Var(X + Y) = Var(X) + Var(Y)
Covariance          | Cov(X, Y) = 0

Independence is about information. X and Y are independent when learning X leaves your beliefs about Y unchanged.

Test your understanding

X and Y independent, E[X]=2, E[Y]=5. What is E[XY]? (whole number)
Same X,Y. Var(X)=3, Var(Y)=4. What is Var(X+Y)? (whole number)
True/False: Cov(X,Y)=0 implies X,Y independent.

What's next

You now have the core concepts of random variables: PMFs, CDFs, distributions, and independence. These form the foundation for continuous distributions, the Law of Large Numbers, and the Central Limit Theorem.