Applying the hypergeometric

Sampling without replacement gives the same mean but lower variance than sampling with replacement. Here's when this matters in practice.

The finite population correction

$$\text{FPC} = \frac{N-n}{N-1}$$

Example: with N = 100 and n = 10 (a 10% sample), FPC = (100 - 10) / (100 - 1) ≈ 0.909. Because FPC < 0.95, use the hypergeometric for accuracy.

| Scenario | FPC | Interpretation |
| --- | --- | --- |
| $n = 1$ | 1 | Single draw, no difference from the binomial |
| $n = N$ | 0 | Draw everyone, no randomness left |
| $n \ll N$ | ≈ 1 | Small sample, nearly independent |

Rule of thumb: If $n < 0.05N$ (sample is less than 5% of population), the binomial approximation is fine.
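As a minimal sketch, the correction factor and the 5% rule of thumb can be written as two small Python helpers. The function names `fpc` and `binomial_is_fine` are illustrative choices, not part of any library.

```python
def fpc(N: int, n: int) -> float:
    """Finite population correction (N - n) / (N - 1)."""
    if N < 2 or not 1 <= n <= N:
        raise ValueError("need N >= 2 and 1 <= n <= N")
    return (N - n) / (N - 1)


def binomial_is_fine(N: int, n: int, threshold: float = 0.05) -> bool:
    """Rule of thumb: the binomial approximation is fine when n < threshold * N."""
    return n < threshold * N


print(round(fpc(100, 10), 3))         # 0.909
print(binomial_is_fine(100, 10))      # False: 10% of the population is sampled
print(binomial_is_fine(10_000, 50))   # True: 0.5% of the population is sampled
```

Since FPC is roughly 1 - n/N for large N, the 5% sampling threshold and an FPC threshold of 0.95 express nearly the same cutoff.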


Quality control

You're inspecting a batch of 100 items, 10 of which are defective. You test 20 items.

What do you think?
Which gives a more precise estimate of defects: with or without replacement?
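One way to check the answer is to compute the variance of the defect count under both sampling schemes for this batch. The sketch below simply applies the binomial variance formula and the finite population correction; the variable names are illustrative.

```python
from math import sqrt

N, K, n = 100, 10, 20   # batch size, defective items in the batch, items tested
p = K / N               # defect rate

var_with = n * p * (1 - p)                   # with replacement (binomial)
var_without = var_with * (N - n) / (N - 1)   # without replacement (hypergeometric)

print(round(var_with, 3), round(var_without, 3))               # 1.8 1.455
print(round(sqrt(var_with), 3), round(sqrt(var_without), 3))   # 1.342 1.206
```

Both schemes expect n * p = 2 defective items in the sample, but the count found without replacement is less spread out around that value.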

Simulate batch inspections and see how detection probability depends on sample size and defect rate:

[Interactive: Defect Detector. 100 chips total, 5 defective, 10 tested: P(detect) = 41.6%, P(miss) = 58.4%. Most defects slip through with only 10 tested.]
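The detector can be approximated offline with a short Monte Carlo simulation. This is a sketch under stated assumptions: "detect" is taken to mean that at least one defective chip appears in the sample, and the function name, trial count, and seed are arbitrary choices.

```python
import random


def simulate_detection(N=100, K=5, n=10, trials=20_000, seed=0):
    """Estimate P(at least one defective chip among n drawn without replacement)."""
    rng = random.Random(seed)
    batch = [True] * K + [False] * (N - K)   # True marks a defective chip
    hits = 0
    for _ in range(trials):
        sample = rng.sample(batch, n)        # sampling without replacement
        if any(sample):
            hits += 1
    return hits / trials


estimate = simulate_detection()
print(f"Estimated P(detect): {estimate:.3f}")
```

With this many trials the estimate should land close to the 41.6% shown for 100 chips, 5 defective, and 10 tested.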

Explore how the probability of accepting a batch changes as the true defect rate varies:

[Interactive: OC Curve. P(detect) plotted against defect rate. Readouts: 42% at 5% defects, 67% at 10% defects, 90% at 20% defects.]
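The curve's readouts can be reproduced exactly from the hypergeometric probability of drawing zero defectives. The sketch assumes a batch of 100 with 10 items tested, which is consistent with the three values shown, and assumes a batch is accepted only when no defective is found, so P(accept) = 1 - P(detect). The name `p_detect` is illustrative.

```python
from math import comb


def p_detect(N: int, K: int, n: int) -> float:
    """P(at least one of the K defectives appears among n items drawn from N)."""
    return 1 - comb(N - K, n) / comb(N, n)


N, n = 100, 10   # assumed batch size and sample size
for rate in (0.05, 0.10, 0.20):
    K = round(rate * N)   # number of defectives at this defect rate
    print(f"{rate:.0%} defects: P(detect) = {p_detect(N, K, n):.0%}")
# 5% defects: P(detect) = 42%
# 10% defects: P(detect) = 67%
# 20% defects: P(detect) = 90%
```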

Polling

What do you think?
Surveying 1,000 from 100,000 voters vs. 1,000 from 5,000 voters. Which needs the hypergeometric?
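To put numbers on the two polls, the sketch below compares the standard error of a sample proportion with and without the finite population correction. The true proportion p = 0.5 is an assumption made for illustration (it is the worst case for the standard error), and `standard_errors` is a hypothetical helper name.

```python
from math import sqrt


def standard_errors(N: int, n: int, p: float = 0.5):
    """Return (FPC, uncorrected SE, FPC-corrected SE) for a sample proportion."""
    fpc = (N - n) / (N - 1)
    se_binomial = sqrt(p * (1 - p) / n)          # treats draws as independent
    se_hypergeometric = se_binomial * sqrt(fpc)  # accounts for the finite population
    return fpc, se_binomial, se_hypergeometric


for N in (100_000, 5_000):
    fpc, se_b, se_h = standard_errors(N, 1_000)
    print(f"N={N}: FPC={fpc:.3f}, SE binomial={se_b:.4f}, SE corrected={se_h:.4f}")
# N=100000: FPC=0.990, SE binomial=0.0158, SE corrected=0.0157
# N=5000: FPC=0.800, SE binomial=0.0158, SE corrected=0.0141
```

Sampling 1% of the voters barely changes the standard error, while sampling 20% of them shrinks it by roughly a tenth.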

Card games

Dealing cards is always without replacement.

What do you think?
What's the probability of exactly 2 Aces in a 5-card poker hand?

The calculation: $$P(X=2) = \frac{\binom{4}{2}\binom{48}{3}}{\binom{52}{5}} = \frac{6 \times 17296}{2598960} \approx 0.040$$
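The same number can be checked with Python's built-in `math.comb`. The helper `hypergeom_pmf` below is a hand-rolled sketch rather than a library function; its argument order (k, N, K, n) is an arbitrary choice.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(exactly k successes) when drawing n from N items, K of which are successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Exactly 2 Aces (K = 4) in a 5-card hand from a 52-card deck.
print(comb(4, 2), comb(48, 3), comb(52, 5))     # 6 17296 2598960
print(round(hypergeom_pmf(2, 52, 4, 5), 4))     # 0.0399
```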


When to use each

| Use Binomial when... | Use Hypergeometric when... |
| --- | --- |
| Sampling with replacement | Sampling without replacement |
| Population is very large | Population is finite and small |
| $n < 0.05N$ (approximation) | $n \geq 0.05N$ |
| Trials are independent | Trials are dependent |

Card game draws: which distribution?
Sampling 50 from 10,000: is binomial OK? (yes/no)
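One way to see how much the choice matters is to measure the total variation distance between the two distributions, which is half the sum of the absolute differences between their probabilities. The sketch below does this for a 5-card hand and for 50 draws from 10,000. The 10% success rate in the large population (K = 1,000) is an assumption for illustration, and all function names are hypothetical.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(k successes) drawing n without replacement from N items with K successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(k successes) in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def tv_distance(N: int, K: int, n: int) -> float:
    """Total variation distance between the hypergeometric and its binomial stand-in."""
    p = K / N
    return 0.5 * sum(
        abs(hypergeom_pmf(k, N, K, n) - binom_pmf(k, n, p)) for k in range(n + 1)
    )


tv_cards = tv_distance(52, 4, 5)             # Aces in a poker hand (about 10% sampled)
tv_large_pop = tv_distance(10_000, 1_000, 50)  # 0.5% sampled, assumed 10% success rate
print(f"cards: {tv_cards:.4f}, large population: {tv_large_pop:.4f}")
```

The distance for the large population should come out far smaller than for the card hand, in line with the 5% rule of thumb.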

Common mistakes

What do you think?
Without replacement gives lower ___?

Summary

| Distribution | Mean | Variance |
| --- | --- | --- |
| Binomial | $np$ | $np(1-p)$ |
| Hypergeometric | $np$ | $np(1-p) \times \text{FPC}$ |

How you sample changes the variance, not the mean. Without replacement reduces uncertainty because outcomes become negatively correlated.

Urn: 100 balls, 30 red. Draw 20 without replacement. E[reds] = ? (whole number)
Which has HIGHER variance: binomial or hypergeometric?
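The link between negative correlation and the FPC can be verified exactly for this urn. The sketch below uses exact fractions: it computes the covariance between the red/not-red indicators of two different draws, adds up the variance of the total from its n variance terms and n(n-1) covariance terms, and compares the result with the FPC formula. Variable names are illustrative.

```python
from fractions import Fraction

N, K, n = 100, 30, 20   # balls in the urn, red balls, draws without replacement
p = Fraction(K, N)

# Covariance between the indicators of two different draws.
p_both_red = Fraction(K, N) * Fraction(K - 1, N - 1)
cov = p_both_red - p * p                 # negative: one red draw makes another less likely

mean = n * p
var_from_cov = n * p * (1 - p) + n * (n - 1) * cov
var_from_fpc = n * p * (1 - p) * Fraction(N - n, N - 1)

print(mean)                              # 6
print(cov)                               # -7/3300
print(var_from_cov == var_from_fpc)      # True
print(float(n * p * (1 - p)), round(float(var_from_fpc), 3))   # 4.2 3.394
```

The binomial variance is 4.2, and the negative covariance terms pull the hypergeometric variance down to about 3.39 while the mean stays at 6.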

What's next

We've seen that the draws behind a count can be independent (binomial) or dependent (hypergeometric). But what exactly does independence mean for random variables? That's our next topic.