The urn problem

Imagine an urn with 50 balls: 20 red and 30 blue. You draw 10 balls.

What do you think?

Does it matter whether you put each ball back before drawing the next?

Two sampling strategies

With Replacement (Binomial):

Each draw is independent
Probability stays constant: $p = 20/50 = 0.4$
Number of reds $\sim \text{Bin}(10, 0.4)$

Without Replacement (Hypergeometric):

Each draw depends on previous draws
After drawing a red, the probability of red on the next draw drops from $20/50$ to $19/49$
This is the Hypergeometric distribution

The hypergeometric distribution

Hypergeometric Distribution

Drawing $n$ items without replacement from a population of $N$ items containing $K$ successes:

$P(X = k) = \Large\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$

We write $X \sim \text{HGeom}(N, K, n)$.

The formula counts:

$\binom{K}{k}$: Ways to choose $k$ successes from the $K$ available (using combinations)
$\binom{N-K}{n-k}$: Ways to choose the remaining $n-k$ items from the $N-K$ failures
$\binom{N}{n}$: Total ways to choose $n$ items from all $N$
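The counting argument above translates almost directly into code. The following is a minimal sketch using only Python's standard library; the function name `hypergeom_pmf` is chosen here for illustration, and the urn values (50 balls, 20 red, 10 draws) come from the opening example.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) for X ~ HGeom(N, K, n), computed by direct counting."""
    # Outside the support the probability is zero: we cannot draw more
    # successes than exist (or than we draw), nor more failures than exist.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    # (ways to pick k successes) * (ways to pick n-k failures) / (all samples)
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Urn from the lesson: N = 50 balls, K = 20 red, n = 10 draws
pmf = [hypergeom_pmf(k, 50, 20, 10) for k in range(11)]
for k, prob in enumerate(pmf):
    print(f"P(X = {k:2d}) = {prob:.4f}")
print(f"sum of probabilities = {sum(pmf):.6f}")
```

The probabilities over $k = 0, \dots, 10$ should sum to 1, which is a quick sanity check on the formula.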

See how the three combinatorial pieces build the probability:

[Interactive figure: Formula Breakdown. Steps through $P(X=k) = C(K,k) \times C(N-K,n-k) / C(N,n)$, with items labeled Defective and Good.]

Explore the difference

[Interactive figure: The Urn Simulator. Settings: 20 red balls, 30 blue balls (50 total), $n = 10$ draws. Panels: With Replacement (Binomial) and Without Replacement (Hypergeometric).]

Notice: the centers (means) are the same, but without replacement has less spread.

Once you've drawn many reds, fewer reds are left, so each remaining draw is less likely to be red. The draws are negatively correlated, which pulls extreme counts back toward the mean.
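A small simulation can make the same point without any formulas. This is a rough sketch, assuming the lesson's urn (20 red, 30 blue, 10 draws); the helper name `count_reds`, the seed, and the number of trials are arbitrary choices made here.

```python
import random
from statistics import mean, pvariance

URN = ["red"] * 20 + ["blue"] * 30  # the lesson's urn: 50 balls
N_DRAWS = 10

def count_reds(with_replacement: bool, rng: random.Random) -> int:
    """Draw N_DRAWS balls from URN and count how many are red."""
    if with_replacement:
        draws = rng.choices(URN, k=N_DRAWS)  # each draw sees the full urn
    else:
        draws = rng.sample(URN, k=N_DRAWS)   # a drawn ball is not returned
    return draws.count("red")

rng = random.Random(0)  # fixed seed so repeated runs match
trials = 20_000
with_repl = [count_reds(True, rng) for _ in range(trials)]
without_repl = [count_reds(False, rng) for _ in range(trials)]

print(f"with replacement:    mean={mean(with_repl):.3f}  var={pvariance(with_repl):.3f}")
print(f"without replacement: mean={mean(without_repl):.3f}  var={pvariance(without_repl):.3f}")
```

Both sample means should land close to each other, while the sample variance without replacement should come out noticeably smaller.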

Mean: the same!

Expected Value

For both Binomial and Hypergeometric: $E[X] = n \cdot \Large\frac{K}{N} \normalsize= np$ where $p = K/N$ is the proportion of successes.
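One way to see why the means agree is a short indicator-variable argument. The indicators $I_j$ are notation introduced here for the sketch: $I_j = 1$ if draw $j$ is a success and $0$ otherwise.

$X = \sum_{j=1}^{n} I_j$

$E[I_j] = P(\text{draw } j \text{ is a success}) = \frac{K}{N}$

$E[X] = \sum_{j=1}^{n} E[I_j] = n \cdot \frac{K}{N}$

The middle step holds with or without replacement: by symmetry, each of the $N$ items is equally likely to appear at position $j$. The last step uses linearity of expectation, which does not require the draws to be independent.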

Urn: 50 balls, 20 red. Draw 10 WITH replacement. E[reds] = ? (whole number)

Same urn, draw 10 WITHOUT replacement. E[reds] = ? (whole number)

Variance: here's the difference

Variance Comparison

Binomial: $\text{Var}(X) = np(1-p)$

Hypergeometric: $\text{Var}(X) = np(1-p) \cdot \frac{N-n}{N-1}$

The factor $\frac{N-n}{N-1}$ is called the Finite Population Correction (FPC).
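The two variance formulas can be checked numerically. The sketch below uses illustrative values $N = 100$, $K = 40$, $n = 10$ (so $p = 0.4$); the function names are made up for this example, and `variance_from_pmf` recomputes the variance directly from the hypergeometric probabilities as an independent check on the FPC formula.

```python
from math import comb

def binomial_var(n: int, p: float) -> float:
    """Var(X) = n p (1 - p) for X ~ Bin(n, p)."""
    return n * p * (1 - p)

def fpc(N: int, n: int) -> float:
    """Finite population correction (N - n) / (N - 1)."""
    return (N - n) / (N - 1)

def hypergeom_var(N: int, K: int, n: int) -> float:
    """Binomial variance with p = K/N, scaled by the FPC."""
    return binomial_var(n, K / N) * fpc(N, n)

def variance_from_pmf(N: int, K: int, n: int) -> float:
    """Variance computed directly from the hypergeometric probabilities."""
    ks = range(max(0, n - (N - K)), min(n, K) + 1)
    probs = [comb(K, k) * comb(N - K, n - k) / comb(N, n) for k in ks]
    mu = sum(k * q for k, q in zip(ks, probs))
    return sum((k - mu) ** 2 * q for k, q in zip(ks, probs))

N, K, n = 100, 40, 10  # illustrative values, p = 0.4
print(f"binomial variance:       {binomial_var(n, K / N):.4f}")
print(f"FPC:                     {fpc(N, n):.4f}")
print(f"hypergeometric variance: {hypergeom_var(N, K, n):.4f}")
print(f"variance from the pmf:   {variance_from_pmf(N, K, n):.4f}")
```

The last two printed values should agree up to floating-point rounding.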

[Interactive figure: Variance Comparison (Binomial vs. Hypergeometric). At sample size $n = 10$: mean (both) 4.0, binomial variance 2.40, hypergeometric variance 2.18, FPC = (100-10)/(100-1) ≈ 0.909.]

Binomial Var for n=10, p=0.4: np(1-p) = ? (decimal to 1 place, e.g. 3.7)

Hypergeometric Var with N=50? Multiply by (50-10)/(50-1). (decimal to 2 places, e.g. 0.53)

The key takeaway

Distribution	Mean	Variance
Binomial	$np$	$np(1-p)$
Hypergeometric	$np$	$np(1-p) \times \text{FPC}$

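The FPC is at most 1, and it equals 1 only when $n = 1$. So the hypergeometric variance is never larger than the binomial variance, and it is strictly smaller whenever $n > 1$ and $0 < p < 1$.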

Rule of thumb: If your sample is less than 5% of the population ($n < 0.05N$), the Binomial is a good approximation.
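To get a feel for the rule of thumb, one can measure how far apart the two distributions are as the population grows. The sketch below is one possible check, not part of the lesson: it holds $n = 10$ and $p = 0.4$ fixed, varies $N$, and reports the total variation distance (half the sum of absolute differences between the two pmfs). The choice of distance measure and the population sizes are assumptions made for illustration.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) for X ~ HGeom(N, K, n); zero outside the support."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def total_variation(N: int, K: int, n: int) -> float:
    """Half the L1 distance between Bin(n, K/N) and HGeom(N, K, n)."""
    p = K / N
    return 0.5 * sum(
        abs(binom_pmf(k, n, p) - hypergeom_pmf(k, N, K, n)) for k in range(n + 1)
    )

n = 10
for N in (50, 200, 1000, 10_000):
    K = 2 * N // 5  # keep the success proportion at 0.4
    print(f"N={N:6d}  n/N={n / N:7.2%}  TV distance={total_variation(N, K, n):.5f}")
```

As $n/N$ shrinks, the distance should shrink with it, which is the sense in which the Binomial becomes a good approximation.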

What's next

In the next lesson, we'll explore when to use each distribution, work through practical examples like card games and quality control, and understand the FPC in more depth.