The factory's blind spot

What do you think?

A factory batch has 100 chips, 5 of which are defective. QC tests 10 random chips and rejects the batch if any are broken. What percentage of defective batches get caught?

A factory produces batches of 100 microchips. Quality control says: if a batch has 5 or more broken chips, reject it. But testing every chip is expensive, so inspectors test only a random sample.

Try the simulator. Drag the slider to see how sample size affects detection.

Defect Detector (interactive simulator): 100 chips total, 5 defective, sample size 10.

P(detect) = 41.6%

P(miss) = 58.4%

Most defects slip through with only 10 tested

At a sample size of 10, the detection rate is only about 42%. You miss defective batches more often than you catch them.
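For readers without the simulator, a rough Monte Carlo sketch can reproduce the same experiment. The function name `simulate_detection`, the trial count, and the fixed seed are illustrative choices, not part of the lesson; the estimate should land near the 41.6% figure quoted above.

```python
import random

def simulate_detection(batch_size=100, defective=5, sample_size=10,
                       trials=20_000, seed=0):
    """Estimate the fraction of defective batches that inspection catches."""
    rng = random.Random(seed)
    # Label chips 0..batch_size-1; the first `defective` labels are the broken ones.
    chips = range(batch_size)
    caught = 0
    for _ in range(trials):
        sample = rng.sample(chips, sample_size)  # draw without replacement
        if any(chip < defective for chip in sample):
            caught += 1
    return caught / trials

estimate = simulate_detection()
print(f"Estimated P(detect): {estimate:.3f}")
```

Increasing `trials` tightens the estimate; changing `sample_size` plays the role of the slider.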

The wrong question

What do you think?

We expect 0.5 broken chips in our sample of 10. Why doesn't this mean we catch defects 50% of the time?

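The intuitive reasoning fails because it answers the wrong question. The expected number of broken chips is an average over all possible samples, and samples that contain two or more broken chips raise that average without adding any extra detections. "Expected broken chips in sample" isn't the same as "probability of finding at least one."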
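The gap between the two quantities can be checked numerically. The sketch below tabulates the full distribution of broken chips in a sample of 10 using Python's `math.comb`; the helper name `pmf` is an illustrative choice.

```python
from math import comb

N, K, n = 100, 5, 10  # batch size, defective chips, sample size

def pmf(k):
    """P(exactly k defective chips in the sample)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

probs = [pmf(k) for k in range(K + 1)]
expected = sum(k * p for k, p in enumerate(probs))
p_at_least_one = 1 - probs[0]

for k, p in enumerate(probs):
    print(f"P(X={k}) = {p:.4f}")
print(f"E[X] = {expected:.3f}")
print(f"P(X >= 1) = {p_at_least_one:.3f}")
```

The expected value comes out at 0.5 while the probability of at least one broken chip stays near 0.416, because a sample with several broken chips adds more than one to the average but still counts as a single detection.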

The complement trick

Rather than calculate "at least one," calculate "exactly zero" and subtract from 1.

Direct: calculate $P(X \ge 1) = P(X=1) + P(X=2) + P(X=3) + P(X=4) + P(X=5)$, which takes 5 separate calculations.

Rule: "at least one" → calculate "none" and subtract from 1

The detection probability, step by step:

Calculating Detection Probability

$P(\text{detect}) = 1 - P(\text{zero broken in sample})$

Use the complement: it's easier to calculate the probability of finding NO broken chips, then subtract from 1.

With a sample size of 10, defective batches slip through more often than they get caught.
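As a worked version of the complement calculation for this batch (100 chips, 5 defective, 10 tested), the probability of drawing all 10 chips from the 95 good ones can be written out as follows. Most factors cancel between the two binomial coefficients, leaving five terms:

$P(\text{zero broken in sample}) = \frac{\binom{95}{10}}{\binom{100}{10}} = \frac{90 \cdot 89 \cdot 88 \cdot 87 \cdot 86}{100 \cdot 99 \cdot 98 \cdot 97 \cdot 96} \approx 0.584$

$P(\text{detect}) = 1 - 0.584 = 0.416$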

If P(zero defects in sample) = 0.60, what's P(detect at least one)? (decimal to 2 places, e.g. 0.53)

The hypergeometric distribution

This calculation uses the hypergeometric distribution, which applies when you sample without replacement from a finite population.

For sampling $n$ items from a population of $N$ containing $K$ "successes," the probability of exactly $k$ successes is:

$P(X=k) = \Large\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$

The numerator counts favorable outcomes (ways to pick $k$ from the successes and $n-k$ from the failures). The denominator counts total ways to pick any $n$ items.
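One way to build confidence in the counting argument is to enumerate every possible sample of a small population and compare the tallies with the formula. The toy numbers below (10 items, 3 successes, samples of 4) are assumptions chosen only to keep the enumeration small.

```python
from itertools import combinations
from math import comb

N, K, n = 10, 3, 4  # toy population: items 0, 1, 2 are the "successes"

def formula(k):
    """Hypergeometric probability of exactly k successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Count every possible sample directly.
counts = [0] * (min(K, n) + 1)
total = 0
for sample in combinations(range(N), n):
    hits = sum(1 for item in sample if item < K)
    counts[hits] += 1
    total += 1

for k, count in enumerate(counts):
    print(f"k={k}: enumerated {count}/{total} = {count / total:.4f}, "
          f"formula {formula(k):.4f}")
```

Each enumerated count matches the numerator of the formula, and the total number of samples matches the denominator.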

Step through the formula to see what each piece represents:

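Formula Breakdown

$P(X=k) = \binom{K}{k} \times \binom{N-K}{n-k} \,/\, \binom{N}{n}$

Defective: $\binom{K}{k}$ counts the ways to pick $k$ defective chips. Good: $\binom{N-K}{n-k}$ counts the ways to pick the remaining $n-k$ good chips.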

For sampling n=5 from N=50 with K=10 successes, what's the denominator of P(X=k)?

The business trade-off

What do you think?

To get 90% detection of defective batches (5 broken chips out of 100), how many chips do you need to test?

This tension defines acceptance sampling:

Test more chips = higher cost, better detection
Test fewer chips = lower cost, bad batches ship

There's no sample size that's both cheap and reliable. The math forces a trade-off.
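To put a number on the trade-off, the sketch below searches for the smallest sample size that reaches a target detection probability. The helper names `p_detect` and `min_sample_size` are illustrative, and the defaults assume the lesson's batch of 100 chips with 5 defective.

```python
from math import comb

def p_detect(n, N=100, K=5):
    """P(at least one defective chip in a sample of n), via the complement."""
    return 1 - comb(N - K, n) / comb(N, n)

def min_sample_size(target, N=100, K=5):
    """Smallest n whose detection probability reaches `target`."""
    for n in range(1, N + 1):
        if p_detect(n, N, K) >= target:
            return n
    return None

n_needed = min_sample_size(0.90)
print(n_needed, round(p_detect(n_needed), 3))
```

Under these assumptions the search reports 37 chips, more than a third of the batch, for 90% detection.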

Real quality control uses Operating Characteristic (OC) curves: graphs showing the probability of accepting a batch against its defect rate for a given sample size. Acceptance is the complement of detection, so the same curve makes the trade-off visible.

Build your own OC curve. Adjust the batch size and sample size to see how detection probability changes across different defect rates:

OC Curve (interactive): batch size N = 100, sample size n = 10 (10% of the batch).

Detection probability at 5% defects: 42%

Detection probability at 10% defects: 67%

Detection probability at 20% defects: 90%
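The three readings from the OC curve widget can be recomputed with a few lines of Python. This is a minimal sketch that assumes a batch is rejected as soon as one defective chip appears in the sample; the dictionary name `curve` is illustrative.

```python
from math import comb

def p_detect(N, n, K):
    """P(at least one of the K defective chips lands in a sample of n)."""
    return 1 - comb(N - K, n) / comb(N, n)

N, n = 100, 10
curve = {}
for rate in (0.05, 0.10, 0.20):
    K = round(N * rate)  # number of defective chips at this defect rate
    curve[rate] = p_detect(N, n, K)
    print(f"{rate:.0%} defects: detect {curve[rate]:.0%}, "
          f"accept {1 - curve[rate]:.0%}")
```

Extending the tuple of rates, or looping over several values of `n`, gives the points needed to sketch a full curve.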

Where else this appears

The hypergeometric distribution shows up whenever you sample without replacement from a finite population:

Drug testing in sports (random selection of athletes)
Auditing transactions for fraud
Drawing cards from a deck
Jury selection from a pool

They all share the same structure: a finite population, sampling without replacement, and counting "hits" in the sample.
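The card example fits the same template. As an illustration (the specific question is an assumption, not taken from the lesson), the chance of at least one ace in a 5-card hand from a standard 52-card deck uses the same complement calculation with $N = 52$, $K = 4$, $n = 5$:

```python
from math import comb

# A standard 52-card deck has 4 aces; deal a 5-card hand.
N, K, n = 52, 4, 5
p_no_ace = comb(N - K, n) / comb(N, n)
p_at_least_one_ace = 1 - p_no_ace
print(f"P(at least one ace) = {p_at_least_one_ace:.3f}")
```

Only the three numbers change between the factory, the deck, and the other settings in the list.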

Key formula

The core of the detection calculation is the complement rule applied to the hypergeometric distribution:

$P(\text{detect at least 1}) = 1 - P(\text{zero defects in sample}) = 1 - \frac{\binom{N-K}{n}}{\binom{N}{n}}$

All the counting power comes from combinations. The numerator counts ways to pick $n$ items entirely from the $N-K$ good ones; the denominator counts all possible samples.

Later, in the Hypergeometric Applications lesson, we'll derive the expected value and variance of the hypergeometric distribution. For now, the combinatorial formula is enough to expose the paradox.

Test your understanding

Batch: 50 items, 5 defective. Sample: 10 items. What's P(zero defects)? Use C(45,10)/C(50,10). (decimal to 2 places, e.g. 0.53)

If P(zero defects) = 0.31, what's the detection probability? (decimal to 2 places, e.g. 0.53)

True or False: The complement rule (P(at least one) = 1 − P(none)) works here because 'at least one defect' is the complement of 'zero defects'.

True or False: To double the detection rate, you roughly need to double the sample size.