Batch Inspection
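Figure: a batch of 100 items, 5 of them defective, with a sample of 10 (legend: good, defective, sampled). The sample shown catches 1 of the 5 defective items. P(detect) ≈ 40% with a sample of 10, so most defective batches slip through.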

The factory's blind spot

What do you think?
A factory batch has 100 chips, 5 of which are defective. QC tests 10 random chips and rejects the batch if any of them are broken. What percent of such defective batches get caught?

A factory produces batches of 100 microchips. Quality control says: if a batch has 5 or more broken chips, reject it. But testing every chip is expensive.

Try the simulator. Drag the slider to see how sample size affects detection.

Defect Detector: 100 chips total, 5 defective, 10 sampled. P(detect) = 41.6%, P(miss) = 58.4%.

Most defective batches slip through with only 10 chips tested

At sample size 10, the detection rate is only 41.6%. You miss defective batches more often than you catch them.
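The detection rate can also be checked empirically. The sketch below is a minimal Monte Carlo simulation, not the simulator used on this page; the function name `simulate_detection_rate`, the trial count, and the seed are illustrative assumptions.

```python
import random

def simulate_detection_rate(batch_size=100, defective=5, sample_size=10,
                            trials=20_000, seed=0):
    """Estimate how often a random sample contains at least one defective chip."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatable runs
    caught = 0
    for _ in range(trials):
        # Label chips 0..batch_size-1; labels below `defective` are the broken ones.
        sample = rng.sample(range(batch_size), sample_size)  # without replacement
        if any(chip < defective for chip in sample):
            caught += 1
    return caught / trials

if __name__ == "__main__":
    print(f"Estimated P(detect): {simulate_detection_rate():.3f}")
```

With enough trials the estimate should settle near the exact 41.6% figure.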

The wrong question

What do you think?
We expect 0.5 broken chips in our sample of 10. Why doesn't this mean we catch defects 50% of the time?

The intuitive reasoning fails because it answers the wrong question. "Expected broken chips in sample" isn't the same as "probability of finding at least one."
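One way to see the gap is to lay out the full distribution of defective chips in the sample. The sketch below uses Python's `math.comb` with the hypergeometric formula covered later in this lesson; the helper name `pmf` is an assumption made for illustration.

```python
from math import comb

N, K, n = 100, 5, 10  # batch size, defectives in batch, sample size

def pmf(k):
    """P(exactly k defectives in the sample), sampling without replacement."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

probs = [pmf(k) for k in range(K + 1)]
expected = sum(k * p for k, p in enumerate(probs))  # average defectives per sample
p_at_least_one = 1 - probs[0]                       # chance the sample has any defective

for k, p in enumerate(probs):
    print(f"P(X={k}) = {p:.4f}")
print(f"Expected defectives: {expected:.3f}")
print(f"P(at least one):     {p_at_least_one:.3f}")
```

The average counts a sample with two defectives twice, while detection counts it once. Whenever samples with two or more defectives are possible, P(at least one) therefore falls below the expected count.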

The complement trick

Rather than calculate "at least one," calculate "exactly zero" and subtract from 1.

The Complement Trick
Direct: Calculate P(X ≥ 1)
= P(X=1) + P(X=2) + P(X=3) + P(X=4) + P(X=5)
= 5 separate calculations
Rule: "at least one" → calculate "none" and subtract from 1
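For the running example (100 chips, 5 defective, sample of 10), both routes give the same number, because the probabilities of all possible outcomes sum to 1. As a worked check with rounded values:

$$P(X \ge 1) = \sum_{k=1}^{5} P(X=k) \approx 0.3394 + 0.0702 + 0.0064 + 0.0003 + 0.0000 \approx 0.416$$

$$P(X \ge 1) = 1 - P(X=0) \approx 1 - 0.584 = 0.416$$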

The detection probability, step by step:

Calculating Detection Probability
$$P(\text{detect}) = 1 - P(\text{zero broken in sample})$$
Use the complement: it's easier to calculate the probability of finding no broken chips, then subtract from 1.

With a sample size of 10, defective batches slip through more often than they get caught.
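One way to carry the calculation through is to draw the chips one at a time and multiply the chance that each draw is good. This is a sketch under that assumption, not necessarily the route the walkthrough takes; `p_zero_defects` is a hypothetical helper name.

```python
from fractions import Fraction

def p_zero_defects(total, defective, sample):
    """P(no defective chip in the sample), built one draw at a time."""
    good = total - defective
    p = Fraction(1)
    for drawn in range(sample):
        # After `drawn` good chips have been removed, the next draw is good
        # with probability (good - drawn) / (total - drawn).
        p *= Fraction(max(good - drawn, 0), total - drawn)
    return p

p_zero = p_zero_defects(100, 5, 10)
p_detect = 1 - p_zero
print(f"P(zero broken in sample) = {float(p_zero):.3f}")
print(f"P(detect)                = {float(p_detect):.3f}")
```

For 100 chips with 5 defective and a sample of 10, this gives roughly 0.584 and 0.416. Using `Fraction` keeps the product exact until the final conversion to a decimal.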

If P(zero defects in sample) = 0.60, what's P(detect at least one)? (decimal to 2 places, e.g. 0.53)

The hypergeometric distribution

This calculation uses the hypergeometric distribution, which applies when you sample without replacement from a finite population.

For sampling $n$ items from a population of $N$ containing $K$ "successes," the probability of exactly $k$ successes is:

$$P(X=k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$

The numerator counts favorable outcomes (ways to pick $k$ from the successes and $n-k$ from the failures). The denominator counts total ways to pick any $n$ items.
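The two counts can be computed separately to see how the formula is assembled. The following is a minimal sketch using the standard library's `math.comb`; the function names `hypergeom_counts` and `hypergeom_pmf` are assumptions for illustration, and the code assumes 0 ≤ k ≤ n.

```python
from math import comb

def hypergeom_counts(k, N, K, n):
    """Return (favorable, total) sample counts behind P(X = k)."""
    # Pick k items from the K successes and n - k items from the N - K failures.
    favorable = comb(K, k) * comb(N - K, n - k)
    # Pick any n items from the whole population.
    total = comb(N, n)
    return favorable, total

def hypergeom_pmf(k, N, K, n):
    """P(exactly k successes in a sample of n drawn without replacement)."""
    favorable, total = hypergeom_counts(k, N, K, n)
    return favorable / total

if __name__ == "__main__":
    # Running example: 100 chips, 5 defective, sample of 10, exactly 1 defective found.
    favorable, total = hypergeom_counts(1, 100, 5, 10)
    print(f"favorable samples: {favorable}")
    print(f"total samples:     {total}")
    print(f"P(X=1) = {favorable / total:.4f}")
```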

Step through the formula to see what each piece represents:

Formula Breakdown (5 defective, 95 good)
P(X=k) = C(K,k) × C(N-K,n-k) / C(N,n)
For sampling n=5 from N=50 with K=10 successes, what's the denominator of P(X=k)?

The business trade-off

What do you think?
To get 90% detection of defective batches (5 broken chips out of 100), how many chips do you need to test?

This tension defines acceptance sampling:

  • Test more chips = higher cost, better detection
  • Test fewer chips = lower cost, bad batches ship

There's no sample size that's both cheap and reliable. The math forces a trade-off.
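The trade-off can be quantified by searching for the smallest sample that reaches a target detection rate. This is a brute-force sketch; `p_detect` and `min_sample_size` are hypothetical helper names, and the rule assumed is the one used throughout this lesson (reject the batch if the sample contains any defective chip).

```python
from math import comb

def p_detect(N, K, n):
    """P(sample of n contains at least one of the K defectives among N items)."""
    return 1 - comb(N - K, n) / comb(N, n)

def min_sample_size(N, K, target):
    """Smallest n with P(detect) >= target, or None if no sample size reaches it."""
    for n in range(1, N + 1):
        if p_detect(N, K, n) >= target:
            return n
    return None

if __name__ == "__main__":
    for n in (10, 20, 30, 40, 50):
        print(f"n = {n:2d}: P(detect) = {p_detect(100, 5, n):.3f}")
    print("Smallest n for 90% detection:", min_sample_size(100, 5, 0.90))
```

For a batch of 100 with 5 defective chips, the search returns 37: testing 36 chips gives just under 90% detection. Reaching 90% therefore means testing more than a third of the batch.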

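Real quality control uses Operating Characteristic (OC) curves. A standard OC curve plots the probability of accepting a batch against its defect rate for a given sample size; the curve here plots the complement, the detection probability. Either view makes the trade-off visible.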

Build your own OC curve. Adjust the batch size and sample size to see how detection probability changes across different defect rates:

OC Curve (x-axis: Defect Rate, y-axis: P(detect))

  • At 5% defects: 42%
  • At 10% defects: 67%
  • At 20% defects: 90%
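The points on such a curve can be tabulated directly. The sketch below assumes a batch of 100 and a sample of 10, settings that reproduce the three readouts above; `detection_curve` is a hypothetical helper name.

```python
from math import comb

def detection_curve(N, n, defect_counts):
    """Return (defect_rate, P(detect)) pairs for a batch of N and a sample of n."""
    # Under the "reject if any defective is found" rule, P(accept) = 1 - P(detect).
    return [(K / N, 1 - comb(N - K, n) / comb(N, n)) for K in defect_counts]

curve = detection_curve(N=100, n=10, defect_counts=[0, 5, 10, 20, 40])

for rate, p in curve:
    print(f"defect rate {rate:4.0%}: P(detect) = {p:.0%}")
```

Changing `N` and `n` shows how a larger sample pushes the whole curve up, at the cost of more testing.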

Where else this appears

The hypergeometric distribution shows up whenever you sample without replacement from a finite population:

  • Drug testing in sports (random selection of athletes)
  • Auditing transactions for fraud
  • Drawing cards from a deck
  • Jury selection from a pool

They all share the same structure: a finite population, sampling without replacement, and counting "hits" in the sample.
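Because the structure is shared, one function covers every case once the population, the number of "hits," and the number of draws are named. As an illustrative sketch, here is the card example: the chance that a 5-card hand from a standard 52-card deck holds at least one of the 4 aces. The function name `p_at_least_one_hit` is an assumption.

```python
from math import comb

def p_at_least_one_hit(population, hits, draws):
    """P(at least one hit) when drawing without replacement from a finite population."""
    return 1 - comb(population - hits, draws) / comb(population, draws)

p_ace = p_at_least_one_hit(population=52, hits=4, draws=5)    # cards
p_chip = p_at_least_one_hit(population=100, hits=5, draws=10)  # the chip batch

print(f"At least one ace in 5 cards:       {p_ace:.3f}")
print(f"At least one bad chip in 10 of 100: {p_chip:.3f}")
```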

Key formula

The core of the hypergeometric calculation is the complement rule:

$$P(\text{detect at least 1}) = 1 - P(\text{zero defects in sample}) = 1 - \frac{\binom{N-K}{n}}{\binom{N}{n}}$$

All the counting power comes from combinations. The numerator counts ways to pick $n$ items entirely from the $N-K$ good ones; the denominator counts all possible samples.
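For hand calculation, the factorials in the two binomial coefficients cancel, leaving only $K$ factors on top and bottom. As a worked sketch for the running example ($N = 100$, $K = 5$, $n = 10$), with rounded values:

$$\frac{\binom{95}{10}}{\binom{100}{10}} = \frac{95!}{10!\,85!}\cdot\frac{10!\,90!}{100!} = \frac{90\cdot 89\cdot 88\cdot 87\cdot 86}{100\cdot 99\cdot 98\cdot 97\cdot 96} \approx 0.584$$

$$P(\text{detect at least 1}) \approx 1 - 0.584 = 0.416$$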

Later, in the Hypergeometric Applications lesson, we'll derive the expected value and variance of the hypergeometric distribution. For now, the combinatorial formula is enough to expose the paradox.

Test your understanding

Batch: 50 items, 5 defective. Sample: 10 items. What's P(zero defects)? Use C(45,10)/C(50,10). (decimal to 2 places, e.g. 0.53)
If P(zero defects) = 0.31, what's the detection probability? (decimal to 2 places, e.g. 0.53)
True or False: The complement rule (P(at least one) = 1 − P(none)) works here because 'at least one defect' is the complement of 'zero defects'.
True or False: To double the detection rate, you roughly need to double the sample size.