Law of total probability

A factory has two machines. Machine A produces 60% of widgets, and 3% of its output is defective. Machine B produces the other 40%, and 5% of its output is defective. A randomly chosen widget turns out to be defective. Which machine most likely made it?

What do you think?
What's the probability a randomly selected widget is defective?

The idea: partition and sum

You can't always compute $P(A)$ directly. But if you can split the sample space into cases where the probability of $A$ is easy to compute, you can add up the pieces.

Law of Total Probability (LOTP)

If $B_1, B_2, \ldots, B_n$ form a partition of Ω (mutually exclusive, collectively exhaustive) with each $P(B_i) > 0$, then:

$$P(A) = \sum_{i=1}^{n} P(A \mid B_i) \cdot P(B_i)$$

Each term is one "slice" of the sample space: you take the probability of $A$ within the slice, $P(A \mid B_i)$, weight it by the slice's probability $P(B_i)$, and sum over all slices.

The simplest case uses two slices: $B$ and $B^c$.

$$P(A) = P(A \mid B) \cdot P(B) + P(A \mid B^c) \cdot P(B^c)$$
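The formula translates almost directly into code. Below is a minimal Python sketch, assuming the partition is described by two parallel lists: `priors[i]` holds $P(B_i)$ and `likelihoods[i]` holds $P(A \mid B_i)$. The function name `total_probability` and its input checks are illustrative choices, not a standard API.

```python
import math


def total_probability(priors, likelihoods):
    """Return P(A) = sum_i P(A|B_i) * P(B_i) for a partition B_1..B_n.

    priors[i] is P(B_i); likelihoods[i] is P(A|B_i).
    """
    if len(priors) != len(likelihoods):
        raise ValueError("need one P(A|B_i) per slice B_i")
    # A partition is exhaustive and disjoint, so the slice probabilities sum to 1.
    if not math.isclose(sum(priors), 1.0):
        raise ValueError("P(B_i) must sum to 1 for a partition")
    # Weight each slice's conditional probability by the slice's size, then add.
    return sum(p_a_given_b * p_b for p_b, p_a_given_b in zip(priors, likelihoods))


# Two-slice case: pass [P(B), P(B^c)] and [P(A|B), P(A|B^c)].
p_b = 0.25
print(total_probability([p_b, 1 - p_b], [0.6, 0.2]))
```

With these illustrative values the call computes 0.25 × 0.6 + 0.75 × 0.2 = 0.30, up to floating-point rounding.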

Explore the partitions

Each slice $B_i$ contributes $P(A \mid B_i) \cdot P(B_i)$ to $P(A)$. For example, take a three-way partition:

$B_1$: $P(B_1) = 0.30$, $P(A \mid B_1) = 0.80$
$B_2$: $P(B_2) = 0.50$, $P(A \mid B_2) = 0.40$
$B_3$: $P(B_3) = 0.20$, $P(A \mid B_3) = 0.10$

$$P(A) = P(A \mid B_1) \cdot P(B_1) + P(A \mid B_2) \cdot P(B_2) + P(A \mid B_3) \cdot P(B_3) = 0.80 \times 0.30 + 0.40 \times 0.50 + 0.10 \times 0.20 = 0.24 + 0.20 + 0.02 = 0.46$$
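Each slice's contribution can also be seen empirically. This Monte Carlo sketch first picks slice $B_i$ with probability $P(B_i)$, then lets $A$ occur with probability $P(A \mid B_i)$; the fraction of runs that land in $B_i$ and produce $A$ estimates $P(A \mid B_i) \cdot P(B_i)$. The slice sizes 0.30/0.50/0.20 and conditional probabilities 0.80/0.40/0.10 are the example values; the helper name `simulate_lotp`, the sample count, and the seed are arbitrary assumptions.

```python
import random


def simulate_lotp(priors, likelihoods, n_samples=100_000, seed=0):
    """Estimate P(A) and each slice's contribution P(A and B_i) by simulation."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    slices = range(len(priors))
    hits_per_slice = [0] * len(priors)
    for _ in range(n_samples):
        # Step 1: land in slice B_i with probability P(B_i).
        i = rng.choices(slices, weights=priors)[0]
        # Step 2: inside that slice, A happens with probability P(A|B_i).
        if rng.random() < likelihoods[i]:
            hits_per_slice[i] += 1
    # Fraction of all runs that were in B_i AND produced A.
    contributions = [h / n_samples for h in hits_per_slice]
    return sum(contributions), contributions


p_a, parts = simulate_lotp([0.30, 0.50, 0.20], [0.80, 0.40, 0.10])
print(f"simulated P(A) is about {p_a:.3f}")  # exact LOTP value is 0.46
# Each entry estimates P(A|B_i) * P(B_i); the exact values are 0.24, 0.20, 0.02.
print("per-slice contributions:", [round(c, 3) for c in parts])
```

Simulation only approximates the answer (its error shrinks roughly like $1/\sqrt{n}$); the exact value still comes from the LOTP sum.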
What do you think?
If P(A|B₁) = 1.0, P(A|B₂) = 0, and P(A|B₃) = 0, what is P(A)?

Solving the factory problem

Back to the widget problem. Machine A ($B_1$) produces 60% of widgets with a 3% defect rate. Machine B ($B_2$) produces 40% with a 5% defect rate.

Apply LOTP with the partition {Machine A, Machine B}:

$$P(D) = P(D \mid A) \cdot P(A) + P(D \mid B) \cdot P(B) = 0.03 \times 0.60 + 0.05 \times 0.40 = 0.018 + 0.020 = 0.038$$

So 3.8% of all widgets are defective.

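Machine A makes 60% of widgets but contributes only 0.018/0.038 ≈ 47% of defects. Machine B makes 40% but contributes 0.020/0.038 ≈ 53% of defects. These shares answer the opening question: a defective widget is slightly more likely to have come from Machine B. The overall rate of 3.8% is a weighted average of the two defect rates (weights 0.60 and 0.40), not the simple average of 4%.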
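The factory arithmetic can be written out as a short Python sketch. The dictionary layout and variable names are illustrative assumptions; the rates are the ones given in the problem. It computes $P(D)$ by LOTP, then each machine's share of the defects, $P(D \mid M) \cdot P(M) / P(D)$, and compares the result with the unweighted average of the two defect rates.

```python
# Share of production P(M) and defect rate P(D|M) for each machine.
machines = {
    "A": {"share": 0.60, "defect_rate": 0.03},
    "B": {"share": 0.40, "defect_rate": 0.05},
}

# Each machine's slice of the defect probability: P(D|M) * P(M).
slices = {name: m["defect_rate"] * m["share"] for name, m in machines.items()}

# LOTP: add the slices to get the overall defect rate P(D).
p_defective = sum(slices.values())

# Fraction of all defects that each machine is responsible for.
defect_share = {name: s / p_defective for name, s in slices.items()}

# Unweighted mean of the two defect rates, for comparison.
simple_average = sum(m["defect_rate"] for m in machines.values()) / len(machines)

print(f"P(D) = {p_defective:.3f}")  # 0.038
print({name: round(s, 2) for name, s in defect_share.items()})  # {'A': 0.47, 'B': 0.53}
print(f"simple average = {simple_average:.3f}")  # 0.040
```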

LOTP as a tree

LOTP is what you get when you multiply along branches and add across them. Every probability tree where you read "multiply down, add across" is using LOTP.
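"Multiply down, add across" fits in a few lines of recursion. This sketch assumes a tree is a list of (branch probability, child) pairs, where a child is either another such list or a leaf label; that representation and the name `prob_of` are hypothetical. The example tree (two cards drawn without replacement from a 52-card deck with 4 aces) is loosely based on the deck in the figure; the exact tree the figure drew isn't recoverable. Exact fractions keep the arithmetic clean.

```python
from fractions import Fraction


def prob_of(tree, target, path_prob=Fraction(1)):
    """Probability of reaching a leaf labelled `target`.

    A tree is a list of (branch_probability, child) pairs; each child is
    either another tree (a list) or a leaf label (a string).
    """
    total = Fraction(0)
    for branch_p, child in tree:
        p = path_prob * branch_p  # multiply down along the path
        if isinstance(child, list):
            total += prob_of(child, target, p)
        elif child == target:
            total += p  # add across every leaf that matches
    return total


# Two cards drawn without replacement from 52 cards with 4 aces.
# Level 1: first card. Level 2: second card, given the first.
# Leaf labels describe the SECOND card.
two_cards = [
    (Fraction(4, 52), [(Fraction(3, 51), "ace"), (Fraction(48, 51), "not ace")]),
    (Fraction(48, 52), [(Fraction(4, 51), "ace"), (Fraction(47, 51), "not ace")]),
]

print(prob_of(two_cards, "ace"))  # 1/13
```

The first level is the partition {first card ace, first card not ace}, so the sum is exactly LOTP: $\frac{4}{52} \cdot \frac{3}{51} + \frac{48}{52} \cdot \frac{4}{51} = \frac{1}{13}$, the same as $\frac{4}{52}$.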

[Figure: probability tree starting from a 52-card deck with 4 aces]
What do you think?
A student bus system has 3 routes. Route 1 carries 50% of students and runs on time 90% of the time. Route 2 carries 30% with 80% on-time. Route 3 carries 20% with 70% on-time. What's the overall on-time rate? (Answer as a whole-number percentage.)

Why LOTP matters for Bayes

Bayes' Rule has $P(E)$ in the denominator:

$$P(H \mid E) = \frac{P(E \mid H) \cdot P(H)}{P(E)}$$

How do you compute $P(E)$? With LOTP, partitioning on $H$ and $\neg H$:

$$P(E) = P(E \mid H) \cdot P(H) + P(E \mid \neg H) \cdot P(\neg H)$$

LOTP is the engine that powers Bayes' Rule. In most problems $P(E)$ is not given directly, so LOTP is how you compute the denominator.
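A minimal sketch of LOTP supplying the Bayes denominator for a yes/no hypothesis. The helper name `posterior` is hypothetical. Applied to the factory with $H$ = "made by Machine B" and $E$ = "defective" (so $\neg H$ is Machine A), it reproduces Machine B's share of the defects.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Return P(H|E) by Bayes' Rule, computing P(E) with LOTP over {H, not H}."""
    # LOTP denominator: P(E) = P(E|H)P(H) + P(E|not H)P(not H)
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
    # Bayes' Rule: numerator over the LOTP denominator
    return p_e_given_h * prior_h / p_e


# H = "made by Machine B" (P(H) = 0.40), E = "defective".
# Not H is Machine A, whose defect rate is 0.03.
print(round(posterior(0.40, 0.05, 0.03), 3))  # 0.526
```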

P(B) = 0.3, P(A|B) = 0.8, P(A|B^c) = 0.2. What is P(A)? (decimal, e.g. 0.42)
Urn 1 has 4 red, 6 blue. Urn 2 has 7 red, 3 blue. You pick an urn at random (50-50), then draw a ball. P(red)? (decimal, e.g. 0.42)
A disease test: P(disease) = 0.01, P(+|disease) = 0.99, P(+|healthy) = 0.05. What is P(+)? (decimal, e.g. 0.042)

The general version

For any partition $B_1, B_2, \ldots, B_n$ of Ω:

$$P(A) = \sum_{i=1}^{n} P(A \cap B_i) = \sum_{i=1}^{n} P(A \mid B_i) \cdot P(B_i)$$

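The first equality is the addition rule: because the $B_i$ cover Ω, $A$ is the union of the pieces $A \cap B_i$, and because the $B_i$ are mutually exclusive, those pieces are disjoint, so their probabilities add. The second equality applies the multiplication rule, $P(A \cap B_i) = P(A \mid B_i) \cdot P(B_i)$, to each term.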
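Both equalities can be checked by brute force on a small finite sample space. In this sketch the example (one roll of a fair die, $A$ = "the roll is even", and the partition {1, 2}, {3, 4, 5}, {6}) and the helpers `prob` and `cond` are hypothetical choices; exact fractions avoid floating-point noise.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die, each outcome with prob 1/6.
omega = {1, 2, 3, 4, 5, 6}


def prob(event):
    """P(event) under the uniform measure on omega."""
    return Fraction(len(event & omega), len(omega))


def cond(a, b):
    """P(a | b) = P(a and b) / P(b); requires P(b) > 0."""
    return prob(a & b) / prob(b)


a = {2, 4, 6}                         # A: the roll is even
partition = [{1, 2}, {3, 4, 5}, {6}]  # disjoint blocks whose union is omega

direct = prob(a)
via_intersections = sum(prob(a & b) for b in partition)          # first equality
via_conditionals = sum(cond(a, b) * prob(b) for b in partition)  # second equality
print(direct, via_intersections, via_conditionals)  # 1/2 1/2 1/2
```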

What do you think?
Why must the Bᵢ be mutually exclusive AND exhaustive?

Test your understanding

Three equally likely boxes. Box 1: 2 red, 3 blue. Box 2: 4 red, 1 blue. Box 3: 1 red, 4 blue. P(red)? (decimal, e.g. 0.042)

What's next

LOTP gives you $P(E)$. Combined with the multiplication rule, you have everything needed for Bayes' Rule: the formula for flipping conditional probabilities and updating beliefs with evidence.