A factory has two machines. Machine A produces 60% of widgets and 3% of its output is defective. Machine B produces 40% and 5% of its output is defective. A random widget is defective. Which machine made it?
What do you think?
What's the probability a randomly selected widget is defective?
The idea: partition and sum
You can't always compute P(A) directly. But if you can split the sample space into cases where the probability of A within each case is easy to compute, you can weight each case by how likely it is and add up the pieces.
Law of Total Probability (LOTP)
If $B_1, B_2, \dots, B_n$ form a partition of $\Omega$ (mutually exclusive, collectively exhaustive), then:
$$P(A) = \sum_{i=1}^{n} P(A \mid B_i) \cdot P(B_i)$$
Each term is one "slice" of the sample space. You compute the conditional probability $P(A \mid B_i)$ within each slice, weight it by the slice's probability $P(B_i)$, and sum.
The simplest case uses two slices: $B$ and its complement $B^c$.
$$P(A) = P(A \mid B) \cdot P(B) + P(A \mid B^c) \cdot P(B^c)$$
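The formula translates almost directly into code. The following is a minimal sketch, not a reference implementation: the function name `total_probability` and the example numbers are illustrative assumptions. It can check that the slice probabilities sum to 1, but it cannot verify that the slices are mutually exclusive; that part is up to you.

```python
import math


def total_probability(conditionals, weights):
    """Return P(A) = sum over i of P(A | B_i) * P(B_i).

    conditionals[i] is P(A | B_i); weights[i] is P(B_i).
    """
    if len(conditionals) != len(weights):
        raise ValueError("need one conditional probability per slice")
    # A partition is exhaustive, so the P(B_i) must sum to 1.
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("slice probabilities must sum to 1")
    return sum(c * w for c, w in zip(conditionals, weights))


# Two-slice case with made-up numbers:
# P(B) = 0.25, P(A | B) = 0.6, P(A | B^c) = 0.2.
p_a = total_probability([0.6, 0.2], [0.25, 0.75])
print(round(p_a, 3))  # 0.3
```

Passing the conditionals and the weights as parallel lists keeps the function the same for two slices or twenty.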
Explore the partitions
Adjust the partition sizes and conditional probabilities. Watch how each slice contributes to P(A).
If P(A|B₁) = 1.0, P(A|B₂) = 0, and P(A|B₃) = 0, what is P(A)?
Solving the factory problem
Back to the widget problem. Machine A ($B_1$) produces 60% of widgets with a 3% defect rate. Machine B ($B_2$) produces 40% of widgets with a 5% defect rate.
P(Defective)
Apply LOTP with the partition {Machine A, Machine B}:
$$P(D) = P(D \mid A) \cdot P(A) + P(D \mid B) \cdot P(B)$$
$$P(D) = (0.03)(0.60) + (0.05)(0.40) = 0.018 + 0.020 = 0.038$$
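Machine A makes 60% of widgets but contributes only 0.018/0.038 ≈ 47% of defects. Machine B makes 40% of widgets but contributes 0.020/0.038 ≈ 53% of defects, so a defective widget is slightly more likely to have come from Machine B. The overall defect rate of 3.8% is a weighted average of the two machines' rates, not a simple average (which would be 4%).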
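To check the arithmetic, here is a short sketch that recomputes the factory numbers. The production shares and defect rates come from the problem statement; the dictionary layout and variable names are just one way to organize them.

```python
# Production share and defect rate for each machine (from the problem).
machines = {
    "A": {"share": 0.60, "defect_rate": 0.03},
    "B": {"share": 0.40, "defect_rate": 0.05},
}

# Each slice's contribution to P(D) is P(D | machine) * P(machine).
contributions = {
    name: m["defect_rate"] * m["share"] for name, m in machines.items()
}

# LOTP: add the contributions across the partition.
p_defective = sum(contributions.values())

for name, c in contributions.items():
    print(f"Machine {name}: {c:.3f} of all widgets, {c / p_defective:.0%} of defects")
print(f"P(D) = {p_defective:.3f}")
# Machine A: 0.018 of all widgets, 47% of defects
# Machine B: 0.020 of all widgets, 53% of defects
# P(D) = 0.038
```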
LOTP as a tree
LOTP is what you get when you multiply along branches and add across them. Every probability tree where you read "multiply down, add across" is using LOTP.
[Interactive probability tree. Start: 52 cards, 4 aces.]
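As a concrete instance of "multiply down, add across", the sketch below uses the 52-card, 4-ace deck. The question it answers, the probability that the second card drawn without replacement is an ace, is an assumption chosen for illustration; only the deck composition comes from the text. Exact fractions avoid rounding noise.

```python
from fractions import Fraction

# Each branch: (P(first card), P(second card is an ace | first card)).
# Assumed setup: two cards drawn without replacement.
branches = {
    "first is an ace": (Fraction(4, 52), Fraction(3, 51)),
    "first is not an ace": (Fraction(48, 52), Fraction(4, 51)),
}

# Multiply down each branch, then add across branches.
p_second_ace = sum(p_first * p_second for p_first, p_second in branches.values())
print(p_second_ace)  # 1/13
```

The result matches the probability that the first card is an ace (4/52 = 1/13), which is a useful sanity check: before any card is seen, every position in the deck is equally likely to hold an ace.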
What do you think?
A student bus system has 3 routes. Route 1 carries 50% of students and runs on time 90% of the time. Route 2 carries 30% with 80% on-time. Route 3 carries 20% with 70% on-time. What's the overall on-time rate? (Answer as a whole-number percent.)
Why LOTP matters for Bayes
Bayes' Rule has P(E) in the denominator:
$$P(H \mid E) = \frac{P(E \mid H) \cdot P(H)}{P(E)}$$
How do you compute P(E)? LOTP. Partition on H and ¬H:
$$P(E) = P(E \mid H) \cdot P(H) + P(E \mid \neg H) \cdot P(\neg H)$$
LOTP is the engine that powers Bayes' Rule. Unless P(E) is handed to you directly, LOTP is how you compute the denominator.
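A small sketch of how the two pieces fit together, using the factory numbers from earlier with H = "made by Machine B" and E = "defective". The function name `posterior` and its parameter names are illustrative assumptions, not a standard API.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Return P(H | E) via Bayes' Rule, with P(E) computed by LOTP."""
    # Denominator: partition on H and not-H.
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
    if p_e == 0:
        raise ValueError("P(E) is zero, so P(H | E) is undefined")
    return p_e_given_h * prior_h / p_e


# P(B) = 0.40, P(D | B) = 0.05, P(D | A) = 0.03.
p_b_given_defect = posterior(0.40, 0.05, 0.03)
print(round(p_b_given_defect, 3))  # 0.526
```

The numerator is one term of the LOTP sum and the denominator is the whole sum, so the posterior is that term's share of the total.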
P(B) = 0.3, P(A|B) = 0.8, P(A|B^c) = 0.2. What is P(A)? (decimal, e.g. 0.42)
Urn 1 has 4 red, 6 blue. Urn 2 has 7 red, 3 blue. You pick an urn at random (50-50), then draw a ball. P(red)? (decimal, e.g. 0.42)
A disease test: P(disease) = 0.01, P(+|disease) = 0.99, P(+|healthy) = 0.05. What is P(+)? (decimal, e.g. 0.042)
The general version
For any partition $B_1, B_2, \dots, B_n$ of $\Omega$:
$$P(A) = \sum_{i=1}^{n} P(A \cap B_i) = \sum_{i=1}^{n} P(A \mid B_i) \cdot P(B_i)$$
The first equality is from the addition rule (the $A \cap B_i$ are disjoint and together make up $A$). The second uses the multiplication rule, $P(A \cap B_i) = P(A \mid B_i) \cdot P(B_i)$, on each term.
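Both equalities can be checked by brute force on a small sample space. The sketch below uses an example that is not from the lesson: two fair dice, with A = "the sum is 8" and the partition $B_i$ = "the first die shows i". All names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of two fair dice.
omega = list(product(range(1, 7), repeat=2))


def is_a(outcome):
    """Event A: the two dice sum to 8."""
    return outcome[0] + outcome[1] == 8


# P(A) computed directly by counting.
direct = Fraction(sum(1 for w in omega if is_a(w)), len(omega))

joint_sum = Fraction(0)  # sum of P(A and B_i)
lotp_sum = Fraction(0)   # sum of P(A | B_i) * P(B_i)
for i in range(1, 7):
    cell = [w for w in omega if w[0] == i]       # B_i: first die shows i
    hits = sum(1 for w in cell if is_a(w))       # outcomes in both A and B_i
    p_bi = Fraction(len(cell), len(omega))       # P(B_i)
    p_a_given_bi = Fraction(hits, len(cell))     # P(A | B_i), counted inside B_i
    joint_sum += Fraction(hits, len(omega))      # P(A and B_i)
    lotp_sum += p_a_given_bi * p_bi

print(direct, joint_sum, lotp_sum)  # 5/36 5/36 5/36
```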
What do you think?
Why must the Bᵢ be mutually exclusive AND exhaustive?
LOTP gives you P(E). Combined with the multiplication rule, you have everything needed for Bayes' Rule — the formula for flipping conditional probabilities and updating beliefs with evidence.