From events to numbers

What do you think?
You flip a fair coin 10 times. Which outcome is most likely: exactly 5 heads, at least 8 heads, or fewer than 3 heads?

So far, we've talked about events: "roll a 6," "draw an ace," "test positive." Now we'll start extracting numbers from random experiments.

Suppose you flip a coin 10 times. How many heads do you get? The answer could be 0, 1, 2, ..., up to 10. Each flip is random, so the count is random too. This "random count" is our first example of a random variable.

What is a random variable?

Random Variable

A random variable is a function that assigns a numerical value to each outcome in a sample space. We typically denote random variables with capital letters like X, Y, Z.

Think of a random variable as a measurement or score you compute from the outcome of a random experiment.
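
To make the "function on outcomes" idea concrete, here is a minimal Python sketch for the 10-flip experiment. It assumes a fair coin, so all 1024 flip sequences are equally likely. The names `sample_space`, `num_heads`, and `prob` are chosen for this illustration only.

```python
from fractions import Fraction
from itertools import product

# Sample space: every sequence of 10 flips, e.g. ('H', 'T', ..., 'H').
# Assuming a fair coin, all 2**10 = 1024 sequences are equally likely.
sample_space = list(product("HT", repeat=10))

def num_heads(outcome):
    """The random variable Y: maps one outcome (a flip sequence) to a number."""
    return outcome.count("H")

def prob(event):
    """Probability that the value of Y satisfies the given true/false test."""
    favorable = sum(1 for outcome in sample_space if event(num_heads(outcome)))
    return Fraction(favorable, len(sample_space))

p_exactly_5 = prob(lambda y: y == 5)
p_at_least_8 = prob(lambda y: y >= 8)
p_fewer_than_3 = prob(lambda y: y < 3)

print(p_exactly_5, p_at_least_8, p_fewer_than_3)  # 63/256 7/128 7/128
```

Listing every outcome only works for small experiments, but it keeps the two pieces visible: the outcomes themselves, and the number computed from each one.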

Examples

Experiment | Random Variable | Possible Values | Type
Roll a die | X = face value | 1, 2, 3, 4, 5, 6 | Discrete
Flip 10 coins | Y = number of heads | 0, 1, 2, ..., 10 | Discrete
Pick a person | H = their height | Any positive number | Continuous
Wait for a bus | T = waiting time | Any non-negative number | Continuous

Discrete vs. continuous

Random variables come in two flavors:

Discrete Random Variable

A random variable that can only take on a countable number of values (like integers, or a finite set). You can list all possibilities.

Continuous Random Variable

A random variable that can take on any value in some interval. There are uncountably many possibilities.

Discrete examples: Number of emails you receive, dice rolls, coin flip counts, number of customers

Continuous examples: Height, weight, temperature, time, distance
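
A small simulation can make the contrast visible. This is only a sketch: the 15-minute cap on the waiting time is an assumption made for the illustration, and floating-point numbers only approximate a true continuum.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Discrete: a die roll can only land on one of six listable values.
die_rolls = [rng.randint(1, 6) for _ in range(1000)]

# Continuous (idealized): a waiting time anywhere between 0 and 15 minutes.
# The 15-minute cap is an assumption made only for this illustration.
wait_times = [rng.uniform(0, 15) for _ in range(1000)]

print(sorted(set(die_rolls)))  # only values from 1 to 6 can appear
print(len(set(wait_times)))    # nearly every draw is a distinct value
```

The die rolls repeat the same few values over and over, while the waiting times almost never repeat exactly.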

Discrete or Continuous?

Can you list all possible values? If yes, it's discrete. If the values form a continuous range, it's continuous.


Notation and events

When we write X = 3, we mean "the event that the random variable X takes the value 3."

We can use random variables to describe events:

  • P(X = 5): probability that X equals 5
  • P(X ≤ 3): probability that X is at most 3
  • P(2 < X < 7): probability that X is strictly between 2 and 7
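
As a rough illustration, the three probabilities above can be computed for one concrete choice of X. The sketch below assumes X is the face value of a single fair six-sided die; the helper `prob` is a name made up for this example.

```python
from fractions import Fraction

# Assumed example: X is the face value of one fair six-sided die.
values = range(1, 7)

def prob(event):
    """P(event) when each of the six faces is equally likely."""
    return Fraction(sum(1 for x in values if event(x)), len(values))

p_eq_5 = prob(lambda x: x == 5)        # P(X = 5)
p_le_3 = prob(lambda x: x <= 3)        # P(X <= 3)
p_between = prob(lambda x: 2 < x < 7)  # P(2 < X < 7)

print(p_eq_5, p_le_3, p_between)  # 1/6 1/2 2/3
```

Each event is just a true/false condition on the value of X, which is why a small predicate function is enough to describe it.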

Example: sum of two dice

Let X = sum of two fair dice. The sample space has 36 equally likely outcomes. The random variable X can take values 2, 3, 4, ..., 12.

Outcome to Number

Sum of two dice (rows: first die, columns: second die)

      1   2   3   4   5   6
1     2   3   4   5   6   7
2     3   4   5   6   7   8
3     4   5   6   7   8   9
4     5   6   7   8   9  10
5     6   7   8   9  10  11
6     7   8   9  10  11  12

What do you think?
Which sum is most likely when rolling two dice? Hint: see the sum grid above.
What's P(X = 2) for the sum of two dice? Hint: see the sum grid above. (Enter as a fraction, e.g. 5/36.)
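
Once you have tried the questions above, one way to check your answers is to tally the sums over all 36 outcomes. This is a minimal sketch; `outcomes`, `counts`, and `prob_of_sum` are names chosen for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

# X maps each outcome to the sum of its two faces; count outcomes per sum.
counts = Counter(a + b for a, b in outcomes)

# P(X = s) = (number of outcomes with sum s) / 36
prob_of_sum = {s: Fraction(n, len(outcomes)) for s, n in sorted(counts.items())}

for s, p in prob_of_sum.items():
    print(s, p)
```

The printed fractions are reduced to lowest terms, so a probability such as 6/36 appears as 1/6.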

Why random variables matter

Random variables let us:

  1. Summarize complex outcomes with a single number
  2. Calculate expected values, variances, and other statistics
  3. Model real-world quantities like measurements and counts
  4. Compare different random phenomena on the same scale
What do you think?
Why is it useful to turn outcomes into numbers?

Functions of random variables

If X is a random variable, then g(X) is also a random variable for any function g.

Examples:

  • If X is a test score, then X² is the squared score
  • If T is temperature in Celsius, then (9/5)T + 32 is temperature in Fahrenheit
  • If X is income, then log(X) is log-income
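
The sketch below shows one way to see that g(X) has its own set of values and probabilities. It assumes X is a single fair die roll, and the two functions used, x² and (x − 3)², are illustrations rather than examples from the list above. The helper `distribution_of` is a hypothetical name.

```python
from collections import defaultdict
from fractions import Fraction

# Assumed example: X is one fair die roll, each face with probability 1/6.
prob_x = {x: Fraction(1, 6) for x in range(1, 7)}

def distribution_of(g, prob_x):
    """Probabilities for g(X): add up P(X = x) over all x with the same g(x)."""
    prob_gx = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for x, p in prob_x.items():
        prob_gx[g(x)] += p
    return dict(prob_gx)

squared = distribution_of(lambda x: x**2, prob_x)
centered = distribution_of(lambda x: (x - 3) ** 2, prob_x)

print(sorted(squared))   # [1, 4, 9, 16, 25, 36]
print(sorted(centered))  # [0, 1, 4, 9]
```

When g sends two different values of X to the same number, as (x − 3)² does for x = 2 and x = 4, their probabilities are added together.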
If X is the number shown on a die roll, what are the possible values of 2X? (comma-separated values, e.g. 1, 3, 5)

Multiple random variables

Often we work with several random variables at once:

  • X = your score on exam 1, Y = your score on exam 2
  • X = height, Y = weight of a randomly chosen person
  • X_1, X_2, ..., X_n = results of n independent experiments

Understanding how multiple random variables relate (whether they're independent, correlated, or dependent) lets us model complex situations.
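
As a preview, here is a hedged sketch of several random variables defined on the same experiment: two fair dice, with X the first die, Y the second die, and S their sum. The check shown compares a joint probability with a product of individual probabilities for a single pair of values only; it illustrates the idea and is not a full test of independence.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely rolls

def prob(event):
    """P(event), where event is a true/false test on an outcome (a, b)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def X(o):
    return o[0]  # first die

def Y(o):
    return o[1]  # second die

def S(o):
    return o[0] + o[1]  # sum of both dice

# X and Y: here the joint probability equals the product.
joint_xy = prob(lambda o: X(o) == 6 and Y(o) == 6)
product_xy = prob(lambda o: X(o) == 6) * prob(lambda o: Y(o) == 6)

# X and S: here the joint probability differs from the product.
joint_xs = prob(lambda o: X(o) == 6 and S(o) == 12)
product_xs = prob(lambda o: X(o) == 6) * prob(lambda o: S(o) == 12)

print(joint_xy == product_xy)  # True
print(joint_xs == product_xs)  # False
```

Knowing the first die tells you nothing about the second, but it does change what sums are possible, which is why the second comparison fails.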

What comes next

The next few lessons cover PMFs and CDFs for fully describing a discrete random variable, Bernoulli and binomial distributions for yes/no experiments, the hypergeometric distribution for sampling without replacement, independence of random variables, and expected value.

Summary

Concept | Meaning | Example
Random variable | A number determined by a random outcome | X = number of heads in 10 flips
Discrete | Countable possible values | Die roll: {1, 2, 3, 4, 5, 6}
Continuous | Uncountable values in an interval | Height: any value > 0
P(X = x) | Probability that X takes value x | P(X = 7) for dice sum
Function of RV | g(X) is also a random variable | If X is temp in °C, then 9X/5 + 32 is °F

A random variable turns the abstract world of sample spaces into the concrete world of numbers. This lets us use all the tools of mathematics (algebra, calculus, statistics) to understand randomness.

Test your understanding

You draw 2 cards from a deck. Let X = number of aces. Is X discrete or continuous? (discrete or continuous)
For the sum of two dice, what's P(X ≥ 11)? (The sum is 11 or 12. Enter as a fraction, e.g. 2/7.)
If Y is the time until your next phone call, is Y discrete or continuous? (discrete or continuous)
If X is uniform on {1, 2, 3}, what's E[2X + 1]? (Hint: first find E[X] = 2. Enter a whole number.)

What's next

Now that we know what random variables are, how do we describe them completely? Enter the probability mass function (PMF) and cumulative distribution function (CDF), two complementary ways to capture everything about a discrete random variable's behavior.