Introduction to Probability
The mathematics of chance and uncertainty
You already think about probability every day. When you check the weather and see “70% chance of rain,” you grab an umbrella. When you buy a lottery ticket, you know your odds are slim - but you buy it anyway because someone has to win. When you decide whether to leave early for work based on how traffic “usually” is, you are making a probability judgment.
Probability is simply the language we use to describe uncertainty. It takes our intuitive sense of “likely” and “unlikely” and makes it precise. Instead of saying “it will probably rain,” we can say “there is a 70% chance of rain.” Instead of “I might get a parking spot,” we can calculate the actual odds based on the number of open spaces. This precision matters - it helps doctors evaluate treatments, engineers design safe bridges, and insurance companies set fair rates.
The good news is that you already understand the basic idea. Probability just gives you the tools to be more exact about what you already know: some things are certain, some things are impossible, and most things fall somewhere in between.
Core Concepts
What Is Probability?
Probability is a number that measures how likely something is to happen. It is always a number between 0 and 1 (or equivalently, between 0% and 100%).
- Probability = 0: The event is impossible. It will never happen no matter how many times you try.
- Probability = 1: The event is certain. It will definitely happen.
- Probability between 0 and 1: The event might happen. The higher the number, the more likely it is.
Think of probability as a scale:
$$0 \quad \leftarrow \text{ impossible } \cdots \text{ unlikely } \cdots \text{ even chance } \cdots \text{ likely } \cdots \text{ certain } \rightarrow \quad 1$$
Here is the key insight: probability describes what happens in the long run. If a weather forecast says there is a 30% chance of rain, that does not mean it will rain for 30% of the day. It means that on days with similar conditions, it rains about 30% of the time. If you lived through 100 days with a “30% chance of rain” forecast, you would expect roughly 30 of them to actually be rainy.
This idea - that probability describes long-run relative frequency - is foundational. One coin flip might land on heads or tails; we cannot predict which. But flip that coin 10,000 times, and you will get heads very close to 5,000 times. The randomness of individual events averages out when you have enough of them.
Experiments, Sample Spaces, and Events
To talk about probability precisely, we need a few definitions.
An experiment is any process that has uncertain outcomes. Rolling a die is an experiment. So is flipping a coin, drawing a card from a deck, checking tomorrow’s weather, or asking a random person their favorite color. The key feature is that you do not know in advance what will happen.
The sample space (written as $S$) is the set of all possible outcomes of an experiment. When you roll a standard six-sided die, the sample space is:
$$S = \{1, 2, 3, 4, 5, 6\}$$
When you flip a coin, the sample space is:
$$S = \{\text{heads}, \text{tails}\}$$
An event is any collection of outcomes you are interested in - technically, any subset of the sample space. If you roll a die and want to know about rolling an even number, that is an event:
$$A = \{2, 4, 6\}$$
Events can contain one outcome, multiple outcomes, all outcomes, or even no outcomes. “Rolling a 5” is an event with one outcome. “Rolling an even number” is an event with three outcomes. “Rolling a number between 1 and 6” is an event that includes the entire sample space (it is certain to happen). “Rolling a 7” on a standard die is an event with no outcomes (it is impossible).
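These definitions map naturally onto Python sets. A minimal sketch (the variable names are my own, chosen to match the die example above):

```python
# The die-rolling experiment: sample space and a few events, all as sets.
sample_space = {1, 2, 3, 4, 5, 6}

even = {2, 4, 6}    # "rolling an even number" -- three outcomes
five = {5}          # "rolling a 5" -- one outcome
seven = set()       # "rolling a 7" -- impossible, no outcomes

# Every event is a subset of the sample space.
print(even.issubset(sample_space))          # True
print(sample_space.issubset(sample_space))  # True: the whole sample space is an event too
print(len(seven))                           # 0
```

The empty set `seven` is a perfectly valid event; it simply contains no outcomes, which is exactly why its probability will turn out to be 0.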
Probability Notation
We write $P(A)$ to mean “the probability that event A happens.” This notation is universal in probability and statistics.
$$P(\text{heads}) = 0.5$$
This reads as “the probability of getting heads equals 0.5” or equivalently “there is a 50% chance of getting heads.”
Some examples of probability notation:
- $P(\text{rolling a 6}) = \frac{1}{6}$
- $P(\text{rain tomorrow}) = 0.3$
- $P(\text{drawing a heart}) = \frac{13}{52} = \frac{1}{4}$
The Probability Rules
Every probability must follow these fundamental rules:
Rule 1: Probabilities are between 0 and 1
$$0 \leq P(A) \leq 1$$
If someone tells you the probability of something is $-0.2$ or $1.5$, they have made a mistake. Probabilities cannot be negative (how could something be less likely than impossible?) and cannot exceed 1 (how could something be more likely than certain?).
Rule 2: The probability of the sample space is 1
$$P(S) = 1$$
Something has to happen. When you flip a coin, you will get heads or tails. When you roll a die, you will get 1, 2, 3, 4, 5, or 6. The probability that some outcome occurs is always 1 (100%).
Rule 3: The Complement Rule
The complement of an event $A$, written $A^c$ (read as “A complement” or “not A”), is the event that $A$ does NOT happen. If $A$ is “rolling a 6,” then $A^c$ is “not rolling a 6” (which means rolling 1, 2, 3, 4, or 5).
The complement rule states:
$$P(A^c) = 1 - P(A)$$
This makes intuitive sense. If there is a 30% chance of rain, there is a 70% chance of no rain. If you have a 1/6 chance of rolling a 6, you have a 5/6 chance of not rolling a 6. The probabilities of an event and its complement always add up to 1.
The complement rule is incredibly useful. Sometimes it is easier to calculate the probability of something NOT happening and then subtract from 1.
Equally Likely Outcomes and the Classical Definition
When all outcomes in the sample space are equally likely (that is, when each outcome has the same chance of occurring), calculating probability is straightforward:
$$P(A) = \frac{\text{Number of outcomes in } A}{\text{Total number of outcomes in } S} = \frac{\text{favorable outcomes}}{\text{total outcomes}}$$
This is sometimes called the classical definition of probability or the theoretical probability.
For a fair die, each of the six faces is equally likely. To find the probability of rolling an even number:
- Favorable outcomes: 2, 4, 6 (three outcomes)
- Total outcomes: 1, 2, 3, 4, 5, 6 (six outcomes)
- Probability: $\frac{3}{6} = \frac{1}{2}$
This formula only works when outcomes are equally likely. If you have a weighted die that lands on 6 more often, you cannot just count outcomes - you would need to know the actual weights.
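The counting formula above is easy to express in code. This sketch uses Python's `fractions.Fraction` so the answers come out as exact fractions rather than rounded decimals; the function name is my own:

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(A) for equally likely outcomes: favorable outcomes / total outcomes."""
    favorable = event & sample_space  # ignore any outcomes not actually in S
    return Fraction(len(favorable), len(sample_space))

die = {1, 2, 3, 4, 5, 6}
print(classical_probability({2, 4, 6}, die))  # 1/2  (rolling an even number)
print(classical_probability({6}, die))        # 1/6  (rolling a 6)
print(classical_probability({7}, die))        # 0    (rolling a 7 is impossible)
```

Note the intersection `event & sample_space`: it keeps the formula honest by discarding impossible outcomes like 7, which is why "rolling a 7" correctly comes out as probability 0.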
Theoretical vs. Experimental Probability
Theoretical probability is what should happen based on mathematical reasoning. If a coin is fair, the theoretical probability of heads is $\frac{1}{2}$ because the two outcomes are equally likely.
Experimental probability (also called empirical probability) is what actually happens when you run an experiment. If you flip a coin 100 times and get 53 heads, the experimental probability of heads is:
$$P(\text{heads}) = \frac{53}{100} = 0.53$$
Why the difference? Random variation. In any finite number of trials, you will see fluctuations. You might flip a fair coin 10 times and get 7 heads. That does not mean the coin is unfair - it means randomness does not produce perfect results in small samples.
The Law of Large Numbers
Here is where the magic happens. The Law of Large Numbers states that as you perform more and more trials of an experiment, the experimental probability gets closer and closer to the theoretical probability.
Flip a coin 10 times, and you might get 70% heads. Flip it 100 times, and you might get 53% heads. Flip it 10,000 times, and you will probably get something very close to 50% heads. Flip it a million times, and you will be even closer.
This is why probability works. Even though individual events are unpredictable, patterns emerge in the long run. Casinos do not know who will win any particular bet, but they know exactly how much money they will make over thousands of bets. Insurance companies cannot predict which house will burn down, but they can accurately predict how many houses out of a million will burn.
The Law of Large Numbers is the bridge between the randomness of individual events and the predictability of averages.
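You can watch the Law of Large Numbers at work with a short simulation. This is a sketch, not a proof: the seed is fixed so the run is reproducible, and the exact fractions will vary from run to run with a different seed.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Flip a fair coin n times and report the fraction of heads, for growing n.
# The fractions drift toward the theoretical value 0.5 as n grows.
for n in [10, 100, 10_000, 1_000_000]:
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(f"{n:>9} flips: {heads / n:.4f} heads")
```

With 10 flips the fraction can easily be 0.3 or 0.7; by a million flips it is pinned very close to 0.5000. That narrowing is the Law of Large Numbers.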
The Gambler’s Fallacy
The gambler’s fallacy is one of the most common and dangerous misconceptions about probability. It is the mistaken belief that past random events affect future random events.
Imagine you are flipping a fair coin and it has landed on heads five times in a row. Many people feel that tails is “due” - that the next flip is more likely to be tails because heads has been coming up too often.
This is wrong. The coin has no memory. Each flip is independent of the previous flips. The probability of tails on the next flip is still exactly $\frac{1}{2}$, the same as it always was.
Why do people fall for this? Because they confuse two different questions:
- “What is the probability of getting 6 heads in a row?” (This is small: $\frac{1}{64}$)
- “Given that I already got 5 heads, what is the probability of heads on the next flip?” (This is $\frac{1}{2}$)
The first question is about a sequence of events before any of them happen. The second is about one future event, and the past outcomes are irrelevant.
The gambler’s fallacy has cost people a lot of money. Roulette wheels do not remember previous spins. Lottery numbers are not “due” to come up. Each random event is a fresh start.
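A simulation makes the two questions above concrete. Out of many 6-flip sequences, we keep only those that start with five heads, and then check how often the sixth flip is heads. This is a sketch with an arbitrary fixed seed; the observed fraction will hover near 0.5, not drop below it:

```python
import random

random.seed(0)  # reproducible sketch

# Among 6-flip sequences whose first five flips were all heads,
# how often is the SIXTH flip also heads? The fallacy predicts "rarely";
# independence predicts "about half the time".
runs_of_five_heads = 0
sixth_is_heads = 0
for _ in range(200_000):
    flips = [random.random() < 0.5 for _ in range(6)]
    if all(flips[:5]):          # first five flips were all heads
        runs_of_five_heads += 1
        sixth_is_heads += flips[5]

print(f"sequences starting with 5 heads: {runs_of_five_heads}")
print(f"fraction with heads on flip 6:  {sixth_is_heads / runs_of_five_heads:.3f}")
```

Five heads in a row is rare (about 1 sequence in 32), but among those rare sequences the sixth flip is still heads about half the time. The coin really has no memory.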
Notation and Terminology
| Term | Meaning | Example |
|---|---|---|
| Experiment | A process with uncertain outcomes | Rolling a die |
| Sample space ($S$) | Set of all possible outcomes | $\{1, 2, 3, 4, 5, 6\}$ |
| Event | A subset of the sample space | Rolling an even number |
| $P(A)$ | Probability of event A | $P(\text{heads}) = 0.5$ |
| Complement ($A^c$) | Event A does NOT happen | “Not heads” = tails |
| Theoretical probability | Based on equally likely outcomes | $\frac{\text{favorable}}{\text{total}}$ |
| Experimental probability | Based on actual trials | $\frac{\text{observed successes}}{\text{trials}}$ |
Examples
Find the probability of each event:

a) Flipping a fair coin and getting heads
b) Rolling a standard die and getting a 4
c) Rolling a standard die and getting a number less than 3
Solution:
a) Probability of heads: The sample space is {heads, tails} with 2 equally likely outcomes. Only 1 outcome is heads. $$P(\text{heads}) = \frac{1}{2} = 0.5 = 50\%$$
b) Probability of rolling a 4: The sample space is {1, 2, 3, 4, 5, 6} with 6 equally likely outcomes. Only 1 outcome is a 4. $$P(\text{rolling 4}) = \frac{1}{6} \approx 0.167 = 16.7\%$$
c) Probability of rolling less than 3: Numbers less than 3 are 1 and 2. That is 2 favorable outcomes out of 6 total. $$P(\text{less than 3}) = \frac{2}{6} = \frac{1}{3} \approx 0.333 = 33.3\%$$
A bag contains 10 marbles: 3 red, 4 blue, and 3 green. If you draw one marble at random, what is the probability of NOT drawing a red marble?
Solution:
Method 1: Direct calculation “Not red” means blue or green. There are $4 + 3 = 7$ non-red marbles out of 10 total. $$P(\text{not red}) = \frac{7}{10} = 0.7 = 70\%$$
Method 2: Using the complement rule First, find the probability of drawing red. $$P(\text{red}) = \frac{3}{10} = 0.3$$
Then apply the complement rule: $$P(\text{not red}) = 1 - P(\text{red}) = 1 - 0.3 = 0.7 = 70\%$$
Both methods give the same answer. The complement rule is especially useful when the “not” version has many outcomes and the original event has few.
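Both methods can be checked in a few lines of Python. Exact fractions via `fractions.Fraction` avoid any rounding; the variable names are my own:

```python
from fractions import Fraction

red, blue, green = 3, 4, 3
total = red + blue + green  # 10 marbles in the bag

# Method 1: count the non-red marbles directly.
p_not_red_direct = Fraction(blue + green, total)

# Method 2: complement rule, P(not red) = 1 - P(red).
p_red = Fraction(red, total)
p_not_red_complement = 1 - p_red

print(p_not_red_direct)                           # 7/10
print(p_not_red_complement)                       # 7/10
print(p_not_red_direct == p_not_red_complement)   # True
```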
A standard deck of 52 playing cards has 4 suits (hearts, diamonds, clubs, spades) with 13 cards each (Ace through King). Hearts and diamonds are red; clubs and spades are black.
Find the probability of drawing:

a) A heart
b) A face card (Jack, Queen, or King)
c) A red card that is NOT a face card
Solution:
a) Probability of drawing a heart: There are 13 hearts in a deck of 52 cards. $$P(\text{heart}) = \frac{13}{52} = \frac{1}{4} = 0.25 = 25\%$$
b) Probability of drawing a face card: Each suit has 3 face cards (J, Q, K), and there are 4 suits. So there are $3 \times 4 = 12$ face cards. $$P(\text{face card}) = \frac{12}{52} = \frac{3}{13} \approx 0.231 = 23.1\%$$
c) Probability of a red non-face card: Red cards: hearts and diamonds = $13 + 13 = 26$ cards. Red face cards: 3 per red suit = $3 \times 2 = 6$ cards. Red non-face cards: $26 - 6 = 20$ cards. $$P(\text{red non-face}) = \frac{20}{52} = \frac{5}{13} \approx 0.385 = 38.5\%$$
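The counting in parts a) through c) can be verified by building the whole deck and filtering it, rather than multiplying by hand. A sketch using `itertools.product` (the suit and rank labels are my own):

```python
from itertools import product

suits = ["hearts", "diamonds", "clubs", "spades"]
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

red_suits = {"hearts", "diamonds"}
face_ranks = {"J", "Q", "K"}

hearts = [c for c in deck if c[1] == "hearts"]
faces = [c for c in deck if c[0] in face_ranks]
red_non_face = [c for c in deck if c[1] in red_suits and c[0] not in face_ranks]

# Counts match the hand calculation: 52 cards, 13 hearts, 12 faces, 20 red non-faces.
print(len(deck), len(hearts), len(faces), len(red_non_face))  # 52 13 12 20
```

Brute-force enumeration like this is a handy sanity check whenever the by-hand counting gets intricate.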
A spinner is divided into 4 equal sections colored red, blue, green, and yellow. A student spins it 80 times and records:
| Color | Times Landed |
|---|---|
| Red | 18 |
| Blue | 23 |
| Green | 19 |
| Yellow | 20 |
a) What is the theoretical probability of landing on blue?
b) What is the experimental probability of landing on blue?
c) Are these results consistent with a fair spinner? Explain.
Solution:
a) Theoretical probability of blue: There are 4 equal sections, so each color should be equally likely. $$P(\text{blue}) = \frac{1}{4} = 0.25 = 25\%$$
b) Experimental probability of blue: Blue came up 23 times out of 80 spins. $$P(\text{blue}) = \frac{23}{80} = 0.2875 = 28.75\%$$
c) Are these results consistent? Yes, these results are consistent with a fair spinner. The experimental probability (28.75%) is close to the theoretical probability (25%). The difference of 3.75 percentage points is normal random variation for only 80 trials.
To check all colors: with 80 spins, we would expect each color to appear about $80 \div 4 = 20$ times. The results (18, 23, 19, 20) are all close to 20. If the experimental probabilities were far from the theoretical values (say, blue appeared 40 times), we might suspect the spinner is unfair. But this small variation is exactly what we would expect from randomness.
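You can reproduce the flavor of this experiment with a simulated fair spinner. This is a sketch with an arbitrary seed; each run of 80 spins will show the same kind of small wobble around 20 per color that the student's table shows:

```python
import random

random.seed(7)  # reproducible sketch

colors = ["red", "blue", "green", "yellow"]
counts = {c: 0 for c in colors}
for _ in range(80):
    counts[random.choice(colors)] += 1  # fair spinner: each color equally likely

# Totals fluctuate around the expected 80 / 4 = 20 per color.
print(counts)
```

Even a perfectly fair spinner almost never produces an exact 20/20/20/20 split in 80 spins, which is the point of part c): modest deviations are the signature of randomness, not of an unfair wheel.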
Sarah is at a casino watching a roulette wheel. In American roulette, there are 38 slots: 18 red, 18 black, and 2 green. She has watched the last 10 spins, and the ball has landed on black 8 times. Sarah says, “Black has been coming up way too often. Red is due! I’m going to bet big on red.”
a) What is Sarah’s reasoning?
b) Is Sarah’s reasoning correct? Explain.
c) What is the actual probability that the next spin will be red?
Solution:
a) Sarah’s reasoning: Sarah believes that because black has appeared frequently in recent spins, red must be more likely to appear next. She thinks the outcomes need to “balance out” and that red is “due” to catch up.
b) Is Sarah’s reasoning correct? No, Sarah is committing the gambler’s fallacy. Each spin of the roulette wheel is an independent event. The wheel has no memory of previous spins. The ball does not know or care that black has come up 8 times recently. Past outcomes have absolutely no influence on future outcomes.
Sarah is confusing two different things:
- The probability of getting 8 blacks in 10 spins (before any spins happen) is relatively low
- The probability of red on the next spin (after 8 blacks have already occurred) is the same as it always is
The 8 black outcomes are in the past. They are already “locked in” and do not affect the future.
c) The actual probability of red on the next spin: There are 18 red slots out of 38 total slots, regardless of what happened before. $$P(\text{red}) = \frac{18}{38} = \frac{9}{19} \approx 0.474 = 47.4\%$$
This is the same probability of red that existed before the 10 spins, and it will be the same probability of red after 100 more spins. Each spin is a fresh, independent event.
Note: In the long run, if Sarah tracked thousands of spins, she would see red about 47.4% of the time and black about 47.4% of the time (with green taking the remaining 5.2%). But this long-run average does not mean any individual spin is affected by recent history.
Key Properties and Rules
Probability Bounds
All probabilities must satisfy: $$0 \leq P(A) \leq 1$$
Impossible events: $P(A) = 0$ (example: rolling a 7 on a standard die)
Certain events: $P(A) = 1$ (example: rolling a number from 1 to 6 on a standard die)
The Complement Rule
For any event $A$: $$P(A^c) = 1 - P(A)$$
This is useful when “not A” is easier to calculate than “A.”
Example: To find the probability of rolling at least one 6 in four dice rolls, it is easier to calculate the probability of NOT rolling any 6s, then subtract from 1.
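Here is that worked example as exact arithmetic. The key step, raising $\frac{5}{6}$ to the fourth power, assumes the four rolls are independent (each roll ignores the others):

```python
from fractions import Fraction

# P(at least one 6 in four rolls) = 1 - P(no 6 in any of the four rolls)
p_no_six_one_roll = Fraction(5, 6)
p_no_six_four_rolls = p_no_six_one_roll ** 4  # independence: multiply across rolls
p_at_least_one_six = 1 - p_no_six_four_rolls

print(p_at_least_one_six)         # 671/1296
print(float(p_at_least_one_six))  # about 0.5177
```

Counting the "at least one 6" outcomes directly would mean handling one 6, two 6s, three 6s, and four 6s separately; the complement route collapses all of that into a single subtraction.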
Probability of the Sample Space
$$P(S) = 1$$
Something must happen. The probabilities of all possible outcomes sum to 1.
Calculating Probability for Equally Likely Outcomes
When all outcomes are equally likely: $$P(A) = \frac{\text{number of favorable outcomes}}{\text{total number of possible outcomes}}$$
Remember: This formula only works when outcomes are equally likely. A fair coin: yes. A weighted coin: no.
Theoretical vs. Experimental Probability
| Type | Based On | Formula |
|---|---|---|
| Theoretical | Mathematical reasoning | $\frac{\text{favorable outcomes}}{\text{total outcomes}}$ |
| Experimental | Actual observations | $\frac{\text{times event occurred}}{\text{total trials}}$ |
As the number of trials increases, experimental probability converges to theoretical probability (Law of Large Numbers).
Real-World Applications
Weather Forecasts
When the forecast says “40% chance of rain,” meteorologists are using probability. This is based on experimental probability: historically, in conditions similar to today’s, it has rained 40% of the time. This does not mean it will rain for 40% of the day, nor does it mean there is definitely rain coming - it is a measure of uncertainty based on past data.
Understanding these probabilities helps you make better decisions. A 90% chance of rain means grab an umbrella. A 10% chance of rain means you are probably safe, but it is not a guarantee.
Medical Testing and False Positives
Medical tests are not perfect. Suppose a test for a disease is “95% accurate.” What does that mean? There are actually several types of errors:
- False positive: The test says you have the disease when you do not
- False negative: The test says you do not have the disease when you do
Probability helps doctors and patients understand these risks. If a disease is rare (say, 1 in 10,000 people have it), a positive test result might still be more likely to be a false positive than a true detection. This is why doctors often order follow-up tests - they are using probability to make good decisions.
Insurance and Risk Assessment
Insurance companies are essentially in the probability business. They collect data on how often events occur (car accidents, house fires, health issues) and use this to set premiums.
If 1 in 1,000 houses has a fire each year causing $200,000 in damage, the insurance company knows to expect about $200 per house in claims on average. They charge premiums above this to cover their costs and make a profit. This is experimental probability applied to risk management.
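The expected-claim arithmetic from the paragraph above, spelled out. The numbers are the illustrative ones from the text, not real actuarial data:

```python
# Expected annual fire claim per insured house, using the figures above.
p_fire = 1 / 1000   # 1 in 1,000 houses has a fire each year
damage = 200_000    # average damage per fire, in dollars

expected_claim = p_fire * damage  # probability times cost = long-run average cost
print(expected_claim)  # 200.0 dollars per house per year
```

Any premium above $200 per year covers the expected claims; the margin pays the insurer's overhead and profit. This "probability times cost" pattern is the simplest example of an expected value.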
Game Design and Fairness
Game designers use probability to create balanced, fair experiences. In a board game, if one strategy wins 80% of the time, the game is not balanced. In video games, “drop rates” for rare items are carefully tuned probabilities.
Understanding probability helps you be a smarter player too. In many card games, knowing the probability that your opponent has certain cards can guide your strategy. In games with dice, understanding expected outcomes helps you make better decisions.
Quality Control
Manufacturers use probability in quality control. If a factory produces 10,000 widgets per day and the defect rate is 0.1%, they expect about 10 defective widgets daily. By sampling products and calculating experimental probabilities, they can monitor whether the manufacturing process is working correctly or if something has gone wrong.
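A one-day production run is easy to simulate. This sketch uses an arbitrary fixed seed; the daily defect count will bounce around the expected value of 10, which is exactly the fluctuation a quality-control team has to distinguish from a genuine process failure:

```python
import random

random.seed(42)  # reproducible sketch

defect_rate = 0.001       # 0.1% theoretical defect probability per widget
widgets_per_day = 10_000

# Simulate one day's production and count defects (an experimental probability).
defects = sum(random.random() < defect_rate for _ in range(widgets_per_day))
print(f"defects today: {defects} (expected about {defect_rate * widgets_per_day:.0f})")
```

A day with 8 or 13 defects is routine random variation; a day with 60 would be strong evidence that something in the process has changed.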
Self-Test Problems
Problem 1: A jar contains 5 red, 8 blue, and 7 yellow candies. If you pick one candy at random, what is the probability of picking a blue candy?
Answer:
Total candies: $5 + 8 + 7 = 20$
Blue candies: 8
$$P(\text{blue}) = \frac{8}{20} = \frac{2}{5} = 0.4 = 40\%$$
Problem 2: Using the complement rule, find the probability of rolling a standard die and NOT getting a 5.
Answer:
First, find the probability of rolling a 5: $$P(5) = \frac{1}{6}$$
Apply the complement rule: $$P(\text{not 5}) = 1 - P(5) = 1 - \frac{1}{6} = \frac{5}{6} \approx 0.833 = 83.3\%$$
Problem 3: A coin is flipped 200 times and lands on heads 112 times.

a) What is the experimental probability of heads?
b) Is this coin likely to be fair? Explain your reasoning.
Answer:
a) Experimental probability of heads: $$P(\text{heads}) = \frac{112}{200} = 0.56 = 56\%$$
b) Is the coin likely fair? This is slightly more than the expected 50%, but 200 flips is not a huge number of trials. Getting 112 heads instead of the expected 100 is a deviation of 12, which is not extremely unusual for random variation. The coin might be fair with this result occurring by chance, or it might be slightly biased. More trials would help clarify. If you flipped it 2,000 times and still got 56% heads, that would be stronger evidence of bias.
Problem 4: A standard deck of cards is shuffled. What is the probability of drawing either an Ace or a King?
Answer:
There are 4 Aces and 4 Kings in a deck, for a total of 8 favorable cards.
$$P(\text{Ace or King}) = \frac{8}{52} = \frac{2}{13} \approx 0.154 = 15.4\%$$
Problem 5: Marcus has flipped a coin 5 times and gotten tails every time. His friend says, “The next flip is definitely going to be heads - you cannot get 6 tails in a row!” Is his friend correct? Explain.
Answer:
Marcus’s friend is incorrect and is committing the gambler’s fallacy.
Each coin flip is an independent event. The coin has no memory of previous flips. The probability of heads on the 6th flip is still exactly $\frac{1}{2} = 50\%$, the same as it was before any flips occurred.
While the probability of getting 6 tails in a row (calculated before any flips) is $\left(\frac{1}{2}\right)^6 = \frac{1}{64}$, this is not the question. After 5 tails have already happened, we are only asking about the next flip, and that probability is 50%.
The friend is confusing “probability of 6 tails in a row” with “probability of tails on the next flip given 5 tails have occurred.”
Problem 6: At a carnival game, you spin a wheel with 20 equal sections numbered 1 through 20. You win a prize if you land on a number greater than 15. What is the probability of winning?
Answer:
Numbers greater than 15 are: 16, 17, 18, 19, 20 (five numbers)
$$P(\text{win}) = \frac{5}{20} = \frac{1}{4} = 0.25 = 25\%$$
Summary
- Probability measures how likely an event is to occur, expressed as a number between 0 (impossible) and 1 (certain).
- An experiment is a process with uncertain outcomes. The sample space ($S$) is the set of all possible outcomes. An event is any subset of the sample space.
- We write $P(A)$ to denote the probability of event $A$ occurring.
- The fundamental probability rules are:
  - $0 \leq P(A) \leq 1$ for any event $A$
  - $P(S) = 1$ (something must happen)
  - $P(A^c) = 1 - P(A)$ (the complement rule)
- For equally likely outcomes: $P(A) = \frac{\text{favorable outcomes}}{\text{total outcomes}}$
- Theoretical probability is based on mathematical reasoning about equally likely outcomes. Experimental probability is based on actual observations from trials.
- The Law of Large Numbers states that as you perform more trials, experimental probability approaches theoretical probability.
- The gambler’s fallacy is the mistaken belief that past random events affect future independent events. A coin does not remember its previous flips; a roulette wheel does not know what came before. Each trial is fresh.
- Probability has applications everywhere: weather forecasting, medical testing, insurance, game design, quality control, and any situation involving uncertainty.