Describing Data with Numbers—Spread
Understand how scattered or consistent data is
Two students might both have an 85% average in a class, but their journeys could be completely different. One student consistently scores between 83% and 87% on every assignment—steady and predictable. The other swings wildly: 65% one week, 100% the next, then 72%, then 95%. Same average, completely different stories. If you were betting on who would score at least 80% on the next test, which student would you trust more?
This is exactly why knowing the center of your data is not enough. You also need to know how spread out the data is—whether values cluster tightly around the center or scatter widely. This chapter gives you the tools to measure and describe that spread. Whether you are evaluating the consistency of a manufacturing process, comparing the volatility of investments, or understanding why some test score distributions are “tighter” than others, measures of spread tell the part of the story that averages leave out.
Core Concepts
Why Center Alone Is Not Enough
Imagine two pizza delivery services. Both advertise an average delivery time of 30 minutes. But Service A consistently delivers in 28–32 minutes, while Service B ranges from 15 to 45 minutes. Which would you rather order from?
The average tells you what to expect “on average,” but the spread tells you how reliable that average is. A small spread means consistency—you can count on getting something close to the average. A large spread means variability—the average might be misleading because individual values can be far from it.
Statisticians use several measures to quantify spread, each with its own strengths. We will start with the simplest and work our way up to more sophisticated tools.
Range: Simple but Limited
The range is the easiest measure of spread to calculate: just subtract the smallest value from the largest.
$$\text{Range} = \text{Maximum} - \text{Minimum}$$
If test scores in a class range from 62 to 98, the range is $98 - 62 = 36$ points.
The range gives you a quick sense of the total spread, but it has a major weakness: it depends entirely on just two values—the extreme ones. One unusual score (an outlier) can dramatically change the range even if most of the data is tightly clustered.
The strength of the range: It is fast and easy to calculate.
The weakness of the range: It is extremely sensitive to outliers and ignores everything between the minimum and maximum.
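A minimal Python sketch of the range and how much one extreme value can change it. The helper name `data_range` and the scores are hypothetical, chosen only for illustration. The name avoids shadowing Python's built-in `range`.

```python
def data_range(values):
    """Return maximum minus minimum of a non-empty sequence."""
    return max(values) - min(values)

# Hypothetical test scores that are tightly clustered
scores = [82, 84, 85, 85, 86, 88]
print(data_range(scores))                # 88 - 82 = 6

# One unusual score changes the range dramatically,
# even though the other six values are unchanged.
scores_with_outlier = scores + [40]
print(data_range(scores_with_outlier))   # 88 - 40 = 48
```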
Quartiles: Dividing Data into Fourths
To get a better picture of how data spreads, we can divide it into four equal parts using quartiles. Think of quartiles as checkpoints that tell you where 25%, 50%, and 75% of your data falls.
- $Q_1$ (First Quartile): 25% of the data falls below this value
- $Q_2$ (Second Quartile): 50% of the data falls below this value (this is the median)
- $Q_3$ (Third Quartile): 75% of the data falls below this value
To find quartiles:
- Arrange your data in order from smallest to largest
- Find the median ($Q_2$)—this splits the data in half
- Find $Q_1$ as the median of the lower half of the ordered values (when the number of values is odd, leave the median itself out of both halves)
- Find $Q_3$ as the median of the upper half of the ordered values
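The steps above can be sketched in Python with the standard `statistics.median` function. `quartiles` is a hypothetical helper name. It follows the median-of-halves method described here, which is only one of several conventions: spreadsheets and libraries (NumPy's `percentile`, for instance, uses linear interpolation by default) may report slightly different quartiles for the same data.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) by the median-of-halves method.

    Needs at least two values. When n is odd, the middle value is
    left out of both halves, matching the steps above.
    """
    data = sorted(values)                 # Step 1: order the data
    n = len(data)
    q2 = median(data)                     # Step 2: overall median
    lower = data[: n // 2]                # Step 3: lower half
    upper = data[(n + 1) // 2 :]          # Step 4: upper half
    return median(lower), q2, median(upper)

books = [2, 3, 5, 7, 8, 9, 10, 12, 15]
print(quartiles(books))   # (4.0, 8, 11.0)
```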
If you have ever seen a standardized test report that says “You scored in the 75th percentile,” your score was at about $Q_3$: roughly 75% of test-takers scored below you.
The Five-Number Summary
The five-number summary gives you a quick snapshot of your data’s distribution using five key values:
- Minimum: The smallest value
- $Q_1$: The first quartile (25th percentile)
- Median ($Q_2$): The middle value (50th percentile)
- $Q_3$: The third quartile (75th percentile)
- Maximum: The largest value
These five numbers tell you where the data starts, where the middle 50% lies, and where it ends. They form the basis of box plots (also called box-and-whisker plots), which are powerful visual tools for comparing distributions.
Interquartile Range (IQR): The Middle 50%
The Interquartile Range (IQR) measures the spread of the middle 50% of your data—the range between the first and third quartiles.
$$\text{IQR} = Q_3 - Q_1$$
Why focus on the middle 50%? Because it ignores the extreme values at both ends, making the IQR resistant to outliers. No matter how extreme your minimum or maximum values are, the IQR stays stable.
The strength of the IQR: It is resistant to outliers—extreme values do not affect it.
The weakness of the IQR: It ignores 50% of your data (the upper and lower quarters).
The IQR is the width of the interval from $Q_1$ to $Q_3$, the stretch of values that contains the middle half of the observations.
Identifying Outliers Using the 1.5×IQR Rule
How do you know if a value is an outlier? While there is no absolute definition, the 1.5×IQR rule provides a commonly used standard.
A value is considered a potential outlier if it falls:
- Below $Q_1 - 1.5 \times \text{IQR}$, or
- Above $Q_3 + 1.5 \times \text{IQR}$
These boundaries are sometimes called the fences. Any value outside the fences is flagged as unusually extreme.
For example, if $Q_1 = 20$, $Q_3 = 40$, then $\text{IQR} = 20$.
- Lower fence: $20 - 1.5(20) = 20 - 30 = -10$
- Upper fence: $40 + 1.5(20) = 40 + 30 = 70$
Any value below $-10$ or above $70$ would be considered an outlier.
This rule is not about declaring values “wrong”—outliers can be perfectly valid data points. But identifying them helps you understand your data better and decide how to handle unusual observations.
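A small sketch of the fence calculation, assuming $Q_1$ and $Q_3$ have already been found. The helper names and the multiplier parameter `k` (defaulting to the usual 1.5) are illustrative choices, not a standard API, and the test values are hypothetical.

```python
def fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the k x IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def potential_outliers(values, q1, q3, k=1.5):
    """List the values that fall strictly outside the fences."""
    low, high = fences(q1, q3, k)
    return [x for x in values if x < low or x > high]

print(fences(20, 40))                                    # (-10.0, 70.0)
print(potential_outliers([-15, 5, 30, 70, 71], 20, 40))  # [-15, 71]
```

Note that a value sitting exactly on a fence (70 here) is not flagged, because the rule uses "below" and "above."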
Variance: Average Squared Distance from the Mean
While range and IQR give useful information, they do not use every data point. Variance does: it combines the distance of every value from the mean into a single number.
The idea is simple: find how far each value deviates from the mean, then average those deviations. But there is a catch—some deviations are positive (above the mean) and some are negative (below the mean), and they would cancel out if we just added them. The solution? Square the deviations first, which makes them all positive.
Sample variance formula:
$$s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1}$$
Here is what each part means:
- $x_i$ is each individual data value
- $\bar{x}$ is the mean of the data
- $(x_i - \bar{x})$ is the deviation of each value from the mean
- $(x_i - \bar{x})^2$ is the squared deviation
- $\sum$ means add up all the squared deviations
- $n-1$ is the number of values minus one (we use $n-1$ instead of $n$ for samples to get a better estimate—this is called Bessel’s correction)
Why divide by $n-1$ instead of $n$? The deviations are measured from the sample mean $\bar{x}$, which is calculated from the same data and therefore sits closer to those values than the true population mean does. As a result, the squared deviations tend to come out too small, and dividing by $n$ would systematically underestimate the population variance. Dividing by the slightly smaller $n-1$ corrects for this. For now, just remember: sample variance uses $n-1$ in the denominator.
A larger variance means data is more spread out; a smaller variance means data clusters more tightly around the mean.
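The formula translates almost symbol by symbol into code. This hypothetical `sample_variance` helper mirrors each part of the formula and can be checked against Python's built-in `statistics.variance`, which also divides by $n-1$. The ages are taken from the worked example later in this chapter; at full precision the sum of squared deviations is $173.33$, giving $173.33/5 \approx 34.67$.

```python
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n                       # the mean
    deviations = [x - x_bar for x in values]      # x_i - x_bar
    squared = [d * d for d in deviations]         # (x_i - x_bar)^2
    return sum(squared) / (n - 1)                 # divide by n - 1

ages = [24, 28, 30, 32, 35, 41]
print(round(sample_variance(ages), 2))            # 34.67
print(round(statistics.variance(ages), 2))        # same result
```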
Standard Deviation: The Typical Distance from the Mean
Variance has one awkward property: because we squared the deviations, the units are squared too. If your data is in dollars, the variance is in “square dollars”—which is hard to interpret.
Standard deviation solves this by taking the square root of the variance, bringing us back to the original units.
$$s = \sqrt{s^2} = \sqrt{\frac{\sum(x_i - \bar{x})^2}{n-1}}$$
The standard deviation tells you, roughly, the “typical” distance of a data point from the mean. If the mean is 100 and the standard deviation is 15, a typical value lies about 15 units from 100, and for bell-shaped data roughly two-thirds of values fall between 85 and 115.
The strength of standard deviation: It uses every data point and is in the same units as your data.
The weakness of standard deviation: Like the mean, it is sensitive to outliers.
Standard deviation is the most commonly used measure of spread in statistics. You will encounter it everywhere—test scores, scientific measurements, financial analysis, quality control, and more.
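Python's standard `statistics` module offers both versions of the standard deviation: `stdev` divides by $n-1$ (the sample formula above) and `pstdev` divides by $n$ (treating the data as a whole population). This sketch reuses the employee ages from the later worked example to show that the sample version is the square root of the sample variance and is the slightly larger of the two.

```python
import math
import statistics

ages = [24, 28, 30, 32, 35, 41]

s = statistics.stdev(ages)        # sample SD: divides by n - 1
sigma = statistics.pstdev(ages)   # population SD: divides by n

# The sample SD is the square root of the sample variance.
assert math.isclose(s, math.sqrt(statistics.variance(ages)))

print(f"sample SD:     {s:.2f} years")      # 5.89
print(f"population SD: {sigma:.2f} years")  # 5.37, smaller because it divides by n
```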
Interpreting Standard Deviation
What does a standard deviation of 10 actually mean? It depends on context. A standard deviation of 10 is large if the mean is 20, but small if the mean is 1000.
One useful rule of thumb for roughly bell-shaped (normal) distributions is the Empirical Rule (also called the 68-95-99.7 Rule):
- About 68% of values fall within 1 standard deviation of the mean
- About 95% of values fall within 2 standard deviations of the mean
- About 99.7% of values fall within 3 standard deviations of the mean
So if exam scores have a mean of 75 and a standard deviation of 10:
- About 68% of students score between 65 and 85
- About 95% of students score between 55 and 95
- Nearly all students score between 45 and 105
This rule helps you judge whether a particular value is typical, somewhat unusual, or extremely rare.
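A short sketch, assuming a normal distribution, that lists the Empirical Rule intervals for the exam example and compares the rule's rounded percentages with exact values from Python's `statistics.NormalDist` (available since Python 3.8). The helper `empirical_intervals` is a hypothetical name.

```python
from statistics import NormalDist

def empirical_intervals(mean, sd):
    """Return {k: (low, high)} for k = 1, 2, 3 standard deviations."""
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

standard_normal = NormalDist()            # mean 0, standard deviation 1
intervals = empirical_intervals(75, 10)   # exam example: mean 75, SD 10

for k, (low, high) in intervals.items():
    # Exact share of a normal distribution within k SDs of its mean
    share = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} SD: {low} to {high}, exact normal share {share:.1%}")
```

The exact shares are about 68.3%, 95.4%, and 99.7%, which is why the rule's round numbers work well for bell-shaped data.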
Comparing Spread Across Datasets
When comparing the spread of different data sets, keep these principles in mind:
Direct comparison: If two data sets have the same units and similar means, you can compare their standard deviations directly. A larger standard deviation means more spread.
Different scales: If data sets have very different means, the coefficient of variation (CV) allows for fair comparison:
$$\text{CV} = \frac{s}{\bar{x}} \times 100\%$$
The CV expresses the standard deviation as a percentage of the mean. This lets you compare variability between data sets with different units or magnitudes. For example, is a standard deviation of \$5,000 in salaries more or less variable than a standard deviation of 2 cm in heights? The CV helps answer this. Note that the CV only makes sense for data measured on a scale with a true zero and a positive mean, such as lengths, weights, or amounts of money.
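A minimal sketch of the CV calculation. The function name and the guard against a non-positive mean are illustrative assumptions, not a library API. The bolt and cable figures come from the worked example later in this chapter.

```python
def coefficient_of_variation(sd, mean):
    """Return the CV as a percentage: (sd / mean) * 100.

    Assumes a positive mean; the CV is not meaningful otherwise.
    """
    if mean <= 0:
        raise ValueError("the CV assumes a positive mean")
    return sd / mean * 100

bolts_cv = coefficient_of_variation(0.2, 10)     # about 2 (%)
cables_cv = coefficient_of_variation(5, 500)     # about 1 (%)
print(f"bolts: {bolts_cv:.1f}%, cables: {cables_cv:.1f}%")
```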
When to Use IQR vs. Standard Deviation
Both IQR and standard deviation measure spread, but they suit different situations:
Use IQR when:
- Your data is skewed
- There are outliers you want to downplay
- You are working with the median as your measure of center
- You want a resistant (robust) measure
Use standard deviation when:
- Your data is roughly symmetric
- There are no extreme outliers
- You are working with the mean as your measure of center
- You need to do further statistical calculations (most advanced statistics use standard deviation)
Here is a helpful pairing:
- Median + IQR for skewed data or data with outliers
- Mean + Standard deviation for symmetric data
Just as you would not use the mean for highly skewed data, you generally would not use standard deviation in that situation either.
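To see why the pairing matters, this sketch uses the daily sales data from the outlier example later in this chapter and measures spread with and without the single unusually high value of 45. `iqr` is a hypothetical helper that uses the same median-of-halves quartiles as this chapter.

```python
from statistics import median, stdev

def iqr(values):
    """IQR via the median-of-halves quartile method used in this chapter."""
    data = sorted(values)
    n = len(data)
    return median(data[(n + 1) // 2 :]) - median(data[: n // 2])

sales = [12, 15, 14, 16, 13, 15, 14, 45, 13, 16, 15, 14]  # includes the outlier
without = [x for x in sales if x != 45]                    # outlier removed

print(f"SD:  {stdev(without):.2f} without, {stdev(sales):.2f} with the outlier")
print(f"IQR: {iqr(without):.2f} without, {iqr(sales):.2f} with the outlier")
```

The IQR is 2 either way, while the standard deviation jumps from roughly 1.3 to roughly 9 when the single extreme day is included.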
Notation and Terminology
| Term | Meaning | Example |
|---|---|---|
| Range | Maximum − Minimum | Easy but sensitive to outliers |
| $Q_1$ (First Quartile) | 25th percentile | 25% of data below |
| $Q_2$ (Median) | 50th percentile | Middle value |
| $Q_3$ (Third Quartile) | 75th percentile | 75% of data below |
| IQR | $Q_3 - Q_1$ | Middle 50% spread |
| Variance ($s^2$) | Average of squared deviations | $s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1}$ |
| Standard deviation ($s$) | Square root of variance | Same units as data |
| Outlier rule | Values below $Q_1 - 1.5 \times \text{IQR}$ or above $Q_3 + 1.5 \times \text{IQR}$ | |
| Five-number summary | Min, $Q_1$, Median, $Q_3$, Max | Quick data snapshot |
| Resistant measure | Not affected by outliers | IQR is resistant |
| Empirical Rule | 68-95-99.7 rule for normal data | Interprets standard deviation |
| Coefficient of variation | $\frac{s}{\bar{x}} \times 100\%$ | Compares relative spread |
Examples
The following data shows the number of books read by 9 students over summer break: 2, 3, 5, 7, 8, 9, 10, 12, 15.
Find the range and IQR.
Solution:
Range: $$\text{Range} = \text{Maximum} - \text{Minimum} = 15 - 2 = 13 \text{ books}$$
IQR:
Step 1: The data is already in order: 2, 3, 5, 7, 8, 9, 10, 12, 15.
Step 2: Find the median ($Q_2$). With 9 values, the median is the 5th value. $$Q_2 = 8$$
Step 3: Find $Q_1$, the median of the lower half (values below 8): 2, 3, 5, 7. With 4 values, $Q_1$ is the average of the 2nd and 3rd values. $$Q_1 = \frac{3 + 5}{2} = 4$$
Step 4: Find $Q_3$, the median of the upper half (values above 8): 9, 10, 12, 15. With 4 values, $Q_3$ is the average of the 2nd and 3rd values. $$Q_3 = \frac{10 + 12}{2} = 11$$
Step 5: Calculate IQR. $$\text{IQR} = Q_3 - Q_1 = 11 - 4 = 7 \text{ books}$$
The total spread is 13 books, but the middle 50% of students read between 4 and 11 books—a spread of 7 books.
A teacher records the quiz scores (out of 20) for 12 students: 14, 18, 12, 15, 19, 11, 16, 17, 13, 20, 15, 18.
Find the five-number summary.
Solution:
Step 1: Arrange the data in order. $$11, 12, 13, 14, 15, 15, 16, 17, 18, 18, 19, 20$$
Step 2: Identify the minimum and maximum.
- Minimum = 11
- Maximum = 20
Step 3: Find the median. With 12 values (even), the median is the average of the 6th and 7th values. $$Q_2 = \frac{15 + 16}{2} = 15.5$$
Step 4: Find $Q_1$. The lower half is: 11, 12, 13, 14, 15, 15. With 6 values, $Q_1$ is the average of the 3rd and 4th values. $$Q_1 = \frac{13 + 14}{2} = 13.5$$
Step 5: Find $Q_3$. The upper half is: 16, 17, 18, 18, 19, 20. With 6 values, $Q_3$ is the average of the 3rd and 4th values. $$Q_3 = \frac{18 + 18}{2} = 18$$
Five-Number Summary:
| Statistic | Value |
|---|---|
| Minimum | 11 |
| $Q_1$ | 13.5 |
| Median | 15.5 |
| $Q_3$ | 18 |
| Maximum | 20 |
This tells us the scores range from 11 to 20, with the middle 50% falling between 13.5 and 18.
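The same result can be checked with a short Python sketch. `five_number_summary` is a hypothetical helper built on `statistics.median`, using the median-of-halves rule from this chapter.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])          # median of the lower half
    q3 = median(data[(n + 1) // 2 :])    # median of the upper half
    return data[0], q1, median(data), q3, data[-1]

quiz = [14, 18, 12, 15, 19, 11, 16, 17, 13, 20, 15, 18]
print(five_number_summary(quiz))   # (11, 13.5, 15.5, 18.0, 20)
```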
The ages (in years) of 6 employees at a small startup are: 24, 28, 30, 32, 35, 41.
Calculate the sample variance and standard deviation.
Solution:
Step 1: Find the mean. $$\bar{x} = \frac{24 + 28 + 30 + 32 + 35 + 41}{6} = \frac{190}{6} \approx 31.67$$
Step 2: Calculate each deviation from the mean $(x_i - \bar{x})$.
| $x_i$ | $x_i - \bar{x}$ |
|---|---|
| 24 | $24 - 31.67 = -7.67$ |
| 28 | $28 - 31.67 = -3.67$ |
| 30 | $30 - 31.67 = -1.67$ |
| 32 | $32 - 31.67 = 0.33$ |
| 35 | $35 - 31.67 = 3.33$ |
| 41 | $41 - 31.67 = 9.33$ |
Step 3: Square each deviation $(x_i - \bar{x})^2$.
| $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ |
|---|---|---|
| 24 | $-7.67$ | $58.83$ |
| 28 | $-3.67$ | $13.47$ |
| 30 | $-1.67$ | $2.79$ |
| 32 | $0.33$ | $0.11$ |
| 35 | $3.33$ | $11.09$ |
| 41 | $9.33$ | $87.05$ |
Step 4: Sum the squared deviations. $$\sum(x_i - \bar{x})^2 = 58.83 + 13.47 + 2.79 + 0.11 + 11.09 + 87.05 = 173.34$$
Step 5: Calculate the sample variance (divide by $n - 1 = 5$). $$s^2 = \frac{173.34}{5} = 34.67$$
Step 6: Calculate the standard deviation (take the square root). $$s = \sqrt{34.67} \approx 5.89 \text{ years}$$
Interpretation: A typical employee’s age is about 5.89 years from the mean. Most employees (four of the six, aged 28 to 35) are within roughly 6 years of the average age of about 32.
A company tracks daily sales (in thousands of dollars): 12, 15, 14, 16, 13, 15, 14, 45, 13, 16, 15, 14.
Identify any outliers using the 1.5×IQR rule.
Solution:
Step 1: Arrange the data in order. $$12, 13, 13, 14, 14, 14, 15, 15, 15, 16, 16, 45$$
Step 2: Find $Q_1$ and $Q_3$.
With 12 values, the median is the average of the 6th and 7th values: $(14 + 15)/2 = 14.5$.
Lower half: 12, 13, 13, 14, 14, 14. $Q_1$ is the average of the 3rd and 4th values. $$Q_1 = \frac{13 + 14}{2} = 13.5$$
Upper half: 15, 15, 15, 16, 16, 45. $Q_3$ is the average of the 3rd and 4th values. $$Q_3 = \frac{15 + 16}{2} = 15.5$$
Step 3: Calculate the IQR. $$\text{IQR} = Q_3 - Q_1 = 15.5 - 13.5 = 2$$
Step 4: Calculate the fences. $$\text{Lower fence} = Q_1 - 1.5 \times \text{IQR} = 13.5 - 1.5(2) = 13.5 - 3 = 10.5$$ $$\text{Upper fence} = Q_3 + 1.5 \times \text{IQR} = 15.5 + 1.5(2) = 15.5 + 3 = 18.5$$
Step 5: Identify outliers. Any value below 10.5 or above 18.5 is an outlier.
Looking at our data: 12, 13, 13, 14, 14, 14, 15, 15, 15, 16, 16, 45
The value 45 is an outlier because it exceeds the upper fence of 18.5.
Interpretation: Typical daily sales fall between about \$12,000 and \$16,000. The \$45,000 day is unusually high, perhaps because of a large special order or a sale event. If not addressed, this outlier would noticeably inflate the mean: about \$16,800 with it versus about \$14,300 without it.
Two basketball players have the following points per game over their last 8 games:
- Player A: 18, 22, 19, 21, 20, 20, 19, 21
- Player B: 10, 30, 15, 25, 18, 22, 28, 12
Compare the consistency of these two players.
Solution:
Player A:
Mean: $$\bar{x}_A = \frac{18 + 22 + 19 + 21 + 20 + 20 + 19 + 21}{8} = \frac{160}{8} = 20$$
Deviations and squared deviations:
| Points | Deviation | Squared |
|---|---|---|
| 18 | $-2$ | $4$ |
| 22 | $2$ | $4$ |
| 19 | $-1$ | $1$ |
| 21 | $1$ | $1$ |
| 20 | $0$ | $0$ |
| 20 | $0$ | $0$ |
| 19 | $-1$ | $1$ |
| 21 | $1$ | $1$ |
Sum of squared deviations: $4 + 4 + 1 + 1 + 0 + 0 + 1 + 1 = 12$
Variance: $s_A^2 = \frac{12}{7} \approx 1.71$
Standard deviation: $s_A = \sqrt{1.71} \approx 1.31$ points
Player B:
Mean: $$\bar{x}_B = \frac{10 + 30 + 15 + 25 + 18 + 22 + 28 + 12}{8} = \frac{160}{8} = 20$$
Deviations and squared deviations:
| Points | Deviation | Squared |
|---|---|---|
| 10 | $-10$ | $100$ |
| 30 | $10$ | $100$ |
| 15 | $-5$ | $25$ |
| 25 | $5$ | $25$ |
| 18 | $-2$ | $4$ |
| 22 | $2$ | $4$ |
| 28 | $8$ | $64$ |
| 12 | $-8$ | $64$ |
Sum of squared deviations: $100 + 100 + 25 + 25 + 4 + 4 + 64 + 64 = 386$
Variance: $s_B^2 = \frac{386}{7} \approx 55.14$
Standard deviation: $s_B = \sqrt{55.14} \approx 7.43$ points
Comparison:
| Measure | Player A | Player B |
|---|---|---|
| Mean | 20 points | 20 points |
| Standard Deviation | 1.31 points | 7.43 points |
| Range | 4 points | 20 points |
Both players average 20 points per game, but their consistency is dramatically different.
- Player A is highly consistent: their scores never deviate more than 2 points from the average. You can reliably expect around 18–22 points.
- Player B is highly variable: some games they score 10, others 30. Their standard deviation is nearly 6 times larger than Player A’s (7.43 vs. 1.31 points).
Conclusion: If you need a reliable scorer, Player A is the safer choice. Player B might have a higher ceiling but also a much lower floor. This is exactly what measures of spread reveal: two identical averages can hide very different stories.
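A quick check of the comparison with Python's `statistics` module, whose `stdev` uses the same $n-1$ denominator as the hand calculation above.

```python
from statistics import mean, stdev

player_a = [18, 22, 19, 21, 20, 20, 19, 21]
player_b = [10, 30, 15, 25, 18, 22, 28, 12]

for name, points in [("A", player_a), ("B", player_b)]:
    spread = max(points) - min(points)   # range
    print(f"Player {name}: mean {mean(points)}, "
          f"SD {stdev(points):.2f}, range {spread}")
```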
A quality control manager wants to compare consistency across two different products:
- Product X (bolts): Mean diameter = 10 mm, Standard deviation = 0.2 mm
- Product Y (cables): Mean length = 500 m, Standard deviation = 5 m
Which product is manufactured more consistently relative to its size?
Solution:
We cannot directly compare 0.2 mm to 5 m—the units and scales are completely different. Instead, we use the coefficient of variation (CV), which expresses standard deviation as a percentage of the mean.
Product X: $$\text{CV}_X = \frac{s}{\bar{x}} \times 100\% = \frac{0.2}{10} \times 100\% = 2\%$$
Product Y: $$\text{CV}_Y = \frac{s}{\bar{x}} \times 100\% = \frac{5}{500} \times 100\% = 1\%$$
Interpretation: Product Y has a smaller coefficient of variation (1% vs. 2%), meaning it is manufactured more consistently relative to its size. Even though 5 meters sounds like a larger “error” than 0.2 mm, relative to the product’s target size, the cables are more precisely made than the bolts.
The coefficient of variation allows fair comparisons across different measurement scales.
Key Properties and Rules
Formulas for Measures of Spread
Range: $$\text{Range} = \text{Maximum} - \text{Minimum}$$
Interquartile Range: $$\text{IQR} = Q_3 - Q_1$$
Sample Variance: $$s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1}$$
Sample Standard Deviation: $$s = \sqrt{s^2} = \sqrt{\frac{\sum(x_i - \bar{x})^2}{n-1}}$$
Outlier Detection:
- Lower fence: $Q_1 - 1.5 \times \text{IQR}$
- Upper fence: $Q_3 + 1.5 \times \text{IQR}$
- Values outside these fences are potential outliers
Coefficient of Variation: $$\text{CV} = \frac{s}{\bar{x}} \times 100\%$$
Properties of Standard Deviation
- Always non-negative: $s \geq 0$, and $s = 0$ only when all values are identical.
- Same units as data: Unlike variance, standard deviation is in the original measurement units.
- Sensitive to outliers: Like the mean, extreme values can inflate the standard deviation.
- Affected by linear transformations:
  - Adding a constant to all values does not change $s$
  - Multiplying all values by a constant $c$ multiplies $s$ by $|c|$
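These two properties can be checked numerically. This sketch uses an arbitrary illustrative dataset and Python's `statistics.stdev`; `math.isclose` allows for tiny floating-point differences.

```python
import math
from statistics import stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]     # arbitrary illustrative values
c = -3

s = stdev(data)
shifted = [x + 100 for x in data]   # add a constant to every value
scaled = [c * x for x in data]      # multiply every value by c

# Adding a constant leaves the spread unchanged...
assert math.isclose(stdev(shifted), s)
# ...while multiplying by c scales it by |c|.
assert math.isclose(stdev(scaled), abs(c) * s)
print(f"s = {s:.3f}, shifted = {stdev(shifted):.3f}, scaled = {stdev(scaled):.3f}")
```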
Properties of IQR
- Resistant to outliers: Extreme values do not affect $Q_1$ or $Q_3$.
- Based on position: Only uses the 25th and 75th percentile values.
- Captures middle 50%: Ignores the upper and lower 25% of data.
Comparing Measures: Quick Reference
| Situation | Best Measure of Spread |
|---|---|
| Symmetric data, no outliers | Standard deviation |
| Skewed data | IQR |
| Data with outliers | IQR |
| Need for further calculations | Standard deviation |
| Comparing different scales | Coefficient of variation |
| Quick rough estimate | Range |
The Empirical Rule (68-95-99.7)
For approximately bell-shaped (normal) distributions:
- About 68% of data falls within $\bar{x} \pm 1s$
- About 95% of data falls within $\bar{x} \pm 2s$
- About 99.7% of data falls within $\bar{x} \pm 3s$
Real-World Applications
Quality Control in Manufacturing
Manufacturing relies heavily on measures of spread. When a factory produces bolts that should be exactly 10 mm in diameter, the mean might be right at 10 mm—but that is not enough. If the standard deviation is too large, too many bolts will be too big or too small to fit properly.
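Quality control uses standard deviation to set acceptable tolerance limits. In a “six sigma” process, the specification limits sit 6 standard deviations away from the process mean. By the convention used in industry, which allows for the mean drifting by up to 1.5 standard deviations over time, this corresponds to only about 3.4 defects per million items. This extreme consistency is what makes modern manufacturing reliable.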
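The 3.4-per-million figure comes from a convention rather than from a plain 6-standard-deviation tail. Six Sigma practice assumes the process mean can drift by about 1.5 standard deviations over time, leaving 4.5 standard deviations between the mean and the nearer specification limit. This sketch, assuming normally distributed measurements, compares the two one-sided tail areas using Python's `statistics.NormalDist`.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal distribution: mean 0, SD 1

# One-sided tail beyond 6 SDs, with the mean exactly on target
tail_6 = z.cdf(-6)
# Six Sigma convention: the mean drifts 1.5 SDs toward one limit,
# leaving 6 - 1.5 = 4.5 SDs of room on that side
tail_4_5 = z.cdf(-4.5)

print(f"beyond 6 SD:   {tail_6 * 1e6:.4f} per million")
print(f"beyond 4.5 SD: {tail_4_5 * 1e6:.1f} per million")   # about 3.4
```

A tail beyond 4.5 standard deviations is about 3.4 per million, while a tail beyond a full 6 standard deviations is only about one per billion.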
Weather Variability
When comparing climates, average temperature tells only part of the story. San Francisco and St. Louis might have similar average annual temperatures, but their variability is completely different. San Francisco’s temperatures stay mild year-round (small standard deviation), while St. Louis swings from freezing winters to sweltering summers (large standard deviation).
Farmers, city planners, and anyone considering where to live benefit from understanding not just average weather, but how much it varies.
Investment Risk and Volatility
In finance, the standard deviation of an investment’s returns is the most common measure of risk, called volatility. Two investments might have the same average return over 10 years, but one might swing wildly (high volatility) while the other grows steadily (low volatility).
Someone close to retirement probably should not keep their savings heavily invested in high-volatility stocks, because a big swing downward at the wrong time could be devastating. Younger investors with time to recover might accept higher volatility for potentially higher returns. Understanding spread helps investors match their risk tolerance to their investments.
Test Score Distributions
Standardized tests like the SAT and ACT are designed to have specific means and standard deviations. The SAT, for example, is designed so that the mean is around 1000-1050 with a standard deviation of about 200 points.
This standardization allows fair comparisons across years and versions. If you know the mean and standard deviation, you can use the Empirical Rule to estimate what percentage of students scored above or below any given score. A score of 1200 (about one standard deviation above a mean of roughly 1000) puts you roughly in the top 16% of test-takers: about 68% of scores lie within one standard deviation of the mean, so the remaining 32% is split evenly between the two tails, leaving about 16% above.
Medical Research
Clinical trials carefully track the spread of responses to treatments. Two medications might lower blood pressure by the same average amount, but one might work consistently while the other shows high variability—working great for some patients but barely for others. The more consistent medication is generally preferred because doctors can predict its effects more reliably.
Outlier detection also matters: if a few patients have extreme reactions, those outliers need to be investigated for safety reasons.
Self-Test Problems
Problem 1: Find the range, $Q_1$, $Q_3$, and IQR for the following data: 4, 7, 8, 12, 15, 18, 20, 22, 25.
Range: $25 - 4 = 21$
Quartiles: The data has 9 values. The median (5th value) is 15.
Lower half: 4, 7, 8, 12. $Q_1 = \frac{7 + 8}{2} = 7.5$
Upper half: 18, 20, 22, 25. $Q_3 = \frac{20 + 22}{2} = 21$
IQR: $Q_3 - Q_1 = 21 - 7.5 = 13.5$
Problem 2: Given the five-number summary: Min = 10, $Q_1$ = 25, Median = 40, $Q_3$ = 55, Max = 90. Use the 1.5×IQR rule to determine if the minimum and maximum are outliers.
IQR: $55 - 25 = 30$
Fences:
- Lower fence: $25 - 1.5(30) = 25 - 45 = -20$
- Upper fence: $55 + 1.5(30) = 55 + 45 = 100$
Analysis:
- Minimum (10) is greater than -20, so it is not an outlier
- Maximum (90) is less than 100, so it is not an outlier
Neither extreme value qualifies as an outlier by the 1.5×IQR rule.
Problem 3: Calculate the sample variance and standard deviation for the data: 5, 8, 10, 12, 15.
Mean: $\bar{x} = \frac{5 + 8 + 10 + 12 + 15}{5} = \frac{50}{5} = 10$
Deviations and squared deviations:
| $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ |
|---|---|---|
| 5 | -5 | 25 |
| 8 | -2 | 4 |
| 10 | 0 | 0 |
| 12 | 2 | 4 |
| 15 | 5 | 25 |
Sum of squared deviations: $25 + 4 + 0 + 4 + 25 = 58$
Variance: $s^2 = \frac{58}{4} = 14.5$
Standard deviation: $s = \sqrt{14.5} \approx 3.81$
Problem 4: Dataset A has mean 50 and standard deviation 5. Dataset B has mean 200 and standard deviation 10. Which dataset has greater relative variability?
Calculate the coefficient of variation for each:
Dataset A: $\text{CV}_A = \frac{5}{50} \times 100\% = 10\%$
Dataset B: $\text{CV}_B = \frac{10}{200} \times 100\% = 5\%$
Dataset A has greater relative variability (10% vs. 5%), even though Dataset B has a larger absolute standard deviation.
Problem 5: Using the 1.5×IQR rule, identify any outliers in the data: 3, 5, 6, 7, 8, 9, 10, 11, 12, 35.
Arrange and find quartiles: The data has 10 values.
Median is average of 5th and 6th values: $(8 + 9)/2 = 8.5$
Lower half: 3, 5, 6, 7, 8. $Q_1$ = 6 (middle value)
Upper half: 9, 10, 11, 12, 35. $Q_3$ = 11 (middle value)
IQR: $11 - 6 = 5$
Fences:
- Lower fence: $6 - 1.5(5) = 6 - 7.5 = -1.5$
- Upper fence: $11 + 1.5(5) = 11 + 7.5 = 18.5$
Outliers: Any value below -1.5 or above 18.5
The value 35 is an outlier (it exceeds 18.5).
Problem 6: Exam scores for a class have mean 72 and standard deviation 8. Using the Empirical Rule, estimate what percentage of students scored between 56 and 88.
Find how many standard deviations these boundaries are from the mean:
- $56 = 72 - 16 = 72 - 2(8)$ (2 standard deviations below the mean)
- $88 = 72 + 16 = 72 + 2(8)$ (2 standard deviations above the mean)
The range 56 to 88 is within 2 standard deviations of the mean.
By the Empirical Rule: About 95% of students scored between 56 and 88.
Problem 7: Two pizza restaurants claim an average delivery time of 30 minutes. Restaurant A has a standard deviation of 3 minutes, while Restaurant B has a standard deviation of 12 minutes. If you need your pizza within 35 minutes, which restaurant should you choose? Explain your reasoning using the Empirical Rule.
Restaurant A: Mean = 30, SD = 3
35 minutes is $\frac{35-30}{3} \approx 1.67$ standard deviations above the mean. Assuming delivery times are roughly bell-shaped, the Empirical Rule says about 84% of deliveries arrive by 33 minutes (1 SD above the mean: the 50% below the mean plus half of the 68%) and about 97.5% arrive by 36 minutes (2 SD above). So somewhere between 84% and 97.5% of deliveries arrive within 35 minutes; a normal table puts it at about 95%.
Restaurant B: Mean = 30, SD = 12
35 minutes is $\frac{35-30}{12} \approx 0.42$ standard deviations above the mean. That is less than 1 SD, so the Empirical Rule only tells us the share is between 50% (deliveries at or below the mean) and 84% (deliveries within 1 SD above it); a normal table puts it at about 66%.
Choose Restaurant A. Its smaller standard deviation means deliveries are more consistent, and you have a much better chance of getting your pizza within 35 minutes.
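The Empirical Rule only brackets these answers. If delivery times are assumed to be normally distributed (an assumption; real delivery times are often skewed), Python's `statistics.NormalDist` gives the proportions directly:

```python
from statistics import NormalDist

deadline = 35   # minutes
for name, sd in [("A", 3), ("B", 12)]:
    # Assumes delivery times are normal with mean 30 minutes
    delivery = NormalDist(mu=30, sigma=sd)
    share = delivery.cdf(deadline)   # P(delivery time <= 35)
    print(f"Restaurant {name}: about {share:.0%} within {deadline} minutes")
```

This gives about 95% for Restaurant A and about 66% for Restaurant B, consistent with choosing Restaurant A.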
Summary
- Range (Maximum − Minimum) is the simplest measure of spread, but it is highly sensitive to outliers.
- Quartiles divide ordered data into four equal parts. $Q_1$ is the 25th percentile, $Q_2$ (median) is the 50th percentile, and $Q_3$ is the 75th percentile.
- The five-number summary (Min, $Q_1$, Median, $Q_3$, Max) provides a quick snapshot of data distribution.
- Interquartile Range (IQR) = $Q_3 - Q_1$ measures the spread of the middle 50% of data. It is resistant to outliers.
- The 1.5×IQR rule identifies potential outliers: values below $Q_1 - 1.5 \times \text{IQR}$ or above $Q_3 + 1.5 \times \text{IQR}$.
- Variance ($s^2$) measures average squared deviation from the mean: $s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1}$.
- Standard deviation ($s$) is the square root of variance, giving a measure of spread in the original units. It represents the “typical” distance from the mean.
- The Empirical Rule (68-95-99.7) helps interpret standard deviation for bell-shaped distributions: about 68% of data falls within 1 SD, 95% within 2 SD, and 99.7% within 3 SD of the mean.
- Coefficient of variation ($\text{CV} = \frac{s}{\bar{x}} \times 100\%$) allows comparison of variability across different scales.
- Choose your measure wisely: Use standard deviation with the mean for symmetric data; use IQR with the median for skewed data or when outliers are present.
- Two datasets can have identical centers but very different spreads: the spread tells you about consistency and reliability, which the center alone cannot reveal.