Describing Data with Numbers—Center
Find the typical value in a data set
What is “typical”? It is a surprisingly tricky question. Suppose someone asks, “What is a typical salary at this company?” You could answer with the average, but what if the CEO makes 50 times more than everyone else? That one enormous salary would pull the average way up, making it seem like everyone earns more than they actually do. Suddenly, “average” does not feel so typical anymore.
This is exactly why statisticians have developed multiple ways to describe the center of a data set. Each measure captures something different about what is “typical,” and knowing which one to use—and why—is a skill that will serve you well beyond any math class. Whether you are comparing job offers, understanding news reports about income inequality, or figuring out your grade average, these tools help you see through the numbers to the story they are really telling.
Core Concepts
Why We Need a Single Summary Number
Imagine trying to describe your city’s weather to someone who has never been there. You could list every daily temperature for the past year—365 numbers—but that would be overwhelming and not very helpful. Instead, you might say something like, “It is usually around 70 degrees.” That single number gives a quick, useful picture of what to expect.
This is exactly what measures of center do for any data set. They take a collection of values and summarize them with one representative number. But here is the catch: there are different ways to find that representative number, and each method tells a slightly different story. The three most common measures are the mean, median, and mode.
Mean: The Balance Point
The mean (often called the average) is what most people think of first. You calculate it by adding up all the values and dividing by how many values you have.
$$\bar{x} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
The notation $\bar{x}$ (read as “x-bar”) represents the mean of a sample. The symbol $\sum$ means “add up all of,” and $n$ is the number of values.
Think of the mean as a balance point. If you placed all your data values as weights along a number line, the mean is where you would put the fulcrum to balance them perfectly. Every value contributes to the mean, pulling it in its direction—larger values pull it up, smaller values pull it down.
The strength of the mean: It uses every piece of data, so nothing gets ignored.
The weakness of the mean: It uses every piece of data—including extreme values that might distort the picture.
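The formula and the balance-point idea can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the helper name `mean` and the data values are assumptions made up for this example.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

# Illustrative data (hypothetical, not taken from the text).
data = [2, 4, 4, 6, 9]
x_bar = mean(data)

# Deviations from the mean cancel out, which is the "balance point" idea.
deviations = [x - x_bar for x in data]
total_deviation = sum(deviations)

print(x_bar)            # the mean of the data
print(total_deviation)  # zero, up to floating-point rounding
```

Values below the mean contribute negative deviations and values above it contribute positive ones; at the mean, the two sides cancel.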
Median: The Middle Value
The median is the middle value when you arrange all your data in order from smallest to largest. It splits your data in half: about half of the values fall at or below the median, and about half fall at or above it.
Finding the median depends on whether you have an odd or even number of values:
- Odd number of values: The median is simply the middle value.
- Even number of values: The median is the average of the two middle values.
The median is remarkably stable. No matter how extreme your largest or smallest values are, the median stays right in the middle, unaffected. A billionaire moving into your neighborhood would dramatically change the mean income, but the median would barely budge.
The strength of the median: It is not thrown off by extreme values (we say it is resistant).
The weakness of the median: It ignores how far values are from the center—it only cares about position.
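The odd/even rule can be written out as a short Python sketch. The function name `median` and the sample lists are assumptions chosen for illustration only.

```python
def median(values):
    """Middle value of the sorted data; the average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the middle pair

odd_example = median([7, 1, 5])        # sorted: 1, 5, 7
even_example = median([7, 1, 5, 3])    # sorted: 1, 3, 5, 7

# Replacing the largest value with something huge leaves the median alone.
extreme_example = median([1_000_000, 1, 5])

print(odd_example, even_example, extreme_example)
```

Sorting first is the essential step: the median depends on position in the ordered list, not on the order in which the data happened to be recorded.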
Mode: The Most Common Value
The mode is the value that appears most frequently. Unlike the mean and median, which can only be calculated for numerical data, the mode works for any type of data—including categories like “red,” “blue,” or “favorite pizza topping.”
A data set can have:
- One mode (unimodal): One value clearly appears most often
- Two modes (bimodal): Two values tie for most frequent
- Multiple modes (multimodal): Three or more values tie
- No mode: Every value appears the same number of times
The mode is particularly useful when you want to know what is most common or popular. “What is the most requested shoe size?” is a mode question.
The strength of the mode: It works for categorical data and tells you what is most common.
The weakness of the mode: For numerical data, it might not exist or might not be near the center at all.
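A minimal Python sketch of finding the mode, assuming the convention described above that a data set has no mode when every value appears equally often. The function name `modes` and the sample data are hypothetical.

```python
from collections import Counter

def modes(values):
    """Return all values tied for the highest count, or [] when there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    winners = [value for value, count in counts.items() if count == highest]
    # If several distinct values all appear equally often, nothing stands out.
    if len(counts) > 1 and len(winners) == len(counts):
        return []
    return winners

shoe_sizes = [8, 9, 9, 10, 9, 11]          # hypothetical numerical data
colors = ["red", "blue", "red", "green"]   # hypothetical categorical data

print(modes(shoe_sizes))        # [9]
print(modes(colors))            # ['red']
print(modes([1, 1, 2, 2, 3]))   # [1, 2]  (bimodal)
print(modes([1, 2, 3]))         # []      (no mode)
```

Because the function only counts occurrences, it works the same way for numbers and for category labels.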
Comparing Mean and Median: What Skewness Tells You
When the mean and median are close together, your data is probably fairly symmetric—values are evenly distributed around the center. But when they differ, something interesting is happening.
If the mean is greater than the median: Your data is skewed right (or positively skewed). This means there are some unusually large values pulling the mean up. Picture a long tail stretching to the right. Income data often looks like this—a few very high earners pull the mean above the median.
If the mean is less than the median: Your data is skewed left (or negatively skewed). There are some unusually small values dragging the mean down. Imagine a long tail stretching to the left.
If the mean equals (or nearly equals) the median: Your data is approximately symmetric.
Knowing how skewness affects these measures helps you choose the right one. For skewed data, the median often gives a more honest picture of “typical.”
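The comparison between mean and median can be turned into a rough check in Python. This is only a rule-of-thumb sketch: the function name `skew_hint` and the `tolerance` cutoff (a fraction of the data range) are arbitrary assumptions for illustration, not a standard test for skewness.

```python
from statistics import mean, median

def skew_hint(values, tolerance=0.05):
    """Rule-of-thumb skew direction from comparing the mean and the median.

    `tolerance` is an arbitrary illustrative cutoff, expressed as a
    fraction of the data range (largest value minus smallest value).
    """
    spread = max(values) - min(values)
    gap = mean(values) - median(values)
    if spread == 0 or abs(gap) <= tolerance * spread:
        return "approximately symmetric"
    return "skewed right" if gap > 0 else "skewed left"

# Hypothetical data sets chosen to show each case.
print(skew_hint([1, 2, 3, 4, 50]))      # skewed right
print(skew_hint([1, 47, 48, 49, 50]))   # skewed left
print(skew_hint([1, 2, 3, 4, 5]))       # approximately symmetric
```

A plot of the data remains the more reliable way to judge shape; this check only summarizes which way the mean has been pulled.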
The Effect of Outliers
An outlier is a value that is unusually far from the rest of the data. Outliers can dramatically affect the mean but leave the median largely unchanged.
Consider test scores: 78, 82, 85, 88, 90. The mean is 84.6, and the median is 85. Now suppose one student scored only 10 points (perhaps they were sick that day): 10, 78, 82, 85, 88, 90. The median is now 83.5 (average of 82 and 85)—only a small change. But the mean drops to 72.2—a dramatic shift that makes it seem like the typical student scored much lower than they actually did.
This is why income statistics, house prices, and similar data are almost always reported as medians rather than means. A few extremely wealthy individuals or luxury properties would otherwise distort the picture.
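The test-score comparison above can be reproduced with Python's standard `statistics` module. The variable names are assumptions for this sketch; the scores are the ones given in the text.

```python
from statistics import mean, median

scores = [78, 82, 85, 88, 90]
with_outlier = scores + [10]   # add the one unusually low score

mean_before, median_before = mean(scores), median(scores)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(f"mean:   {mean_before:.1f} -> {mean_after:.1f}")
print(f"median: {median_before} -> {median_after}")
```

The mean falls by more than 12 points, while the median moves by only 1.5.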
Resistant vs. Non-Resistant Measures
A resistant (or robust) measure is one that is not heavily affected by extreme values.
- The median is resistant. Change the largest value to a million, and the median stays the same.
- The mean is non-resistant. That same change to a million would wildly alter the mean.
- The mode is resistant. Extreme values do not affect which value is most common.
When you suspect your data might contain errors, outliers, or extreme values, resistant measures give you a more reliable picture.
Choosing the Right Measure of Center
There is no single “best” measure—the right choice depends on your data and what you want to communicate.
Use the mean when:
- Your data is roughly symmetric
- There are no extreme outliers
- You need a measure that accounts for every value
- You will be doing further statistical calculations
Use the median when:
- Your data is skewed
- There are outliers or extreme values
- You want a resistant measure
- You are describing incomes, house prices, or similar data
Use the mode when:
- Your data is categorical (non-numerical)
- You want to know what is most common or popular
- You are looking at discrete data with repeated values
Often, it is wise to report multiple measures. Saying “the median income is $52,000 while the mean is $78,000” tells your audience that the distribution is skewed—some people earn substantially more than most.
Weighted Mean
Sometimes not all data values are equally important. A weighted mean accounts for this by giving some values more influence than others.
$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}$$
The most familiar example is your GPA. A 4-credit class should count more toward your GPA than a 1-credit class. If you earned an A (4.0) in a 4-credit class and a B (3.0) in a 2-credit class:
$$\text{GPA} = \frac{(4 \times 4.0) + (2 \times 3.0)}{4 + 2} = \frac{16 + 6}{6} = \frac{22}{6} \approx 3.67$$
The 4-credit A has twice as much influence as the 2-credit B.
Weighted means appear everywhere: calculating final course grades (where exams might count more than homework), finding average stock prices (weighted by number of shares), or computing index values in finance.
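A minimal Python sketch of the weighted mean formula, checked against the two-course GPA example above. The function name `weighted_mean` and its error handling are assumptions for illustration.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# GPA example from the text: an A (4.0) for 4 credits and a B (3.0) for 2 credits.
gpa = weighted_mean([4.0, 3.0], [4, 2])
print(round(gpa, 2))  # 3.67

# With equal weights, the weighted mean reduces to the ordinary mean.
print(weighted_mean([4.0, 3.0], [1, 1]))  # 3.5
```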
Notation and Terminology
| Term | Meaning | Example |
|---|---|---|
| Mean ($\bar{x}$) | Sum of values divided by count | $\bar{x} = \frac{\sum x_i}{n}$ |
| Median | Middle value when data is ordered | 50th percentile |
| Mode | Most frequently occurring value | The “peak” of data |
| Outlier | Unusually extreme value | 150 in {10, 12, 11, 13, 150} |
| Resistant | Not affected much by outliers | Median is resistant |
| Non-resistant | Sensitive to outliers | Mean is non-resistant |
| Weighted mean | Mean accounting for different weights | GPA calculation |
| Skewed right | Tail extends toward larger values | Mean > Median |
| Skewed left | Tail extends toward smaller values | Mean < Median |
| Symmetric | Balanced distribution | Mean $\approx$ Median |
| $\sum$ (sigma) | “Sum of” | $\sum x_i$ means add all $x$ values |
| $n$ | Number of values in the data set | Sample size |
Examples
Seven students reported the number of hours each spent studying for an exam: 3, 5, 4, 5, 6, 5, 7.
Find the mean, median, and mode.
Solution:
Mean: Add all values and divide by the count. $$\bar{x} = \frac{3 + 5 + 4 + 5 + 6 + 5 + 7}{7} = \frac{35}{7} = 5$$
The mean study time is 5 hours.
Median: First, arrange in order: 3, 4, 5, 5, 5, 6, 7.
With 7 values (odd), the median is the 4th value (the middle one). $$\text{Median} = 5$$
Mode: The value 5 appears three times, more than any other. $$\text{Mode} = 5$$
In this case, all three measures equal 5, suggesting the data is fairly symmetric and 5 hours is genuinely typical.
Part A: Find the median of: 12, 18, 14, 22, 16 (five values).
Part B: Find the median of: 12, 18, 14, 22, 16, 20 (six values).
Solution:
Part A (odd count):
Step 1: Arrange in order: 12, 14, 16, 18, 22.
Step 2: With 5 values, the median is the middle value—the 3rd value. $$\text{Median} = 16$$
Part B (even count):
Step 1: Arrange in order: 12, 14, 16, 18, 20, 22.
Step 2: With 6 values, there is no single middle value. The median is the average of the 3rd and 4th values. $$\text{Median} = \frac{16 + 18}{2} = \frac{34}{2} = 17$$
Notice that the median (17) is not actually in the data set—that is perfectly normal for an even count. It simply represents the center point between the two middle values.
A student earned the following grades:
| Course | Credits | Grade | Grade Points |
|---|---|---|---|
| Calculus | 4 | B+ | 3.3 |
| Chemistry | 4 | A- | 3.7 |
| English | 3 | B | 3.0 |
| History | 3 | A | 4.0 |
| Art | 1 | A | 4.0 |
Calculate the student’s GPA.
Solution:
The GPA is a weighted mean where the weights are the credit hours.
Step 1: Multiply each grade point value by its credit weight.
- Calculus: $4 \times 3.3 = 13.2$
- Chemistry: $4 \times 3.7 = 14.8$
- English: $3 \times 3.0 = 9.0$
- History: $3 \times 4.0 = 12.0$
- Art: $1 \times 4.0 = 4.0$
Step 2: Add the weighted values. $$13.2 + 14.8 + 9.0 + 12.0 + 4.0 = 53.0$$
Step 3: Add the total credits. $$4 + 4 + 3 + 3 + 1 = 15$$
Step 4: Divide to find the weighted mean. $$\text{GPA} = \frac{53.0}{15} \approx 3.53$$
If we had used a simple (unweighted) mean of the grade points: $(3.3 + 3.7 + 3.0 + 4.0 + 4.0)/5 = 3.6$. The weighted mean is lower because the student’s highest grades (both A’s worth 4.0) were in courses with fewer credits.
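The GPA calculation above can be checked with a short Python sketch. The table rows are copied from the example; the variable names (`courses`, `weighted_sum`, and so on) are assumptions for illustration.

```python
# (course, credits, grade points) rows copied from the table in this example.
courses = [
    ("Calculus", 4, 3.3),
    ("Chemistry", 4, 3.7),
    ("English", 3, 3.0),
    ("History", 3, 4.0),
    ("Art", 1, 4.0),
]

# Steps 1-2: multiply each grade by its credits and add the results.
weighted_sum = sum(credits * points for _, credits, points in courses)
# Step 3: add the credits.
total_credits = sum(credits for _, credits, _ in courses)
# Step 4: divide.
gpa = weighted_sum / total_credits

# Unweighted mean of the grade points, for comparison.
simple_mean = sum(points for _, _, points in courses) / len(courses)

print(round(gpa, 2), round(simple_mean, 2))
```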
A real estate website reports the following home sale prices in a neighborhood (in thousands of dollars):
$$285, 310, 295, 340, 305, 275, 320, 1850$$
Which measure of center best represents the typical home price?
Solution:
Let us calculate all three measures.
Mean: $$\bar{x} = \frac{285 + 310 + 295 + 340 + 305 + 275 + 320 + 1850}{8} = \frac{3980}{8} = 497.5$$
The mean home price is $497,500.
Median: Arrange in order: 275, 285, 295, 305, 310, 320, 340, 1850.
With 8 values, the median is the average of the 4th and 5th values. $$\text{Median} = \frac{305 + 310}{2} = \frac{615}{2} = 307.5$$
The median home price is $307,500.
Mode: No value repeats, so there is no mode.
Analysis: The mean ($497,500) is much higher than the median ($307,500). This large difference signals that the data is skewed right—there is an outlier pulling the mean up. Indeed, the $1,850,000 luxury home is far above the others.
Best choice: The median ($307,500). It better represents what a typical home buyer would expect to pay in this neighborhood. Seven of the eight homes sold for between $275,000 and $340,000. Reporting the mean of $497,500 would mislead buyers into thinking homes are far more expensive than most actually are.
A company has 10 employees with the following annual salaries (in thousands of dollars):
$$42, 45, 48, 50, 52, 55, 58, 62, 65, 70$$
The company then hires a new CEO with a salary of $850,000.
Calculate the mean and median before and after the CEO is hired. What do you observe?
Solution:
Before the CEO is hired (10 employees):
Mean: $$\bar{x} = \frac{42 + 45 + 48 + 50 + 52 + 55 + 58 + 62 + 65 + 70}{10} = \frac{547}{10} = 54.7$$
The mean salary is $54,700.
Median: Data is already in order. With 10 values, the median is the average of the 5th and 6th values. $$\text{Median} = \frac{52 + 55}{2} = \frac{107}{2} = 53.5$$
The median salary is $53,500.
After the CEO is hired (11 employees):
The data set is now: 42, 45, 48, 50, 52, 55, 58, 62, 65, 70, 850.
Mean: $$\bar{x} = \frac{547 + 850}{11} = \frac{1397}{11} = 127$$
The mean salary is now $127,000.
Median: With 11 values (odd), the median is the 6th value. $$\text{Median} = 55$$
The median salary is now $55,000.
Comparison:
| Measure | Before CEO | After CEO | Change |
|---|---|---|---|
| Mean | $54,700 | $127,000 | +132% |
| Median | $53,500 | $55,000 | +3% |
Observation: The CEO’s salary is an outlier—it is more than 12 times larger than the next highest salary. This single value caused the mean to more than double, from $54,700 to $127,000. However, the median increased by only $1,500 (about 3%).
If someone asked “What does a typical employee at this company earn?” the mean of $127,000 would be misleading—no regular employee earns anywhere near that. The median of $55,000 gives a much more accurate picture of typical earnings.
This example demonstrates why the median is called a resistant measure: it resists being pulled by extreme values. The mean, being non-resistant, is highly sensitive to outliers.
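The before-and-after comparison can be reproduced with Python's `statistics` module. The salaries are those from the example; the helper `percent_change` is a name assumed for this sketch.

```python
from statistics import mean, median

salaries = [42, 45, 48, 50, 52, 55, 58, 62, 65, 70]  # thousands of dollars
with_ceo = salaries + [850]

def percent_change(old, new):
    """Relative change from old to new, as a percentage."""
    return (new - old) / old * 100

mean_change = percent_change(mean(salaries), mean(with_ceo))
median_change = percent_change(median(salaries), median(with_ceo))

print(f"mean:   {mean(salaries)} -> {mean(with_ceo)} ({mean_change:+.0f}%)")
print(f"median: {median(salaries)} -> {median(with_ceo)} ({median_change:+.0f}%)")
```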
In a college course, your final grade is calculated as follows:
- Homework: 20% of final grade
- Midterm exam: 30% of final grade
- Final exam: 50% of final grade
A student earns 92% on homework, 78% on the midterm, and 85% on the final exam. What is the student’s final grade?
Solution:
This is a weighted mean problem where the weights are the percentages.
Method 1: Using decimal weights (weights sum to 1) $$\text{Final Grade} = (0.20)(92) + (0.30)(78) + (0.50)(85)$$ $$= 18.4 + 23.4 + 42.5$$ $$= 84.3\%$$
Method 2: Using the weighted mean formula with percentage weights $$\bar{x}_w = \frac{(20)(92) + (30)(78) + (50)(85)}{20 + 30 + 50}$$ $$= \frac{1840 + 2340 + 4250}{100}$$ $$= \frac{8430}{100} = 84.3\%$$
The student’s final grade is 84.3%.
Note: If we had calculated a simple (unweighted) mean, we would get: $$\frac{92 + 78 + 85}{3} = \frac{255}{3} = 85\%$$
The weighted mean (84.3%) is lower than the simple mean (85%) because of how the weights line up with the scores. The student’s best score (homework, 92%) carries the smallest weight (20%), while the weakest score (midterm, 78%) carries a larger weight (30%). The final exam (85%, weighted at 50%) anchors the result near 85, and the midterm then pulls it slightly below.
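Methods 1 and 2 agree because multiplying every weight by the same constant leaves a weighted mean unchanged. A short derivation, where $c$ is any nonzero constant (here $c = 100$ turns the decimal weights into percentage weights):

$$\frac{\sum (c\,w_i)\,x_i}{\sum c\,w_i} = \frac{c \sum w_i x_i}{c \sum w_i} = \frac{\sum w_i x_i}{\sum w_i}$$

When the weights already sum to 1, the denominator is 1, which is why Method 1 needs no division step.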
Key Properties and Rules
Formulas for Measures of Center
Mean: $$\bar{x} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
Weighted Mean: $$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}$$
Median:
- For odd $n$: The $\frac{n+1}{2}$th value when data is ordered
- For even $n$: Average of the $\frac{n}{2}$th and $\left(\frac{n}{2}+1\right)$th values when data is ordered
Mode:
- The value(s) with the highest frequency
- May not exist (no repeats) or may be multiple values
Properties of the Mean
- Uses all data values: Every observation contributes to the mean.
- Affected by outliers: Extreme values can dramatically shift the mean.
- Balance point: The sum of deviations from the mean equals zero: $\sum(x_i - \bar{x}) = 0$
- Additive: Adding a constant $c$ to every value shifts the mean by $c$.
- Multiplicative: Multiplying every value by $c$ multiplies the mean by $c$.
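The last three properties follow directly from the definition of the mean. A short derivation, using the same symbols as the formula above and treating $c$ as any constant:

$$\sum (x_i - \bar{x}) = \sum x_i - n\bar{x} = n\bar{x} - n\bar{x} = 0$$
$$\frac{\sum (x_i + c)}{n} = \frac{\sum x_i + nc}{n} = \bar{x} + c$$
$$\frac{\sum c\,x_i}{n} = \frac{c \sum x_i}{n} = c\,\bar{x}$$

For instance, converting a set of temperatures from Celsius to Fahrenheit multiplies the mean by 1.8 and then adds 32.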
Properties of the Median
- Uses only position: Only the middle value(s) matter, not their magnitudes.
- Resistant to outliers: Extreme values have little or no effect on the median.
- Divides data in half: At least half of the values are less than or equal to the median, and at least half are greater than or equal to it.
- Minimizes absolute deviations: The median is the value that minimizes $\sum|x_i - m|$.
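The minimizing property can be checked numerically. The sketch below reuses the salary data from the CEO example and scans whole-number candidates only, which is a simplification assumed for illustration; the function name `total_abs_deviation` is hypothetical.

```python
from statistics import mean, median

def total_abs_deviation(values, m):
    """Sum of the distances |x - m| from every value to a candidate center m."""
    return sum(abs(x - m) for x in values)

# Salaries (in thousands) from the CEO example, including the 850 outlier.
salaries = [42, 45, 48, 50, 52, 55, 58, 62, 65, 70, 850]

# Try every whole-number candidate between the smallest and largest salary.
candidates = range(min(salaries), max(salaries) + 1)
best = min(candidates, key=lambda m: total_abs_deviation(salaries, m))

at_median = total_abs_deviation(salaries, median(salaries))
at_mean = total_abs_deviation(salaries, mean(salaries))

print(best, median(salaries))  # the best candidate matches the median
print(at_median, at_mean)      # the total distance is smaller at the median
```

With an even number of values, every candidate between the two middle values gives the same minimum, so the minimizer is not unique in that case.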
Properties of the Mode
- Frequency-based: Identifies the most common value.
- Works for categorical data: Unlike mean and median, can be used with non-numerical data.
- May not be unique: Data can be bimodal, multimodal, or have no mode.
- Resistant to outliers: Adding extreme values does not change which value is most common.
Comparing Measures: Quick Reference
| Situation | Best Measure |
|---|---|
| Symmetric data, no outliers | Mean |
| Skewed data | Median |
| Data with outliers | Median |
| Categorical data | Mode |
| “What is most popular?” | Mode |
| Need for further calculations | Mean |
| Income or housing data | Median |
Relationship Between Mean and Median in Skewed Data
- Right-skewed (positive skew): Mean > Median
- Left-skewed (negative skew): Mean < Median
- Symmetric: Mean $\approx$ Median
Real-World Applications
Test Scores and Academic Performance
Teachers often report both the mean and median of exam scores. The mean tells you the overall class performance, while the median tells you what a typical individual student scored. If the median is 75% but the mean is 70%, you know some students scored very low, pulling the mean down. Your GPA is calculated using weighted means, where courses with more credits count more heavily.
Income and Wealth Statistics
Income data is almost always reported as median rather than mean. Why? Because income is notoriously right-skewed. A small number of very wealthy individuals (CEOs, celebrities, tech entrepreneurs) have incomes hundreds or thousands of times larger than typical workers. Including them in a mean would make “average income” seem much higher than what most people actually earn.
When you hear that “the median household income in the US is about $75,000,” it means half of households earn more and half earn less. The mean household income is substantially higher—pulled up by the wealthy—but less useful for understanding what a typical family earns.
House Prices
Real estate listings emphasize median home prices for the same reason. A few luxury mansions or waterfront properties can dramatically inflate the mean, making an area seem more expensive than it is for typical buyers. The median gives house hunters a more realistic expectation.
Sports Statistics
Sports are filled with measures of center:
- Batting average in baseball is a mean (hits divided by at-bats)
- Points per game in basketball is a mean
- ERA (Earned Run Average) in baseball is earned runs per nine innings (9 times earned runs divided by innings pitched), which works like a mean weighted by innings pitched
Commentators sometimes compare a player’s mean performance to their median to identify consistency versus variability. A basketball player who scores 20, 20, 20, 20, 20 (mean = median = 20) is more consistent than one who scores 5, 10, 15, 25, 45 (mean = 20, median = 15).
Grade Calculations
Understanding weighted means helps you strategize your studying. If your final exam is worth 40% of your grade but you have only spent 10% of your study time preparing for it, you might want to reconsider your priorities. Knowing which assignments carry more weight helps you allocate your effort wisely.
Consumer Decision Making
When reading product reviews, the average rating (mean) can be misleading if there are many extreme ratings. A product with mostly 4-star and 5-star reviews but some angry 1-star reviews might have a lower mean than a product with consistently mediocre 3-star reviews. Looking at the mode (most common rating) or the distribution of ratings often gives better insight.
Self-Test Problems
Problem 1: Find the mean, median, and mode of the following test scores: 88, 72, 95, 88, 76, 88, 82.
Answer:
Mean: $$\bar{x} = \frac{88 + 72 + 95 + 88 + 76 + 88 + 82}{7} = \frac{589}{7} \approx 84.14$$
Median: Arrange in order: 72, 76, 82, 88, 88, 88, 95. With 7 values, the median is the 4th value. $$\text{Median} = 88$$
Mode: The value 88 appears 3 times (more than any other). $$\text{Mode} = 88$$
Problem 2: The ages of 6 participants in a workshop are: 22, 25, 28, 30, 33, 40. Find the median age.
Answer:
The data is already in order: 22, 25, 28, 30, 33, 40.
With 6 values (even), the median is the average of the 3rd and 4th values. $$\text{Median} = \frac{28 + 30}{2} = \frac{58}{2} = 29$$
The median age is 29 years.
Problem 3: A student’s final grade is calculated as: Quizzes (15%), Homework (25%), Midterm (25%), Final Exam (35%). The student earned: Quizzes 90%, Homework 85%, Midterm 72%, Final Exam 80%. What is the final grade?
Answer:
Using the weighted mean formula: $$\text{Final Grade} = (0.15)(90) + (0.25)(85) + (0.25)(72) + (0.35)(80)$$ $$= 13.5 + 21.25 + 18.0 + 28.0$$ $$= 80.75\%$$
The student’s final grade is 80.75%.
Problem 4: A data set has values: 5, 8, 12, 15, 18, 22, 250. Calculate the mean and median. Which measure better represents the “typical” value? Why?
Answer:
Mean: $$\bar{x} = \frac{5 + 8 + 12 + 15 + 18 + 22 + 250}{7} = \frac{330}{7} \approx 47.1$$
Median: Data is already in order. With 7 values, the median is the 4th value. $$\text{Median} = 15$$
Which is better? The median (15) better represents the typical value.
The value 250 is an outlier: it is more than 10 times larger than every other value in the set. This extreme value pulls the mean up to about 47.1, which is higher than 6 of the 7 values. The median of 15 sits among the six “regular” values and would be the same even if the 250 were replaced by a value such as 25.
Problem 5: Explain why news reports about income typically use median rather than mean.
Answer:
News reports use median income rather than mean income because income data is heavily right-skewed—a small percentage of people earn extremely high incomes (millions or billions of dollars) while most people earn relatively modest amounts.
These extremely high incomes would pull the mean up dramatically, making “average income” seem much higher than what typical people actually earn. For example, if 99 people earn $50,000 and 1 person earns $10,000,000, the mean income is $149,500—almost triple what 99% of people actually earn.
The median, being resistant to outliers, would remain close to $50,000, accurately reflecting what a typical person earns. Half of people earn more than the median, and half earn less—regardless of how extreme the highest or lowest incomes are.
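The 100-person example in this answer can be verified with a short Python sketch using the standard `statistics` module. The variable names are assumptions for illustration.

```python
from statistics import mean, median

# 99 people earning $50,000 and 1 person earning $10,000,000.
incomes = [50_000] * 99 + [10_000_000]

mean_income = mean(incomes)
median_income = median(incomes)

print(f"mean:   ${mean_income:,.0f}")
print(f"median: ${median_income:,.0f}")
```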
Problem 6: A class of 20 students takes a quiz. The scores are: 65, 70, 70, 72, 75, 75, 75, 78, 80, 80, 82, 82, 85, 85, 85, 88, 90, 92, 95, 98. Find the mean, median, and mode. Is the data skewed? If so, in which direction?
Answer:
Mean: $$\bar{x} = \frac{65+70+70+72+75+75+75+78+80+80+82+82+85+85+85+88+90+92+95+98}{20}$$ $$= \frac{1622}{20} = 81.1$$
Median: With 20 values (even), the median is the average of the 10th and 11th values. Looking at the ordered data, the 10th value is 80 and the 11th value is 82. $$\text{Median} = \frac{80 + 82}{2} = 81$$
Mode: The values 75 and 85 each appear 3 times. $$\text{Mode} = 75 \text{ and } 85 \text{ (bimodal)}$$
Skewness: The mean (81.1) is very close to the median (81), which suggests the data is approximately symmetric. Looking at the range of scores (65 to 98) and how they are distributed, we see they spread fairly evenly on both sides of the center. There is no strong skewness.
Problem 7: An investor owns the following stocks:
| Stock | Shares | Price per Share |
|---|---|---|
| A | 100 | $25 |
| B | 50 | $40 |
| C | 200 | $15 |
What is the weighted average price per share of the investor’s portfolio?
Answer:
The weighted average price uses the number of shares as weights:
$$\bar{x}_w = \frac{(100)(25) + (50)(40) + (200)(15)}{100 + 50 + 200}$$ $$= \frac{2500 + 2000 + 3000}{350}$$ $$= \frac{7500}{350}$$ $$\approx 21.43$$
The weighted average price is $21.43 per share.
Note: A simple average of the prices would be $(25 + 40 + 15)/3 \approx 26.67$, which does not account for the fact that the investor owns many more shares of the cheaper stocks.
Summary
- Measures of center summarize a data set with a single “typical” value. The three main measures are mean, median, and mode.
- Mean ($\bar{x}$) is calculated by adding all values and dividing by the count. It uses every data point but is sensitive to outliers.
- Median is the middle value when data is ordered. For an even number of values, average the two middle values. It is resistant to outliers and preferred for skewed data.
- Mode is the most frequently occurring value. It works for categorical data and can identify what is most common.
- When mean > median, data is skewed right (positive skew). When mean < median, data is skewed left (negative skew). When they are equal, data is approximately symmetric.
- Outliers are extreme values that can dramatically affect the mean but leave the median largely unchanged.
- Resistant measures (median, mode) are not heavily influenced by outliers. Non-resistant measures (mean) are sensitive to extreme values.
- Weighted mean gives different values different levels of importance, useful for GPA calculations, portfolio values, and grade calculations.
- Choose your measure wisely: use mean for symmetric data, median for skewed data or when outliers exist, and mode for categorical data or finding the most popular value.
- In real-world contexts like income and housing prices, the median typically provides a more honest picture of “typical” than the mean.