Visualizing Data

Turn numbers into pictures that reveal patterns

A spreadsheet with 10,000 rows of numbers is almost useless to the human eye. You could stare at those columns for hours and never notice that sales spike every December, or that most customers fall into two distinct age groups, or that one data point is wildly different from all the others. But turn those same numbers into the right picture, and patterns leap off the page.

This is why data visualization matters. Your brain is remarkably good at processing visual information—detecting trends, spotting outliers, comparing groups. It is far less good at scanning endless rows of digits. A well-chosen graph transforms data from an overwhelming wall of numbers into a story you can actually see and understand.

In this chapter, you will learn to create and interpret the most important types of statistical graphs. More importantly, you will learn to read them critically—because just as the right visualization reveals truth, the wrong one (intentionally or not) can deceive.

Core Concepts

Why Visualize Data?

Before diving into specific graph types, let’s understand what visualization actually does for us.

Humans are pattern-detection machines, especially for visual patterns. Show someone a table of 50 numbers and ask if there is a trend. They will struggle. Show them a line graph of those same 50 numbers, and the trend (if there is one) becomes immediately obvious. Our visual cortex evolved to detect movement, shape, and spatial relationships. Data visualization hijacks this powerful machinery to help us understand abstract numbers.

Good visualizations reveal what statistics alone cannot. Consider a dataset where the mean is 50 and the standard deviation is 15. That sounds like useful information until you realize that many completely different distributions could produce those same two numbers. A visualization shows you the actual shape: is it symmetric? Skewed? Does it have one peak or two? Are there outliers?
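To make this concrete, here is a minimal Python sketch. The helper `rescale` and both sample datasets are hypothetical, made up for illustration. The helper shifts and stretches any list of numbers so that its mean is 50 and its population standard deviation is 15, without changing its shape.

```python
from statistics import fmean, pstdev

def rescale(values, target_mean=50.0, target_sd=15.0):
    """Shift and stretch values to a chosen mean and population SD.

    The shape of the distribution (peaks, skew, gaps) is unchanged.
    """
    mean = fmean(values)
    sd = pstdev(values)
    return [target_mean + target_sd * (v - mean) / sd for v in values]

# Hypothetical data: one two-peaked set and one right-skewed set.
two_peaks = rescale([1, 1, 1, 2, 8, 9, 9, 9])
right_skewed = rescale([1, 1, 1, 2, 2, 3, 5, 12])

for name, data in [("two peaks", two_peaks), ("right skewed", right_skewed)]:
    print(name, round(fmean(data), 1), round(pstdev(data), 1))
```

Both lines report the same mean and standard deviation, yet a dot plot or histogram of the two lists would look nothing alike.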

The right graph depends on what you want to show. Different visualizations answer different questions. Comparing categories? Use a bar chart. Showing change over time? Use a line graph. Displaying the distribution of a single variable? Use a histogram or box plot. Showing the relationship between two variables? Use a scatter plot. Choosing the wrong graph is like using a hammer when you need a screwdriver—you might make progress, but you are working against yourself.

Dot Plots: The Simplest Visualization

A dot plot is the most basic way to display data. You draw a number line, and for each data point, you place a dot above that value. If a value appears more than once, you stack the dots.

Dot plots work well when you have:

  • A small to moderate dataset (roughly 5–30 values)
  • Data that falls into a limited range of values
  • A desire to see every individual data point

How to read a dot plot: Look for clusters (where dots bunch together), gaps (where no dots appear), and potential outliers (dots far from the main group). The height of stacked dots shows frequency—how often each value occurs.

Limitations: Dot plots become cluttered with large datasets and do not work well when values are spread over a wide range with many unique values.
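The stacking rule is simple enough to sketch in a few lines of Python. This is only an illustration for small sets of whole numbers; the function name `dot_plot` and the pet-count data are hypothetical.

```python
from collections import Counter

def dot_plot(values):
    """Return a text dot plot for integer data as a list of lines."""
    counts = Counter(values)
    low, high = min(values), max(values)
    axis = range(low, high + 1)
    lines = []
    # Build rows from the tallest stack down to the number line.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(("*" if counts[v] >= level else " ").ljust(3) for v in axis)
        lines.append(row.rstrip())
    # Final line: the number line labels, one per integer in the range.
    lines.append("".join(str(v).ljust(3) for v in axis).rstrip())
    return lines

# Hypothetical data: number of pets in 10 households.
pets = [0, 1, 1, 1, 2, 2, 3, 3, 3, 6]
print("\n".join(dot_plot(pets)))
```

Because every integer in the range gets a column, the gap at 4 and 5 and the isolated value at 6 stay visible in the printed plot.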

Histograms: Seeing the Shape of Data

A histogram looks similar to a bar chart, but it serves a fundamentally different purpose. While bar charts compare separate categories, histograms show how numerical data is distributed across a continuous range.

In a histogram, the range of data values is divided into intervals called bins (or classes). Each bar’s height represents the frequency (count) or relative frequency (proportion) of values falling within that bin. The bars touch each other because the data is continuous—there are no gaps between intervals.

Key parts of a histogram:

  • Horizontal axis: The range of data values, divided into bins
  • Vertical axis: Frequency (count) or relative frequency (proportion)
  • Bars: Each bar spans one bin, and its height shows how many values fall in that range

Choosing bin width: This is more art than science. Too few bins (too wide) obscure important details. Too many bins (too narrow) create a choppy, hard-to-read graph. A common rule of thumb is to use between 5 and 15 bins, but the best choice depends on your data and what patterns you want to reveal.
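The effect of bin width is easy to try out. The following is a minimal Python sketch, not a standard recipe: the `bin_counts` helper and the commute-time data are made up for illustration, and the sketch assumes each bin includes its left edge but not its right edge.

```python
def bin_counts(values, start, width, n_bins):
    """Count how many values fall in each bin [left, left + width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

# Hypothetical data: commute times in minutes for 20 people.
commutes = [5, 8, 12, 14, 15, 17, 18, 21, 22, 22,
            24, 25, 27, 28, 31, 33, 36, 41, 47, 58]

# Same data, three bin widths: too wide, reasonable, too narrow.
for width, n_bins in [(30, 2), (10, 6), (2, 30)]:
    counts = bin_counts(commutes, start=0, width=width, n_bins=n_bins)
    print(f"width {width:>2}:", counts)
```

With two wide bins the counts are `[14, 6]`, which hides the peak in the 20s. With six bins they are `[2, 5, 7, 3, 2, 1]`, which shows the peak and the right tail. With thirty narrow bins most counts are 0 or 1, and the shape is lost in noise.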

What histograms reveal:

  • The shape of the distribution (symmetric, skewed, uniform, etc.)
  • Where the data is centered
  • How much the data spreads
  • Whether there are outliers or gaps
  • Whether there is one peak or several

Describing Distributions: Shape, Center, Spread, and Outliers

When you look at a histogram or other distribution plot, you should systematically describe four aspects: shape, center, spread, and any unusual features.

Shape

Symmetric distributions look like mirror images on either side of the center. If you folded the histogram in half at the middle, the two sides would roughly match. The classic bell curve (normal distribution) is symmetric.

Skewed distributions have a long tail stretching in one direction:

  • Skewed right (positively skewed): The tail extends toward higher values. Most data points cluster on the left, with a few high values stretching the distribution to the right. Example: Income distribution—most people earn moderate amounts, but a few earn extremely high incomes.
  • Skewed left (negatively skewed): The tail extends toward lower values. Most data points cluster on the right, with a few low values stretching the distribution to the left. Example: Age at retirement—most people retire around 60–65, but some retire much earlier.

How to remember skew direction: The name describes where the tail goes, not where most of the data is. “Skewed right” means the tail points right.

Number of Peaks (Modality)

  • Unimodal: One clear peak—the most common pattern
  • Bimodal: Two distinct peaks—often suggests two different groups in the data
  • Multimodal: Three or more peaks
  • Uniform: No peaks—all values occur with roughly equal frequency

Center

Where is the “middle” of the distribution? For symmetric distributions, the mean and median are close together near the center. For skewed distributions, the median is usually more representative of “typical” because the mean gets pulled toward the tail.
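A small numerical check shows how strongly one tail value can pull the mean. The income figures below are hypothetical, chosen only to illustrate a right-skewed dataset.

```python
from statistics import mean, median

# Hypothetical weekly incomes (in dollars) with one very high earner.
incomes = [400, 450, 500, 520, 550, 600, 650, 5000]

print("median:", median(incomes))  # middle of the sorted values
print("mean:  ", mean(incomes))    # pulled toward the long right tail
```

Here the median is 535 while the mean is 1083.75, more than double, because a single large value drags the mean toward the tail.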

Spread

How wide is the distribution? Are values tightly clustered or spread out? You can estimate this visually by looking at the range covered by the data and how concentrated the bars are around the center.

Outliers and Unusual Features

Are there isolated bars far from the main distribution? Are there gaps in the data? Any unusual features deserve attention and explanation.

Stem-and-Leaf Plots: Data and Display Combined

A stem-and-leaf plot is a clever compromise between a table of raw data and a histogram. It shows the distribution’s shape while preserving the actual data values.

Each number is split into a stem (all digits except the last) and a leaf (the final digit). Numbers with the same stem are grouped on the same row.

For example, the number 47 has stem 4 and leaf 7. The number 53 has stem 5 and leaf 3.

How to read a stem-and-leaf plot:

  • Read each complete number by combining the stem with each leaf
  • Look at the shape by turning your head sideways—the leaves form a sort of sideways histogram
  • Every original data point is preserved

Advantages:

  • Shows exact values, not just frequencies
  • Shows the distribution shape
  • Easy to find median, quartiles, and other statistics from the sorted data

Limitations:

  • Best for datasets of roughly 15–50 values
  • Works well only when data values have 2–3 digits
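The stem/leaf split described above can be sketched in Python. This is one possible approach for non-negative whole numbers, not a standard library routine; the function name `stem_and_leaf` and the sample values are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf rows for non-negative integers.

    The stem is every digit except the last; the leaf is the last digit.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)
    # Include empty stems so gaps in the data stay visible.
    stems = range(min(rows), max(rows) + 1)
    return [f"{s} | {' '.join(str(leaf) for leaf in rows[s])}".rstrip() for s in stems]

# Hypothetical data that includes the values 47 and 53 discussed above.
sample = [47, 53, 41, 68, 55, 47, 72, 59]
print("\n".join(stem_and_leaf(sample)))
```

Sorting before splitting keeps the leaves in increasing order on each row, which is what makes the median and quartiles easy to count off.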

Box Plots (Box-and-Whisker Plots)

A box plot is a powerful visualization built from the five-number summary (minimum, $Q_1$, median, $Q_3$, maximum). It shows the distribution’s center, spread, and potential outliers in a compact form.

Components of a box plot:

  • The box: Spans from $Q_1$ to $Q_3$, capturing the middle 50% of the data (the IQR)
  • The line inside the box: Marks the median
  • The whiskers: Lines extending from the box toward the minimum and maximum values (but see note about outliers below)
  • Outlier points: Individual dots beyond the whiskers for values flagged as outliers

How outliers are handled: In most box plots, the whiskers do not extend to the absolute minimum and maximum. Instead, they extend to the most extreme values that are not outliers (using the 1.5×IQR rule). Values beyond this range are plotted as individual points.
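The whisker rule can be sketched as follows. This is a minimal illustration: the function name `box_plot_parts` and the data are hypothetical, and the quartiles are passed in rather than computed, because textbooks and software use slightly different quartile methods.

```python
def box_plot_parts(sorted_values, q1, q3):
    """Apply the 1.5 x IQR rule to sorted data with known quartiles."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in sorted_values if lower_fence <= v <= upper_fence]
    outliers = [v for v in sorted_values if v < lower_fence or v > upper_fence]
    return {
        "fences": (lower_fence, upper_fence),
        "whiskers": (inside[0], inside[-1]),  # most extreme non-outliers
        "outliers": outliers,
    }

# Hypothetical sorted data with assumed quartiles Q1 = 12 and Q3 = 20.
data = [3, 10, 12, 14, 15, 16, 18, 20, 22, 40]
parts = box_plot_parts(data, q1=12, q3=20)
print(parts)
```

For this data the fences are 0 and 32, the whiskers reach from 3 to 22, and 40 is plotted as a separate outlier point. Note that the whiskers stop at actual data values, not at the fences themselves.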

What box plots show well:

  • The center (median line)
  • The spread of the middle 50% (box width)
  • The overall spread (whisker length)
  • Skewness (asymmetric box or whiskers)
  • Outliers (individual points)

What box plots hide:

  • The exact shape of the distribution (you cannot tell if it is bimodal)
  • The sample size
  • Clusters or gaps within the data

Box plots are especially valuable when comparing multiple groups side by side—something that would be cluttered with multiple histograms.

Comparing Distributions with Side-by-Side Plots

One of visualization’s greatest strengths is comparison. When you want to compare two or more groups, placing graphs side by side reveals differences and similarities at a glance.

Side-by-side box plots are the most common choice for comparing distributions. They use the same scale, so you can directly compare:

  • Centers (which group has a higher median?)
  • Spreads (which group is more variable?)
  • Outliers (does one group have more extreme values?)
  • Skewness (are the distributions symmetric or tilted?)

Back-to-back stem-and-leaf plots display two distributions sharing a common stem column, with leaves extending in opposite directions. This preserves all the original data while allowing direct comparison.

Multiple histograms can be stacked or overlaid (using different colors), though this can become cluttered with more than two or three groups.

Misleading Graphs: How Visuals Can Deceive

Data visualization is a powerful tool—and like any powerful tool, it can be misused. Whether through ignorance or intent, graphs can distort the truth. Learning to spot these tricks makes you a more critical consumer of information.

Truncated Axes

The most common deception is starting the vertical axis at a number other than zero. This can make small differences look enormous.

Imagine two candidates in a poll: Candidate A has 52% support, Candidate B has 48%. On an honest graph starting at zero, the bars look almost identical, because the difference is small. But if the axis starts at 45%, Candidate A’s bar (7 points tall) suddenly looks more than twice as tall as Candidate B’s (3 points tall). The visual impression suggests a landslide when the reality is a close race.

How to spot it: Always check where the axes start. Be suspicious when small numerical differences create large visual differences.
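The distortion can be quantified by comparing drawn bar heights. A minimal Python sketch using the poll numbers above; the function name `visual_ratio` is made up for this illustration.

```python
def visual_ratio(value_a, value_b, axis_start=0.0):
    """Ratio of drawn bar heights when the axis begins at axis_start."""
    return (value_a - axis_start) / (value_b - axis_start)

# The poll from the text: 52% versus 48%.
honest = visual_ratio(52, 48)                    # axis starts at 0
truncated = visual_ratio(52, 48, axis_start=45)  # axis starts at 45

print(f"axis from 0:  {honest:.2f}x")
print(f"axis from 45: {truncated:.2f}x")
```

The same 4-point gap produces a ratio of about 1.08 on the honest axis and about 2.33 on the truncated one.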

Manipulating Aspect Ratio

Stretching a graph vertically makes increases look more dramatic. Stretching it horizontally flattens the same increases. The same stock data can look like “steady growth” or “explosive growth” depending on how the chart is stretched.

How to spot it: Ask yourself whether the visual impression matches the actual numbers. Does a 10% increase look like a 10% increase, or does it look like the chart is skyrocketing?

3D Effects and Pictographs

Three-dimensional bar charts look fancy but distort perception. Bars in the “back” look smaller due to perspective, even if they represent the same value as bars in the “front.”

Pictographs (using pictures instead of bars) create similar problems. If you double a picture’s height and width to show “twice as much,” the area actually quadruples—making the increase look far larger than it is.

How to spot it: Prefer simple 2D graphs. If a graph looks complex or artistic, examine the numbers carefully.
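The arithmetic behind the pictograph problem can be written out. As an illustration, assume the original picture is $w$ wide and $h$ tall, so its area is $A = wh$ (these symbols are introduced here and do not come from a particular graph). Doubling both dimensions gives:

$$A_{\text{new}} = (2w)(2h) = 4wh = 4A$$

More generally, scaling both dimensions by a factor $k$ multiplies the area by $k^2$, which is why a picture meant to show "twice as much" ends up looking four times as large.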

Cherry-Picked Time Periods

A stock that is “up 50% this year” sounds impressive—until you learn it crashed 60% last year and still has not recovered. By choosing the right start and end dates, nearly any trend can be shown.

How to spot it: Ask what the full picture looks like. Does the selected time period make sense, or does it seem chosen to support a particular narrative?
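To check the arithmetic in the stock example, assume (purely for illustration) a starting price of 100 dollars:

$$100 \times (1 - 0.60) = 40, \qquad 40 \times (1 + 0.50) = 60$$

After both years the stock is still 40% below where it started, even though the most recent year shows a 50% gain.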

Unlabeled or Confusing Axes

Graphs without clear labels are inherently suspicious. What does the vertical axis measure? What are the units? Without this information, the graph is meaningless at best and deceptive at worst.

How to spot it: If you do not immediately understand what a graph is showing, do not trust conclusions drawn from it.

Notation and Terminology

| Term | Meaning | Example |
|---|---|---|
| Histogram | Bar chart for numerical data where bars represent frequency within bins | Heights grouped into 5 cm intervals |
| Bin (Class) | An interval in a histogram | 160–165 cm |
| Distribution | The overall pattern of data | Shape, center, spread |
| Symmetric | Mirror image on both sides | Normal (bell curve) |
| Skewed right | Long tail extending to higher values | Income distribution |
| Skewed left | Long tail extending to lower values | Age at retirement |
| Unimodal | Distribution with one peak | Most exam score distributions |
| Bimodal | Distribution with two peaks | Heights of adult men and women combined |
| Box plot | Visual display of the five-number summary | Shows quartiles and outliers |
| Whiskers | Lines extending from the box to extreme non-outlier values | |
| Stem-and-leaf plot | Display showing distribution shape while preserving exact values | |
| Dot plot | Simple display with dots stacked above a number line | |

Examples

Example 1: Creating a Dot Plot

A teacher recorded the number of books read by 15 students over summer break: 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 7, 8, 10.

Create a dot plot and describe what you observe.

Solution:

Draw a number line from 2 to 10. For each data value, place a dot above that number. Stack dots for repeated values.

           *
        *  *
     *  *  *  *
  *  *  *  *  *  *  *     *
  |--|--|--|--|--|--|--|--|
  2  3  4  5  6  7  8  9  10

Observations:

  • Center: The data clusters around 4–6 books, with 5 being the most common value (mode).
  • Spread: Values run from 2 to 10, a range of 8 books.
  • Shape: Roughly symmetric, perhaps slightly skewed right due to the value at 10.
  • Outliers: The value 10 is separated from the main cluster and could be a mild outlier.

The dot plot shows that most students read 3–6 books, with one enthusiastic reader at 10.

Example 2: Describing the Shape of a Histogram

A histogram displays the ages of 200 marathon finishers. The bars show:

  • Ages 18–25: height 15
  • Ages 25–32: height 35
  • Ages 32–39: height 55
  • Ages 39–46: height 45
  • Ages 46–53: height 30
  • Ages 53–60: height 15
  • Ages 60–67: height 5

Describe the distribution’s shape, center, and spread.

Solution:

Shape: The distribution is unimodal (one clear peak) and approximately symmetric, though with a slight skew to the right. The peak occurs in the 32–39 age range, with frequencies decreasing on both sides. The right tail (older runners) extends slightly further than the left tail, creating mild right skew.

Center: The peak is in the 32–39 bin, and the middle of the 200 finishers (the 100th and 101st) also falls in that bin, so the center (median and mean) is in the late 30s. Most marathon runners appear to be between their late 20s and late 40s.

Spread: Ages range from 18 to 67, a span of nearly 50 years. The middle 50% of runners fall between roughly 32 and 46, and the bulk of runners (165 of 200, about 82%) fall between 25 and 53.

Summary: Marathon finishers tend to be in their 30s and 40s, with participation dropping off for both younger and older age groups. The distribution is roughly bell-shaped.
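One way to check statements about the center and spread of binned data is to add up the bar heights cumulatively. Below is a minimal Python sketch using the bar heights listed above; the helper name `bin_for_fraction` is made up for this illustration.

```python
from itertools import accumulate

# Bin edges and counts from the marathon histogram in this example.
edges = [18, 25, 32, 39, 46, 53, 60, 67]
counts = [15, 35, 55, 45, 30, 15, 5]

total = sum(counts)
running = list(accumulate(counts))  # cumulative counts per bin

def bin_for_fraction(fraction):
    """Return the (low, high) bin where the cumulative count
    first reaches the given fraction of all values."""
    target = fraction * total
    for i, cum in enumerate(running):
        if cum >= target:
            return edges[i], edges[i + 1]

print("total:", total)
print("median bin:", bin_for_fraction(0.50))
print("Q1 bin:", bin_for_fraction(0.25))
print("Q3 bin:", bin_for_fraction(0.75))
```

The cumulative counts reach 50 at the top of the 25–32 bin and 150 at the top of the 39–46 bin, which places the quartiles near 32 and 46, with the median inside the 32–39 bin.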

Example 3: Constructing a Box Plot from the Five-Number Summary

The five-number summary for test scores in a class is:

  • Minimum: 52
  • $Q_1$: 68
  • Median: 78
  • $Q_3$: 85
  • Maximum: 98

Construct the box plot and describe what it reveals.

Solution:

Step 1: Draw the scale. Create a number line that spans from at least 52 to 98. Using 50 to 100 with marks every 10 points works well.

Step 2: Draw the box. Draw a rectangle from $Q_1 = 68$ to $Q_3 = 85$.

Step 3: Mark the median. Draw a vertical line inside the box at the median value of 78.

Step 4: Check for outliers using the 1.5×IQR rule. $$\text{IQR} = Q_3 - Q_1 = 85 - 68 = 17$$ $$\text{Lower fence} = 68 - 1.5(17) = 68 - 25.5 = 42.5$$ $$\text{Upper fence} = 85 + 1.5(17) = 85 + 25.5 = 110.5$$

The minimum (52) is above the lower fence (42.5), and the maximum (98) is below the upper fence (110.5), so there are no outliers.

Step 5: Draw the whiskers. Extend lines from the box to the minimum (52) and maximum (98).

    52        68    78  85      98
     |---------|=====|===|-------|
    |-----|-----|-----|-----|-----|
   50    60    70    80    90   100

Interpretation:

  • The median (78) is closer to $Q_3$ than to $Q_1$, and the left whisker is longer than the right whisker. This suggests slight left skew—a few students scored noticeably lower than the bulk of the class.
  • The IQR of 17 points shows moderate spread in the middle 50% of scores.
  • No outliers suggest consistent performance without extreme scores.
  • Overall, this appears to be a reasonably successful test for the class, with most scores in the C to A range.

Example 4: Comparing Two Distributions Using Side-by-Side Box Plots

Two sections of a statistics course took the same final exam. Their five-number summaries are:

Section A: Min = 58, $Q_1$ = 72, Median = 81, $Q_3$ = 88, Max = 97

Section B: Min = 45, $Q_1$ = 65, Median = 75, $Q_3$ = 82, Max = 100

Compare the two sections’ performance.

Solution:

First, calculate key measures:

| Measure | Section A | Section B |
|---|---|---|
| IQR | $88 - 72 = 16$ | $82 - 65 = 17$ |
| Range | $97 - 58 = 39$ | $100 - 45 = 55$ |
| Median | 81 | 75 |

Comparison of Centers: Section A’s median (81) is 6 points higher than Section B’s median (75). The typical student in Section A scored higher than the typical student in Section B.

Comparison of Spread:

  • The IQRs are similar (16 vs. 17), meaning the middle 50% of each section has similar variability.
  • However, Section B has a much larger range (55 vs. 39), indicating more extreme scores at both ends.
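The same comparison can be reproduced from the two five-number summaries. A short Python sketch; the function name `spread_measures` is hypothetical.

```python
# Five-number summaries from this example: (min, Q1, median, Q3, max).
summaries = {
    "Section A": (58, 72, 81, 88, 97),
    "Section B": (45, 65, 75, 82, 100),
}

def spread_measures(summary):
    """Return (IQR, range) for a five-number summary."""
    low, q1, _median, q3, high = summary
    return q3 - q1, high - low

for name, summary in summaries.items():
    iqr, value_range = spread_measures(summary)
    print(f"{name}: median={summary[2]}, IQR={iqr}, range={value_range}")
```

Keeping both measures side by side makes the contrast clear: the IQRs differ by 1 point, while the ranges differ by 16.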

Box Plot Sketch:

Section A:           58|-------[=====|===]----|97

Section B:   45|-----------[=====|===]----------|100
            |-----|-----|-----|-----|-----|-----|
           40    50    60    70    80    90   100

Shape Analysis:

  • Section A: The median (81) is closer to $Q_3$ (88) than $Q_1$ (72), suggesting slight left skew.
  • Section B: Similar pattern with the median closer to $Q_3$.

Conclusions:

  1. Section A outperformed Section B overall (higher median).
  2. Section B had both the lowest score (45) and the highest score (100).
  3. Section A was more consistent—its range is smaller.
  4. Both sections have similar “middle 50%” spreads.

The instructor might investigate why Section B had more variability—perhaps different preparation levels or engagement within that section.

Example 5: Identifying How a Graph Is Misleading

A news report shows a bar graph comparing average teacher salaries in two neighboring states:

  • State X: $52,000
  • State Y: $48,000

The graph’s vertical axis starts at $45,000 and ends at $55,000. The bar for State X appears roughly 2.3 times as tall as the bar for State Y. The headline reads: “State X Teachers Earn Dramatically More!”

Identify how this graph is misleading and explain the deception.

Solution:

The Deception: Truncated Vertical Axis

The vertical axis starts at $45,000 instead of $0. This creates a false visual impression.

The actual numbers:

  • State X: $52,000
  • State Y: $48,000
  • Difference: $4,000

The actual percentage difference: $$\frac{52{,}000 - 48{,}000}{48{,}000} \times 100\% = \frac{4{,}000}{48{,}000} \times 100\% \approx 8.3\%$$

State X teachers earn about 8% more—a modest difference, not “dramatic.”

Why the visual is misleading:

On the truncated axis:

  • State X’s bar spans from $45,000 to $52,000 = 7 units of height
  • State Y’s bar spans from $45,000 to $48,000 = 3 units of height
  • Visual ratio: State X looks $\frac{7}{3} \approx 2.3$ times taller

On an honest axis starting at $0:

  • State X’s bar would span 0 to $52,000 = 52 units
  • State Y’s bar would span 0 to $48,000 = 48 units
  • Visual ratio: State X would look $\frac{52}{48} \approx 1.08$ times taller

The truncated axis exaggerates an 8% difference to look like a 130% difference visually.

The corrected picture:

With the axis starting at zero, both bars would be nearly the same height, with State X’s bar just slightly taller. This accurately represents a small difference, not a “dramatic” one.

Red flags to watch for:

  1. Axes that do not start at zero
  2. Headlines claiming dramatic differences when actual numbers are close
  3. Large visual differences that do not match the numerical differences
  4. Missing or unclear axis labels

The takeaway: Always check the axis scale. If the visual impression seems more dramatic than the numbers justify, the graph may be designed to deceive.

Example 6: Creating and Interpreting a Stem-and-Leaf Plot

The following data shows the ages of 25 attendees at a community health seminar:

22, 24, 28, 31, 34, 35, 37, 38, 42, 43, 45, 47, 48, 51, 52, 53, 55, 56, 58, 61, 63, 65, 68, 72, 78

Create a stem-and-leaf plot and describe the distribution.

Solution:

Step 1: Identify stems and leaves. For two-digit numbers, the tens digit is the stem and the units digit is the leaf.

Step 2: Organize the data.

Stem | Leaves
2 | 2 4 8
3 | 1 4 5 7 8
4 | 2 3 5 7 8
5 | 1 2 3 5 6 8
6 | 1 3 5 8
7 | 2 8

Key: 3|4 means 34 years old

Step 3: Interpret the distribution.

Shape: The distribution is roughly unimodal and symmetric, perhaps with a very slight right skew due to the two values in the 70s. The peak is in the 50s (the row with the most leaves).

Center: With 25 values, the median is the 13th value. Counting through the leaves: 2s row has 3 values (positions 1–3), 3s row has 5 values (positions 4–8), 4s row has 5 values (positions 9–13). The 13th value is the last in the 4s row: 48 years old.

Spread: Ages range from 22 to 78, a span of 56 years.

Distribution details visible from the stem-and-leaf:

  • No one in their teens attended
  • Three people in their 20s
  • Five each in their 30s and 40s
  • Six in their 50s (the most common decade)
  • Four in their 60s
  • Two in their 70s

Advantage of the stem-and-leaf: We can see the exact ages, not just counts. For instance, we know the 20-somethings were specifically 22, 24, and 28—not evenly spread across the decade.

Conclusion: The seminar attracted a middle-aged to older adult audience, with the typical attendee in their late 40s to 50s. Younger adults (under 30) were underrepresented.

Key Properties and Rules

Choosing the Right Visualization

| If you want to… | Use… |
|---|---|
| Show the distribution of one numerical variable | Histogram, box plot, or dot plot |
| Preserve exact data values while showing shape | Stem-and-leaf plot |
| Compare distributions across groups | Side-by-side box plots |
| Highlight the five-number summary | Box plot |
| Show individual data points (small dataset) | Dot plot |
| Identify the shape (symmetric, skewed, bimodal) | Histogram |

Describing Distributions: A Checklist

When describing any distribution, address:

  1. Shape: Symmetric or skewed? Which direction? How many peaks?
  2. Center: Where is the “typical” value? (Estimate the median or mean)
  3. Spread: How much do values vary? (Range, IQR, or standard deviation)
  4. Unusual features: Outliers? Gaps? Clusters?

Reading Box Plots

  • Box width (IQR): Represents spread of middle 50%
  • Median line position: If centered, distribution is roughly symmetric; if off-center, distribution is skewed
  • Whisker lengths: Long whisker indicates tail in that direction
  • Individual points: Outliers (values beyond 1.5 × IQR from quartiles)

Red Flags for Misleading Graphs

Be skeptical when you see:

  1. Axis not starting at zero (especially for bar charts)
  2. Unusual aspect ratios that flatten or stretch trends
  3. 3D effects that distort bar heights
  4. Missing labels on axes
  5. Cherry-picked time periods in trend data
  6. Pictographs where symbol sizes do not match values proportionally
  7. Dual y-axes that can be scaled to show any correlation

Box Plot Construction Steps

  1. Find the five-number summary (Min, $Q_1$, Median, $Q_3$, Max)
  2. Calculate IQR = $Q_3 - Q_1$
  3. Find fences: Lower = $Q_1 - 1.5 \times \text{IQR}$, Upper = $Q_3 + 1.5 \times \text{IQR}$
  4. Identify outliers (values beyond fences)
  5. Draw box from $Q_1$ to $Q_3$ with line at median
  6. Draw whiskers to the most extreme non-outlier values
  7. Plot outliers as individual points

Real-World Applications

News Media and Infographics

Every day, news outlets present data through visualizations. Election result maps, COVID-19 case charts, economic indicators, climate change graphs—all rely on the principles in this chapter. Being able to read these critically is essential for informed citizenship.

Unfortunately, misleading graphs are common in partisan media. The same data can be visualized to support opposite narratives by manipulating scales, cherry-picking time frames, or choosing the “right” chart type. Understanding visualization techniques helps you see through spin and find the actual story in the data.

Business Dashboards

Modern businesses run on data dashboards—collections of graphs showing key performance indicators. A sales manager might see box plots comparing revenue by region, histograms showing customer order sizes, and trend lines showing monthly growth.

Choosing the right visualization matters for decision-making. A histogram might reveal that most orders are small with a few very large ones, suggesting different strategies for different customer segments. A box plot comparison might show that one region has higher median sales but also more outliers, requiring different management approaches.

Scientific Data Presentation

Scientific papers rely heavily on data visualization. Histograms show the distribution of experimental results. Box plots compare control and treatment groups. Careful visualization choices ensure that readers can verify the researchers’ conclusions.

Scientists follow strict conventions to avoid misleading readers. Error bars show uncertainty. Axes are clearly labeled. Sample sizes are reported. This transparency is part of what makes scientific communication trustworthy—and why you should be suspicious of graphs that lack these features.

Election and Polling Data

During elections, you will encounter countless visualizations of polling data. Understanding how graphs display spread and uncertainty helps you interpret the “margin of error” often reported with polls. Recognizing misleading graphs helps you identify biased reporting.

A responsible poll visualization shows not just the point estimate (e.g., “Candidate A: 48%”) but also the uncertainty around it. Overlapping ranges between candidates mean the race is truly competitive, even if one candidate’s central estimate is higher. Misleading visualizations might truncate axes to exaggerate small differences or present margins of error in visually deceptive ways.

Public Health Communication

Health officials use visualizations to communicate with the public. During disease outbreaks, curves showing case counts over time help people understand trends. Comparing these curves across regions or time periods reveals which interventions worked.

The phrase “flatten the curve” during the COVID-19 pandemic was fundamentally about visualization—comparing two possible histogram-like distributions of cases over time, one tall and narrow (overwhelming hospitals) and one short and wide (manageable). Understanding these visualizations helps you make informed decisions about your own health and community.

Self-Test Problems

Problem 1: The following data shows quiz scores (out of 10) for a class: 6, 7, 7, 8, 8, 8, 8, 9, 9, 10. Create a dot plot and describe the distribution’s shape.

Answer:

Dot Plot:

           *
           *
        *  *  *
     *  *  *  *  *
     |--|--|--|--|
     6  7  8  9  10

Description: The distribution is unimodal with a peak at 8 (four students). It is symmetric: the counts mirror each other on either side of the peak (two students each at 7 and 9, one student each at 6 and 10). The center is at 8, and the spread covers scores from 6 to 10 (a range of 4 points). There are no gaps or obvious outliers.

Problem 2: A histogram of exam scores shows the following pattern: a tall bar at 90–100, a moderate bar at 80–90, small bars at 70–80 and 60–70, and tiny bars at lower scores. Describe the shape of this distribution.

Answer:

The distribution is skewed left (negatively skewed). Most students scored in the highest range (90–100), with progressively fewer students at each lower score range. The long tail extends toward the lower scores.

This is unimodal with the single peak in the 90–100 bin.

This pattern suggests the exam was relatively easy for most students, or the class was well-prepared. Alternatively, it might indicate grade inflation or a test that did not effectively differentiate among students.

Problem 3: Given the five-number summary: Min = 15, $Q_1$ = 28, Median = 35, $Q_3$ = 44, Max = 92. Construct a box plot and identify any outliers.

Answer:

Step 1: Calculate IQR and fences. $$\text{IQR} = Q_3 - Q_1 = 44 - 28 = 16$$ $$\text{Lower fence} = 28 - 1.5(16) = 28 - 24 = 4$$ $$\text{Upper fence} = 44 + 1.5(16) = 44 + 24 = 68$$

Step 2: Identify outliers.

  • Minimum (15) is above the lower fence (4): not an outlier
  • Maximum (92) is above the upper fence (68): outlier

Step 3: Find whisker endpoints.

  • Lower whisker extends to the minimum (15)
  • Upper whisker extends to the largest value that is not an outlier. Since 92 is an outlier, the whisker ends at the largest data value at or below the upper fence (68). The five-number summary alone does not reveal that value; it lies somewhere between $Q_3$ (44) and 68. Without the other data points, a sketch can draw the whisker to the fence (68) as a stand-in.

Box plot description:

  • Box from 28 to 44
  • Median line at 35
  • Lower whisker from 28 down to 15
  • Upper whisker from 44 up to 68 (or the largest non-outlier)
  • Individual point plotted at 92 (outlier)

Interpretation: The distribution is skewed right, indicated by the outlier on the high end. The median (35) is closer to $Q_1$ (28) than to $Q_3$ (44), which also suggests right skew. If the upper whisker reaches close to the fence, it is longer than the lower whisker as well (up to 24 units versus 13).

Problem 4: Two datasets have the following box plot characteristics:

Dataset A: Min = 20, $Q_1$ = 35, Median = 50, $Q_3$ = 65, Max = 80

Dataset B: Min = 30, $Q_1$ = 45, Median = 50, $Q_3$ = 55, Max = 70

Which dataset has greater spread? Which has higher typical values? Explain.

Answer:

Spread Comparison:

  • Dataset A: IQR = $65 - 35 = 30$, Range = $80 - 20 = 60$
  • Dataset B: IQR = $55 - 45 = 10$, Range = $70 - 30 = 40$

Dataset A has greater spread by both measures. Its IQR is three times larger, and its range is 50% larger.

Typical Values: Both datasets have the same median (50), so their “typical” values are equal in that sense.

However, looking at the middle 50%:

  • Dataset A’s middle 50% spans 35–65
  • Dataset B’s middle 50% spans 45–55

Dataset B’s values are more concentrated near the center. A “typical” value from Dataset B is more likely to be close to 50, while Dataset A’s values are more spread out.

Visual difference: Both box plots would have whiskers of the same length (15 units on each side), but Dataset A’s box would be three times as wide as Dataset B’s. Dataset B’s narrow box indicates tight clustering around the median. Despite the same median, these distributions tell very different stories.

Problem 5: A political advertisement shows a bar graph comparing the unemployment rate under two presidents. President A: 5.2%, President B: 4.8%. The graph’s vertical axis runs from 4.5% to 5.5%. The bar for President A appears more than twice as tall as President B’s bar. Explain why this graph is misleading and how it should be corrected.

Answer:

Why it is misleading:

The axis starts at 4.5% instead of 0%. This creates a visual exaggeration:

On the truncated axis (4.5% to 5.5%):

  • President A’s bar: rises 0.7 units (from 4.5 to 5.2)
  • President B’s bar: rises 0.3 units (from 4.5 to 4.8)
  • Visual ratio: $\frac{0.7}{0.3} \approx 2.3$ times taller

On an honest axis (0% to ~6%):

  • President A’s bar: 5.2 units tall
  • President B’s bar: 4.8 units tall
  • Visual ratio: $\frac{5.2}{4.8} \approx 1.08$ times as tall

The actual difference: $$\frac{5.2 - 4.8}{4.8} \times 100\% = \frac{0.4}{4.8} \times 100\% \approx 8.3\%$$

A relative difference of about 8%, while politically meaningful, is not dramatic. In absolute terms the two rates differ by only 0.4 percentage points.

How to correct it:

  1. Start the vertical axis at 0%
  2. Use a scale that shows the full context (perhaps 0% to 10%)
  3. Clearly label the axis
  4. Avoid 3D effects

With proper scaling, viewers would correctly perceive that unemployment was nearly identical under both presidents, differing by less than half a percentage point.
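
The effect of the axis baseline on apparent bar height can be reproduced with a small Python sketch. The function name `bar_height_ratio` and its `axis_start` parameter are hypothetical, chosen only to illustrate the calculation in this answer.

```python
def bar_height_ratio(a, b, axis_start=0.0):
    """Ratio of drawn bar heights when the axis begins at axis_start."""
    return (a - axis_start) / (b - axis_start)

truncated = bar_height_ratio(5.2, 4.8, axis_start=4.5)
honest = bar_height_ratio(5.2, 4.8)
print(round(truncated, 2))  # 2.33
print(round(honest, 2))     # 1.08
```

Moving `axis_start` closer to the smaller value inflates the ratio further, which is why a truncated axis deserves a second look whenever bars are being compared.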

Problem 6: Create a stem-and-leaf plot for the following test scores: 67, 72, 75, 78, 78, 81, 83, 84, 85, 85, 86, 88, 91, 93, 95. Then find the median and describe the distribution.

Show Answer

Stem-and-Leaf Plot:

Stem | Leaves
   6 | 7
   7 | 2 5 8 8
   8 | 1 3 4 5 5 6 8
   9 | 1 3 5

Key: 7|2 means 72

Finding the median: There are 15 values, so the median is the 8th value. Counting: 6|7 is 1st; 7|2,5,8,8 are 2nd–5th; 8|1,3,4 are 6th–8th. The 8th value is 84.

Distribution description:

  • Shape: Unimodal with the peak in the 80s (7 values). Roughly symmetric, perhaps very slightly skewed left due to the single value in the 60s.
  • Center: Median is 84.
  • Spread: Range is $95 - 67 = 28$ points.
  • Notable features: One lower score (67) is somewhat separated from the main group, which starts at 72. The 80s are the most common decade.

This appears to be a successful class overall, with most students scoring B’s and above, and one student who may need additional support.
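
For readers who want to verify the plot and the median, the following Python sketch builds the stems and leaves from the raw scores. It assumes non-negative whole-number data with the tens as the stem and the units digit as the leaf; the function name `stem_and_leaf` is illustrative, and stems with no data are simply omitted rather than shown as empty rows.

```python
from collections import defaultdict
from statistics import median

def stem_and_leaf(values):
    """Map each stem (tens) to its sorted list of leaves (units digit)."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)

scores = [67, 72, 75, 78, 78, 81, 83, 84, 85, 85, 86, 88, 91, 93, 95]
for stem, leaves in stem_and_leaf(scores).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
print("median:", median(scores))  # median: 84
```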

Summary

  • Data visualization transforms numbers into pictures that reveal patterns our brains cannot detect in raw data. The right visualization makes trends, clusters, outliers, and distributions immediately visible.

  • Dot plots are the simplest visualization: dots stacked above a number line. Best for small datasets where you want to see individual values.

  • Histograms group numerical data into bins and show frequency with bar heights. They reveal the distribution’s shape—symmetric vs. skewed, unimodal vs. multimodal.

  • When describing distributions, address four aspects: shape (symmetric, skewed left, skewed right), center (where typical values fall), spread (how scattered the data is), and unusual features (outliers, gaps, clusters).

  • Skewed distributions have tails: skewed right means the tail extends toward higher values (like income); skewed left means the tail extends toward lower values (like age at retirement).

  • Stem-and-leaf plots preserve exact data values while showing the distribution’s shape. The stem is all digits except the last; the leaf is the final digit.

  • Box plots display the five-number summary visually. The box spans $Q_1$ to $Q_3$ (the IQR), a line marks the median, whiskers extend to extreme non-outlier values, and outliers appear as individual points.

  • Side-by-side box plots are excellent for comparing distributions across groups, revealing differences in center, spread, and skewness at a glance.

  • Misleading graphs can deceive through truncated axes, manipulated aspect ratios, 3D effects, cherry-picked time periods, and unlabeled or unclear scales.

  • To read graphs critically: always check where axes start, whether labels are clear, and whether the visual impression matches the actual numbers. Be especially skeptical when small numerical differences create large visual differences.