Sampling and Study Design

How to collect data that tells the truth

Imagine you want to know what your whole school thinks about a proposed change to the lunch menu. You could ask every single student, but that would take forever. So instead, you decide to ask a smaller group and use their opinions to represent everyone. This is the essence of sampling, and it is one of the most powerful ideas in statistics.

But here is the catch: if you only ask your friends, or only ask people in the cafeteria who already like the current menu, your results will not represent the school at all. The way you collect data determines whether your conclusions are trustworthy or misleading. A study with a flawed design can produce confident-sounding results that are completely wrong.

You have probably seen headlines like “Study shows coffee is good for you!” followed a week later by “Study shows coffee is bad for you!” The contradiction often comes down to how the studies were designed. Understanding sampling and study design helps you think critically about the data-based claims you encounter every day and, when you need to collect your own data, do it in a way that actually answers your questions.

Core Concepts

Why Sampling Method Matters

Here is a famous cautionary tale from statistics history. In 1936, the Literary Digest magazine conducted a massive poll to predict the US presidential election. They sent out 10 million questionnaires and received 2.4 million responses, an enormous sample. They confidently predicted that Alf Landon would defeat Franklin Roosevelt in a landslide.

Roosevelt won in one of the biggest landslides in American history, carrying 46 of 48 states.

What went wrong? The magazine selected names from telephone directories and automobile registrations. In 1936, during the Great Depression, people with phones and cars were wealthier than average and more likely to vote Republican. The sample was huge but deeply biased. Meanwhile, a young pollster named George Gallup used a much smaller but carefully designed sample and correctly predicted Roosevelt’s victory.

The lesson: a large sample does not guarantee good results. How you select your sample matters more than how big it is.

Simple Random Sample (SRS)

A simple random sample is the gold standard of sampling methods. In a simple random sample of size $n$, every possible group of $n$ individuals from the population has an equal chance of being selected.

Think of it like putting every person’s name in a hat and drawing names without looking. Every name has the same chance of being drawn, and every possible combination of names has the same chance of being the final sample.

How to do it in practice:

  1. Make a list of everyone in the population (this list is called the sampling frame)
  2. Assign each person a number
  3. Use a random number generator to select which numbers to include

Simple random sampling eliminates systematic bias. It does not guarantee that your sample perfectly matches the population (random chance could give you an unusual sample), but it ensures that the selection process itself does not favor certain groups over others.
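The three steps above can be sketched in a few lines of Python using the standard library's `random` module; the population names and sample size here are invented for illustration:

```python
import random

# Step 1: the sampling frame -- a complete list of the population
# (hypothetical student IDs)
population = [f"student_{i}" for i in range(1, 501)]  # 500 students

# Steps 2-3: random.sample draws n distinct members so that every
# possible group of n has the same chance of being chosen (an SRS)
random.seed(42)  # fixed seed only so the example is reproducible
srs = random.sample(population, 25)

print(len(srs))       # 25
print(len(set(srs)))  # 25 -- sampling without replacement, so no repeats
```

Note that `random.sample` selects without replacement, which matches the names-in-a-hat picture: once a name is drawn, it cannot be drawn again.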

Advantages:

  • No systematic bias in selection
  • Every member has an equal chance of selection
  • Statistical theory is well-developed for SRS

Disadvantages:

  • Requires a complete list of the population (which may not exist)
  • May miss small subgroups by chance
  • Can be impractical for large, spread-out populations

Stratified Sampling

Sometimes you want to ensure that important subgroups are properly represented in your sample. Stratified sampling divides the population into non-overlapping groups called strata (singular: stratum), then takes a simple random sample from each stratum.

For example, suppose you want to survey students at a university about housing satisfaction, and you know that freshmen, sophomores, juniors, and seniors might have very different experiences. With a simple random sample, you might accidentally get very few seniors. With stratified sampling, you would:

  1. Divide all students into four strata by year
  2. Randomly select students from each stratum
  3. Combine the samples

You can sample proportionally (if 25% of students are freshmen, 25% of your sample should be freshmen) or equally (same number from each stratum, useful when you want to compare groups).
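Proportional stratified sampling can be sketched in Python as an SRS within each stratum. The strata and their sizes below are invented for illustration:

```python
import random

random.seed(0)  # fixed seed only so the example is reproducible

# Hypothetical strata: class year -> list of student IDs (sizes invented)
strata = {
    "freshman":  [f"F{i}" for i in range(250)],
    "sophomore": [f"S{i}" for i in range(250)],
    "junior":    [f"J{i}" for i in range(250)],
    "senior":    [f"R{i}" for i in range(250)],
}

total = sum(len(members) for members in strata.values())
sample_size = 100

# Proportional allocation: each stratum contributes in proportion to its
# share of the population, then a simple random sample is drawn within it
stratified_sample = {
    year: random.sample(members, round(sample_size * len(members) / total))
    for year, members in strata.items()
}

for year, picked in stratified_sample.items():
    print(year, len(picked))  # 25 from each stratum, since all are equal here
```

Unlike a plain SRS of 100 students, this design cannot accidentally produce a sample with very few seniors.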

When to use it:

  • When the population has distinct subgroups that might differ on what you are measuring
  • When you want to ensure every subgroup is represented
  • When you want to make comparisons between subgroups

Advantages:

  • Guarantees representation of all strata
  • Often more precise than SRS (lower variability)
  • Allows comparisons between groups

Disadvantages:

  • Requires knowing which stratum each population member belongs to
  • More complex to implement than SRS

Cluster Sampling

In cluster sampling, you divide the population into groups called clusters, randomly select some clusters, and then include everyone (or a random sample of people) from the selected clusters.

This is very different from stratified sampling. In stratified sampling, you sample from every stratum. In cluster sampling, you only sample from some clusters.

Suppose you want to survey households in a large city. Going door-to-door across the entire city would be expensive and time-consuming. Instead, you could:

  1. Divide the city into neighborhoods (clusters)
  2. Randomly select some neighborhoods
  3. Survey all households (or a random sample of households) in those selected neighborhoods
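Those steps can be sketched in Python with invented neighborhood and household IDs:

```python
import random

random.seed(1)  # fixed seed only so the example is reproducible

# Hypothetical clusters: neighborhood -> list of household IDs
neighborhoods = {
    f"nbhd_{n}": [f"nbhd_{n}_house_{h}" for h in range(40)]
    for n in range(25)
}

# Step 2: randomly select a handful of whole clusters
chosen = random.sample(sorted(neighborhoods), 5)

# Step 3: include every household in the chosen clusters
cluster_sample = [house for n in chosen for house in neighborhoods[n]]

print(len(chosen), len(cluster_sample))  # 5 clusters, 200 households
```

Note the contrast with the stratified design: here only 5 of the 25 groups are sampled at all, but every household within a chosen group is included.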

When to use it:

  • When a complete list of individuals does not exist, but a list of clusters does
  • When the population is geographically spread out
  • When visiting every location would be too expensive

Advantages:

  • More practical and economical for spread-out populations
  • Does not require a list of every individual
  • Reduces travel and administrative costs

Disadvantages:

  • Generally less precise than SRS or stratified sampling
  • Requires clusters to be similar to each other (if clusters differ a lot, you get more variability)

Systematic Sampling

In systematic sampling, you select every $k$th individual from a list after choosing a random starting point.

For example, to select 100 people from a list of 2,000:

  1. Calculate the interval: $k = 2000 \div 100 = 20$
  2. Randomly choose a starting point between 1 and 20 (say you get 7)
  3. Select persons 7, 27, 47, 67, 87, … and so on
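In Python, the same procedure might look like this (the list of 2,000 persons is simulated):

```python
import random

random.seed(3)  # fixed seed only so the example is reproducible

population = list(range(1, 2001))  # persons numbered 1..2000
n = 100
k = len(population) // n           # interval: 2000 / 100 = 20

start = random.randint(1, k)       # random starting point between 1 and k
sample = population[start - 1::k]  # every kth person after the start

print(k, len(sample))  # 20 100
```

Every member of `sample` is exactly `k` positions after the previous one, which is what makes a hidden periodic pattern in the list so dangerous.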

When to use it:

  • When you have an ordered list and want a quick way to sample
  • When randomization is difficult but a systematic approach is feasible
  • In quality control (testing every 50th item off an assembly line)

Advantages:

  • Easy to implement
  • Spreads the sample evenly across the list

Disadvantages:

  • Can be biased if there is a hidden pattern in the list that matches the interval
  • Not truly random (once you pick the starting point, everything is determined)

Convenience Sampling and Why It Is Problematic

A convenience sample consists of whoever is easiest to reach. Standing outside a building and surveying people who walk by is convenience sampling. Asking your social media followers for their opinions is convenience sampling. Surveying students in your class is convenience sampling.

Convenience samples are extremely common because they are cheap and easy. They are also almost always biased.

The problem is that the people who are convenient to reach are systematically different from those who are not. Your social media followers are not representative of the general public. People walking by a particular building at a particular time have something in common (they are there at that time, for that reason). Your classmates share your educational context.

When convenience sampling might be acceptable:

  • Pilot testing (trying out a survey before the real study)
  • When you only want to describe this specific group, not a larger population
  • When the research question is about a basic human trait unlikely to vary much

When convenience sampling is definitely not acceptable:

  • When you want to make claims about a population
  • When the people convenient to you likely differ from the population on relevant characteristics
  • When decisions or policies will be based on the results

Types of Bias

Bias is a systematic error that makes your sample or measurements differ from the truth in a predictable direction. Unlike random error (which averages out), bias pushes your results consistently in one direction. Understanding types of bias helps you recognize flawed studies and design better ones.

Selection Bias

Selection bias occurs when the way you select participants makes some groups more or less likely to be included.

Examples:

  • Surveying gym members about exercise habits (misses people who do not exercise)
  • Using phone surveys when some populations do not have phones
  • Recruiting volunteers for a study (people who volunteer may differ from those who do not)

Response Bias

Response bias occurs when something about the measurement process causes inaccurate responses.

Common causes:

  • Leading questions: “Do you agree that our wonderful new policy should continue?” pushes toward “yes”
  • Social desirability: People underreport embarrassing behaviors (alcohol use, prejudice) and overreport admirable ones (exercise, charitable giving)
  • Acquiescence: Some people tend to agree with whatever is asked
  • Recall problems: People may not accurately remember past events

Nonresponse Bias

Nonresponse bias occurs when people who do not respond to a survey differ systematically from those who do.

If you send a survey to 1,000 people and 200 respond, those 200 are your actual sample. If the 800 non-responders would have answered differently, your results are biased.

Busy people, people without strong opinions, and people who are hard to reach often do not respond. Studies work hard to minimize nonresponse and to understand how non-responders might differ from responders.
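A tiny simulation makes the danger concrete. The population split and the response rates below are invented purely for illustration: supporters of a policy are assumed to be far more motivated to answer than opponents.

```python
import random

random.seed(11)  # fixed seed only so the example is reproducible

# Hypothetical population: 30% support a policy, 70% oppose it
population = ["support"] * 300 + ["oppose"] * 700

# Invented response rates: 60% of supporters reply, only 10% of opponents
responses = [
    p for p in population
    if random.random() < (0.6 if p == "support" else 0.1)
]

support_rate = responses.count("support") / len(responses)
print(f"True support: 30%, observed among responders: {support_rate:.0%}")
```

Even though only 30% of the population supports the policy, the responders make it look like a clear majority, because who answers is not random.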

Observational Studies vs. Experiments

The distinction between observational studies and experiments is one of the most important ideas in research design.

In an observational study, researchers observe and measure variables without trying to influence them. They record what naturally happens or what people naturally choose to do.

Examples:

  • Tracking people’s diets and health outcomes over many years
  • Comparing test scores of students who chose to take music lessons versus those who did not
  • Analyzing whether neighborhoods with more trees have lower crime rates

In an experiment, researchers actively impose treatments on participants to see what effect the treatments have.

Examples:

  • Randomly assigning patients to receive a new drug or a placebo
  • Randomly assigning students to different teaching methods
  • Testing different website layouts by randomly showing them to different users

The key question: Can you establish cause and effect?

Observational studies can show that two things are associated, but they cannot prove that one causes the other. Experiments, when properly designed with random assignment, can establish causation.

Why? Because in an observational study, people who differ in one way usually differ in other ways too. People who choose to take music lessons might come from families that value education, have more resources, or have other characteristics that also affect academic performance. You cannot tell if music lessons caused better grades or if some other factor explains both.

In an experiment with random assignment, the groups are similar in all characteristics (both measured and unmeasured) except for the treatment they receive. Any difference in outcomes can be attributed to the treatment.

Confounding Variables

A confounding variable (or confounder) is a variable that is associated with both the explanatory variable and the response variable, making it impossible to determine which is actually responsible for the observed effect.

Suppose you observe that people who drink moderate amounts of wine have better heart health than those who do not drink. Can you conclude that wine is good for your heart?

Not so fast. People who drink moderate amounts of wine might also:

  • Have higher incomes
  • Eat healthier diets
  • Exercise more regularly
  • Have better access to healthcare

Any of these could be the real reason for better heart health. Wine drinking might just be a marker for a healthier lifestyle overall. Income, diet, exercise, and healthcare access are all potential confounding variables.

Confounding is why observational studies must be interpreted cautiously. Researchers try to measure and statistically control for confounders, but they can never be sure they have identified all of them. There could always be an unmeasured confounder lurking.
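A short simulation illustrates the trap. All the probabilities below are invented: in this toy model, income drives both wine drinking and heart health, and wine itself has no effect at all. Yet wine drinkers still look healthier.

```python
import random

random.seed(5)  # fixed seed only so the example is reproducible

# Invented model: high income raises both the chance of drinking wine
# and the chance of good heart health; wine has no causal effect here
people = []
for _ in range(10_000):
    high_income = random.random() < 0.5
    drinks_wine = random.random() < (0.7 if high_income else 0.2)
    healthy_heart = random.random() < (0.8 if high_income else 0.4)
    people.append((drinks_wine, healthy_heart))

def health_rate(drinks):
    group = [healthy for d, healthy in people if d == drinks]
    return sum(group) / len(group)

# Drinkers look healthier even though wine does nothing in this model
print(f"healthy among drinkers:     {health_rate(True):.2f}")
print(f"healthy among non-drinkers: {health_rate(False):.2f}")
```

An observational study of this population would find a strong association between wine and heart health, entirely manufactured by the confounder.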

Randomized Controlled Experiments

A randomized controlled experiment is the most powerful design for establishing cause and effect. It has two essential features:

  1. Random assignment: Participants are assigned to treatment groups by a random process (like flipping a coin or using a random number generator)
  2. Control group: At least one group receives no treatment, a placebo, or the current standard treatment

Random assignment is what makes experiments so powerful. When you randomly assign participants to groups, the groups should be similar in all characteristics, both those you can measure and those you cannot. Any differences between groups that emerge after treatment can be attributed to the treatment itself.
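One common way to implement random assignment is to shuffle the participant list and split it. The participant IDs here are hypothetical:

```python
import random

random.seed(7)  # fixed seed only so the example is reproducible

# Hypothetical participant IDs
participants = [f"p{i:03d}" for i in range(40)]

# Random assignment: shuffle a copy, then split it in half
shuffled = participants[:]
random.shuffle(shuffled)
treatment = shuffled[:20]
control = shuffled[20:]

print(len(treatment), len(control))  # 20 20
```

Because the shuffle is random, any characteristic of the participants, measured or not, is equally likely to land in either group.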

Components of a well-designed experiment:

Treatment group(s): The group(s) that receive the intervention being tested.

Control group: The comparison group. This might receive:

  • A placebo (fake treatment)
  • The current standard treatment
  • No treatment at all

Random assignment: Using chance to determine who goes in which group.

Replication: Having enough participants that results are not due to chance.

The Placebo Effect and Blinding

The placebo effect is a real phenomenon where people improve simply because they believe they are receiving treatment, even when the treatment is fake. If you give someone a sugar pill and tell them it is a powerful painkiller, many will report feeling less pain. This is not “all in their head” in a dismissive sense. The belief in treatment triggers real physiological changes.

The placebo effect creates a problem for research: if patients in the treatment group improve, how do you know if it was the real treatment or just the expectation of improvement? This is why control groups often receive a placebo: a fake treatment that looks identical to the real one (a sugar pill that looks like the medication, a saline injection, etc.).

Blinding (or masking) takes this further:

Single-blind: Participants do not know which treatment they are receiving, but researchers do.

Double-blind: Neither participants nor the researchers interacting with them know who is receiving which treatment.

Why double-blind? Because researchers, even unconsciously, might treat participants differently if they know who is getting the real treatment. They might be more encouraging, interpret symptoms more favorably, or ask leading questions. Double-blinding eliminates this source of bias.

Triple-blind: Participants, researchers, and the statisticians analyzing the data all do not know which group is which until analysis is complete.

Notation and Terminology

  • Simple random sample (SRS): Every subset of size $n$ is equally likely to be chosen. Example: drawing names from a hat.
  • Stratified sample: Divide the population into groups and randomly sample from each. Example: surveying students from each class year.
  • Cluster sample: Randomly select entire groups, then sample within those. Example: surveying all students in randomly selected classrooms.
  • Systematic sample: Select every $k$th individual after a random start. Example: testing every 20th item on an assembly line.
  • Convenience sample: Sample whoever is easiest to reach. Example: surveying people at a mall.
  • Sampling frame: The list of all individuals from which the sample is drawn. Example: a university’s list of enrolled students.
  • Bias: Systematic error favoring certain outcomes. Example: a survey about exercise given only at gyms.
  • Selection bias: The sampling method makes some groups more likely to be included. Example: phone surveys missing people without phones.
  • Response bias: Something about the measurement causes inaccurate responses. Example: leading questions.
  • Nonresponse bias: Non-responders differ systematically from responders. Example: busy people not completing long surveys.
  • Confounding variable: A third variable affecting both the explanatory and response variables. Example: income affecting both wine consumption and health.
  • Observational study: Researchers observe without imposing treatments. Example: tracking diet and disease over time.
  • Experiment: Researchers impose treatments on participants. Example: testing a new drug vs. a placebo.
  • Placebo: A fake treatment that appears identical to the real treatment. Example: a sugar pill that looks like the medication.
  • Single-blind: Participants do not know which treatment they receive. Example: patients unaware if they got the drug or the placebo.
  • Double-blind: Neither participants nor those administering treatments know the assignments. Example: neither patient nor doctor knows the treatment.

Examples

Example 1: Identifying the Sampling Method

Identify the sampling method used in each scenario.

a) A quality control manager tests every 50th smartphone coming off the assembly line.

b) A researcher obtains a list of all 10,000 employees at a company, assigns each a number, and uses a random number generator to select 200 employees to survey.

c) A polling organization divides voters into regions (Northeast, Southeast, Midwest, Southwest, West), then randomly selects voters from each region.

d) A student working on a research project surveys other students in their dormitory.

e) A school district wants to assess reading levels. They randomly select 10 schools, then test all third-graders in those schools.

Solution:

a) Systematic sampling. The manager selects at regular intervals (every 50th item) from an ordered sequence. This is the hallmark of systematic sampling.

b) Simple random sample. Every employee has an equal chance of being selected, and every possible group of 200 has an equal chance of being the sample. This is classic SRS.

c) Stratified sampling. The population is divided into non-overlapping groups (strata) based on geography, and random samples are taken from each stratum.

d) Convenience sampling. The student is surveying whoever is easiest to access. Dormitory residents are readily available but may not represent all students.

e) Cluster sampling. Entire groups (schools) are randomly selected, and then everyone in certain categories within those groups (third-graders) is included.

Example 2: Identifying Potential Bias

A local news station wants to know what residents think about a proposed park renovation. They post a poll on their website asking, “Do you support the city spending $2 million on park improvements?” After one week, 847 people have responded, with 67% saying “No.”

Identify at least two potential sources of bias in this approach.

Solution:

1. Voluntary response bias. People who feel strongly about an issue are more likely to take the time to respond to an online poll. Those who are angry about the proposed spending are probably more motivated to respond than those who are neutral or mildly supportive. This likely inflates the “No” percentage.

2. Selection bias / undercoverage. Only people who visit the news station’s website and see the poll can respond. This excludes residents without internet access, those who do not visit this website, and those who do not happen to see the poll during the week it is active. The website’s audience may not represent all city residents.

3. Question wording bias (response bias). The question emphasizes the cost (“spending $2 million”) rather than the benefits. A question like “Do you support improving the city park?” might get different responses than one that highlights the price tag. The wording may nudge people toward “No.”

4. Nonresponse bias. Of all the people who saw the poll, only some chose to respond. If those who did not respond have different opinions, the results are biased.

Bottom line: This poll tells us almost nothing about what all city residents think. It only tells us what visitors to this website who chose to respond think, and even that might be influenced by biased wording.

Example 3: Designing a Stratified Sampling Plan

A university with 20,000 students wants to survey student satisfaction with campus technology services. The administration believes satisfaction might vary by student classification and wants to ensure all groups are represented.

Student body composition:

  • Freshmen: 5,000 students (25%)
  • Sophomores: 4,000 students (20%)
  • Juniors: 4,500 students (22.5%)
  • Seniors: 4,500 students (22.5%)
  • Graduate students: 2,000 students (10%)

Design a stratified sampling plan to survey 400 students with proportional allocation.

Solution:

Step 1: Identify the strata. The strata are the five student classifications: Freshmen, Sophomores, Juniors, Seniors, and Graduate students.

Step 2: Calculate proportional sample sizes. With proportional allocation, each stratum’s representation in the sample matches its representation in the population.

  • Freshmen: $25\%$ of 400 = $0.25 \times 400 = 100$ students
  • Sophomores: $20\%$ of 400 = $0.20 \times 400 = 80$ students
  • Juniors: $22.5\%$ of 400 = $0.225 \times 400 = 90$ students
  • Seniors: $22.5\%$ of 400 = $0.225 \times 400 = 90$ students
  • Graduate: $10\%$ of 400 = $0.10 \times 400 = 40$ students
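The same allocation arithmetic can be checked with a few lines of Python:

```python
# Proportional allocation: stratum sample size = stratum share x total sample
strata = {"Freshmen": 5000, "Sophomores": 4000, "Juniors": 4500,
          "Seniors": 4500, "Graduate": 2000}
total_pop = sum(strata.values())  # 20,000
sample_size = 400

allocation = {name: round(sample_size * count / total_pop)
              for name, count in strata.items()}

print(allocation)
# {'Freshmen': 100, 'Sophomores': 80, 'Juniors': 90, 'Seniors': 90, 'Graduate': 40}
print(sum(allocation.values()))  # 400
```

With cleaner percentages like these, the rounded allocations sum exactly to 400; in general, rounding can leave the total off by one or two, which is then adjusted by hand.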

Step 3: Describe the selection process

  1. Obtain a list of all students in each classification from the registrar
  2. Assign each student within each stratum a unique number
  3. Use a random number generator to select the required number of students from each stratum
  4. Contact the selected students to complete the survey

Sample summary:

  • Freshmen: 5,000 students (25%), sample size 100
  • Sophomores: 4,000 students (20%), sample size 80
  • Juniors: 4,500 students (22.5%), sample size 90
  • Seniors: 4,500 students (22.5%), sample size 90
  • Graduate: 2,000 students (10%), sample size 40
  • Total: 20,000 students (100%), sample size 400

This ensures every classification is represented in proportion to its size in the student body.

Example 4: Identifying Confounding Variables

A study finds that children who eat breakfast regularly have higher test scores than children who skip breakfast. The researchers conclude that eating breakfast improves academic performance.

Identify at least two potential confounding variables that might explain this association.

Solution:

Several confounding variables could explain the relationship between breakfast eating and test scores:

1. Socioeconomic status (family income). Children from higher-income families are more likely to have regular meals (including breakfast) and also tend to have higher test scores due to other advantages: better schools, more educational resources at home, tutoring, and parents with more education. Income could be causing both regular breakfast consumption and higher test scores, with no direct causal link between them.

2. Parental involvement. Parents who make sure their children eat breakfast regularly are probably more involved in their children’s lives overall. They might also help with homework, ensure children get enough sleep, attend school events, and value education. Parental involvement could be the true driver of better test scores.

3. Overall health and wellness habits. Families that prioritize breakfast probably also prioritize other healthy habits: adequate sleep, physical activity, and limiting screen time. Better sleep alone has been strongly linked to academic performance. The “breakfast effect” might really be a “healthy lifestyle effect.”

4. Attendance and punctuality. Children who eat breakfast regularly might come from more stable home environments where they get to school on time and attend regularly. Better attendance leads to better learning and higher test scores.

Important implication: This is an observational study (researchers observed existing breakfast habits, not assigned them). Because of these potential confounders, we cannot conclude that eating breakfast causes higher test scores. To establish causation, we would need an experiment where children are randomly assigned to eat or skip breakfast, but such a study would raise practical and ethical challenges.

Example 5: Critiquing a Study Design and Suggesting Improvements

A pharmaceutical company wants to test a new medication for reducing anxiety. Here is their proposed study design:

“We will recruit 100 adults who report experiencing anxiety through advertisements at local mental health clinics. Participants will take the new medication daily for 8 weeks. At the end of the study, participants will rate their anxiety levels on a scale of 1-10. We will compare their final ratings to their initial ratings to measure improvement.”

Identify at least four problems with this study design and propose improvements for each.

Solution:

Problem 1: No control group. Without a control group, any improvement could be due to the placebo effect, natural fluctuation in anxiety levels, or simply the passage of time. Many people’s anxiety improves over 8 weeks regardless of treatment.

Improvement: Include a control group that receives a placebo (an identical-looking pill with no active ingredient). This allows comparison between the medication and placebo effects.

Problem 2: No random assignment. If everyone gets the medication, you cannot separate the drug’s effect from other factors.

Improvement: Randomly assign participants to the treatment group (medication) or control group (placebo) using a random number generator. This ensures groups are comparable at the start.

Problem 3: No blinding. If participants know they are taking the real medication, their expectations could influence their self-reported anxiety levels. If researchers know who is taking the medication, they might unconsciously treat participants differently or interpret responses differently.

Improvement: Make the study double-blind. Neither participants nor the researchers interacting with them should know who receives the medication versus the placebo. Use identical-looking pills for both groups.

Problem 4: Selection bias in recruitment. Recruiting only from mental health clinics means the sample is not representative of all adults with anxiety. Clinic patients might have more severe or different types of anxiety than people who manage symptoms without professional help.

Improvement: Use multiple recruitment methods to get a more diverse sample, including community advertisements, online recruitment, and primary care offices. Clearly define the target population and try to recruit a sample that represents it.

Problem 5: Self-reported outcomes. Self-reported anxiety ratings are subjective and susceptible to response bias. Participants might report improvement because they want to please the researchers or justify their participation.

Improvement: Use validated, standardized anxiety measurement instruments (like the GAD-7 or Hamilton Anxiety Scale). Consider including objective measures where possible. Have the assessments administered by researchers who do not know the treatment assignment.

Problem 6: No accounting for dropout or nonresponse. If some participants drop out (perhaps because the medication has side effects, or because their anxiety worsened), analyzing only those who completed the study could bias results.

Improvement: Use intention-to-treat analysis, which includes all participants in the group they were assigned to, whether or not they completed the study. Track and report dropout rates and reasons.

Improved study design summary: Conduct a randomized, double-blind, placebo-controlled trial. Randomly assign 200 participants (recruited from diverse sources) to medication or placebo groups. Neither participants nor assessing researchers know assignments. Use validated anxiety scales administered by blinded assessors. Analyze all participants according to their assigned group.

Example 6: Evaluating a Real-World Controversy

Two headlines appear in the same month:

Headline A: “Study of 50,000 people finds coffee drinkers live longer”

Headline B: “Experiment shows caffeine disrupts sleep and raises stress hormones”

Are these findings contradictory? Explain how both could be true and what each type of study can and cannot tell us.

Solution:

These findings are not necessarily contradictory. They come from different types of studies that answer different questions.

Headline A: The observational study. This was likely a large observational study that followed people over many years, recording their coffee consumption and mortality. Finding that coffee drinkers live longer shows an association, not causation.

What it can tell us: In this population, people who drank coffee tended to live longer than those who did not.

What it cannot tell us: That coffee caused them to live longer.

Possible explanations:

  • Coffee drinkers might have other healthy habits
  • Coffee contains antioxidants that provide benefits
  • People who are already healthier might be more likely to drink coffee (sick people might avoid it)
  • There could be confounding by socioeconomic status, lifestyle, or other factors

Headline B: The experiment. This was likely a controlled experiment where participants were given caffeine or a placebo and their sleep and hormone levels were measured. Finding that caffeine disrupted sleep and raised stress hormones establishes a causal relationship for those specific outcomes.

What it can tell us: Caffeine causes disrupted sleep and elevated stress hormones in the short term under controlled conditions.

What it cannot tell us: Whether these short-term effects translate into worse long-term health outcomes.

How both can be true: Coffee might have both beneficial effects (antioxidants, alertness, social aspects of coffee drinking) and harmful effects (sleep disruption, stress hormones). The net effect on health could depend on:

  • How much coffee a person drinks
  • When they drink it (morning vs. evening)
  • Individual variation in caffeine metabolism
  • Other lifestyle factors

Additionally, the observational study cannot rule out confounders. Perhaps coffee drinkers in the study were wealthier, more educated, or had other advantages that explained their longevity, and the coffee itself had nothing to do with it.

The takeaway: Different study designs answer different questions. Observational studies show associations in real-world populations. Experiments establish causal mechanisms under controlled conditions. Neither alone tells the whole story. Good science requires multiple studies of different types, and sophisticated consumers of research learn to ask: “What kind of study was this, and what can it actually conclude?”

Key Properties and Rules

Sampling Method Selection Guide

  • Simple random sample: Use when you have a complete list of the population and want unbiased selection. Advantages: eliminates selection bias; well-developed statistical theory. Watch out for: missing small subgroups by chance; requires a complete list.
  • Stratified: Use when subgroups exist and you want each one represented. Advantages: guarantees subgroup representation; often more precise. Watch out for: requires knowing each member’s subgroup.
  • Cluster: Use when the population is geographically spread out or no individual list exists. Advantages: practical and economical; does not need a list of individuals. Watch out for: less precision if clusters differ substantially.
  • Systematic: Use when you have an ordered list and want quick selection. Advantages: easy to implement; spreads the sample across the list. Watch out for: bias if the list has hidden periodic patterns.
  • Convenience: Use for pilot testing only, not for real conclusions. Advantages: cheap and easy. Watch out for: almost always biased; avoid for real studies.

Types of Bias Summary

Selection bias: Who gets into the sample?

  • Check: Does everyone have a known chance of selection?
  • Solution: Random selection from complete sampling frame

Response bias: Are responses accurate?

  • Check: Are questions neutral? Might people feel pressure to answer a certain way?
  • Solution: Careful question design; anonymous responses; validated instruments

Nonresponse bias: Do non-responders differ from responders?

  • Check: What is the response rate? Who did not respond?
  • Solution: Follow up with non-responders; compare respondent demographics to population

Experiments vs. Observational Studies

Observational study:

  • Researchers observe without intervening
  • Can establish association between variables
  • Cannot establish causation
  • Confounding variables may explain observed relationships

Experiment:

  • Researchers impose treatments
  • Random assignment makes groups comparable
  • Can establish causation when properly designed
  • Gold standard: randomized controlled trial with blinding

Establishing Causation

To confidently claim that $X$ causes $Y$, you generally need:

  1. Association: $X$ and $Y$ occur together (or changes in $X$ correspond to changes in $Y$)
  2. Time order: $X$ precedes $Y$
  3. Elimination of alternatives: Other explanations (confounders) have been ruled out

Randomized experiments achieve all three. Observational studies can show 1 and 2 but struggle with 3.

Well-Designed Experiment Checklist

  1. Clear research question: What effect are you trying to measure?
  2. Random assignment: Participants assigned to groups by chance
  3. Control group: Comparison group receiving placebo or standard treatment
  4. Blinding: Single-blind or double-blind when possible
  5. Sufficient sample size: Enough participants to detect real effects
  6. Standardized procedures: Same protocol for all participants
  7. Pre-registration: Study design specified before data collection

Real-World Applications

Political Polling Methodology

Modern political polls use sophisticated sampling techniques to predict election outcomes from just 1,000-2,000 respondents. Pollsters use stratified sampling to ensure representation across demographics (age, gender, race, education, geography) and weight their results so the sample's composition matches the population's known makeup.
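Weighting can be sketched in a few lines. The education split, respondent counts, and support rates below are invented: the point is only that each group's answers get rescaled from its share of the sample to its known share of the population.

```python
# Hypothetical raw poll that over-represents college graduates
sample_counts = {"college": 600, "no_college": 400}       # who actually responded
population_share = {"college": 0.35, "no_college": 0.65}  # known, e.g., from census data
support_in_group = {"college": 0.55, "no_college": 0.40}  # candidate support within each group

n = sum(sample_counts.values())

# Unweighted estimate: averages over the (biased) sample composition
unweighted = sum(sample_counts[g] / n * support_in_group[g] for g in sample_counts)

# Weighted estimate: rescales each group to its known population share
weighted = sum(population_share[g] * support_in_group[g] for g in sample_counts)

print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

Here the unweighted figure (0.490) overstates support relative to the weighted one (0.4525) because the over-sampled group happens to be more favorable. Weighting corrects composition, but it cannot fix nonresponse bias within a group.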

But polls can still go wrong. In 2016 and 2020, polls in the United States underestimated support for certain candidates. Why? Possible reasons include:

  • Nonresponse bias: Certain voter groups were less likely to respond to polls
  • Likely voter models: Determining who will actually vote is difficult
  • Social desirability bias: Some voters may not admit their true preferences
  • Late deciders: Some voters make up their minds after the final polls

When reading poll results, look for: sample size, margin of error, sampling method, and response rate. A poll of 1,000 randomly selected likely voters with a 3% margin of error is far more meaningful than an online poll where anyone can vote.
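The quoted 3% figure is not arbitrary: for a simple random sample, the 95% margin of error for a proportion is roughly $z\sqrt{p(1-p)/n}$ with $z = 1.96$, and it is largest at $p = 0.5$. A quick check for $n = 1{,}000$:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)

print(f"{margin_of_error(1000):.3f}")  # about 0.031, i.e., roughly 3 percentage points
```

This formula assumes random sampling; it says nothing about bias, which is why a huge but biased sample (like the Literary Digest poll) can report a tiny margin of error and still be badly wrong.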

Clinical Drug Trials

Before a new medication reaches the pharmacy, it goes through rigorous testing in clinical trials designed to establish both safety and efficacy.

Phase I: Small group (20-100), primarily testing safety and dosage

Phase II: Larger group (100-300), testing efficacy and side effects

Phase III: Large-scale (1,000-3,000+), randomized controlled trials comparing new treatment to existing treatments or placebo

Phase III trials are typically randomized, double-blind, and placebo-controlled. This design allows researchers to confidently claim that observed benefits are caused by the medication, not by placebo effects, natural disease progression, or other factors.

The FDA requires this level of evidence before approving new drugs. When you hear that a medication “has been shown to” reduce symptoms, it usually means randomized controlled trials demonstrated a statistically significant effect compared to placebo.

Market Research

Companies invest heavily in understanding consumer preferences, but they face constant sampling challenges.

Focus groups are convenience samples. They provide rich qualitative insights but cannot tell you what the broader market thinks.

Online surveys risk nonresponse bias and may miss demographics with less internet access.

A/B testing is true experimentation: users are randomly assigned to see different versions of a website, advertisement, or product, and their behavior is compared. This allows companies to establish that design changes cause changes in user behavior.

When you see claims like “8 out of 10 dentists recommend our toothpaste,” ask: How were those dentists selected? Were they given free samples? What were they comparing against? “Recommend” compared to what alternatives? These details matter enormously.

Educational Research

Educational interventions are difficult to study because random assignment is often impractical or ethically questionable. You cannot randomly assign students to “good” versus “bad” schools.

Researchers use various approaches:

  • Quasi-experiments: Compare naturally occurring groups that differ in treatment (schools that adopted a program vs. those that did not)
  • Regression discontinuity: Compare students just above and below a cutoff (like a test score threshold for a program)
  • Randomized trials when possible: Randomly assign students within a school to different interventions

When reading about educational research, ask whether the study was a true experiment or an observational study, and consider what confounders might explain the results.

Epidemiology and Public Health

When a new disease emerges, epidemiologists must quickly understand its spread, risk factors, and interventions. They rely heavily on observational studies because experimentation is often unethical (you cannot randomly assign people to be exposed to a disease).

Observational approaches in epidemiology:

  • Case-control studies: Compare people who got sick (cases) to similar people who did not (controls), looking back at their exposures
  • Cohort studies: Follow groups of people forward in time, tracking exposures and outcomes
  • Cross-sectional studies: Snapshot of a population at one point in time

These studies can identify risk factors and suggest interventions, but establishing causation requires careful reasoning about confounders and mechanisms. The causal link between smoking and lung cancer, for example, was established through many observational studies plus laboratory research on mechanisms, because no one could ethically run an experiment randomizing people to smoke.

Self-Test Problems

Problem 1: A researcher wants to study the exercise habits of adults in a city. For each method described, identify the sampling type.

a) She gets a list of all adult residents and uses a random number generator to select 500 names.

b) She divides the city into neighborhoods, randomly selects 10 neighborhoods, and surveys all adults in those neighborhoods.

c) She stands at the entrance of the largest shopping mall on a Saturday and asks shoppers to complete a survey.

d) She divides residents by age group (18-30, 31-50, 51-70, 71+) and randomly selects 150 people from each group.

Answer:

a) Simple random sample (SRS). Every adult has an equal chance of being selected from the complete list.

b) Cluster sampling. Entire groups (neighborhoods) are randomly selected, then everyone within those groups is included.

c) Convenience sampling. She is surveying whoever is easiest to reach. Mall shoppers on Saturday are not representative of all adults.

d) Stratified sampling. The population is divided into strata (age groups) and random samples are taken from each stratum.

Problem 2: A survey asks employees: “Don’t you agree that our company’s management is doing a poor job?” 72% say yes. Identify the type of bias most evident in this question and explain how it might affect results.

Answer:

Response bias (specifically, leading question bias)

The question is worded in a way that pushes respondents toward a “yes” answer:

  • The phrase “Don’t you agree” assumes agreement
  • The negative framing (“poor job”) primes a negative response
  • The structure makes disagreeing awkward

The 72% likely overstates actual dissatisfaction. A neutral question would be: “How would you rate management’s performance?” with response options ranging from “Excellent” to “Poor.”

Additionally, there might be social desirability bias in the opposite direction if employees fear their responses are not truly anonymous. They might say management is doing well to avoid consequences.

Problem 3: A hospital wants to survey 300 patients about their satisfaction with care. The hospital has:

  • 600 surgical patients (30%)
  • 1,000 medical patients (50%)
  • 400 emergency patients (20%)

Design a proportional stratified sample and calculate how many patients should be selected from each group.

Answer:

Stratified sampling plan with proportional allocation:

Total sample size: 300 patients
Total population: 2,000 patients

Calculate sample sizes for each stratum:

  • Surgical: 30% of 300 = $0.30 \times 300 = 90$ patients
  • Medical: 50% of 300 = $0.50 \times 300 = 150$ patients
  • Emergency: 20% of 300 = $0.20 \times 300 = 60$ patients

Summary:

Patient Type Population Percent Sample Size
Surgical 600 30% 90
Medical 1,000 50% 150
Emergency 400 20% 60
Total 2,000 100% 300

Implementation:

  1. Obtain a list of patients in each category
  2. Use a random number generator to select the required number from each list
  3. Contact selected patients for the survey
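The allocation rule above is generic: each stratum's sample size is its population share times the total sample size. A short sketch:

```python
def proportional_allocation(stratum_sizes, total_sample):
    """Allocate a total sample across strata in proportion to stratum size."""
    population = sum(stratum_sizes.values())
    return {name: round(total_sample * size / population)
            for name, size in stratum_sizes.items()}

strata = {"surgical": 600, "medical": 1000, "emergency": 400}
print(proportional_allocation(strata, 300))
# {'surgical': 90, 'medical': 150, 'emergency': 60}
```

With less even splits, rounding can make the allocations sum to slightly more or less than the target; real designs adjust one stratum (usually the largest) to absorb the difference.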

Problem 4: A study finds that people who own pets have lower rates of depression than people who do not own pets. A newspaper headline reads: “Pet ownership reduces depression!”

a) Is this headline justified based on the study described? Why or why not?

b) Identify two potential confounding variables.

c) Describe how you could design an experiment to test whether pet ownership actually reduces depression.

Answer:

a) The headline is NOT justified.

The study is observational. It shows an association between pet ownership and lower depression rates, but it cannot establish that pet ownership causes lower depression. The headline implies causation (“reduces”), which the study design cannot support.

b) Potential confounding variables:

  1. Social connection: People who own pets might also have more social connections overall. Loneliness is associated with depression, so perhaps socially connected people both acquire pets and have lower depression.

  2. Activity level: Pet owners (especially dog owners) get more physical activity through walking pets. Exercise is known to reduce depression, so activity level might explain both pet ownership and lower depression.

  3. Socioeconomic status: Pets cost money (food, veterinary care). People with more financial resources might be more likely to own pets and also have better access to mental health care.

  4. Reverse causation: Perhaps people who are already less depressed are more likely to acquire pets, rather than pets reducing depression. Severely depressed people might not feel capable of caring for a pet.

c) Experimental design:

  1. Recruit participants who do not currently own pets and who have mild to moderate depression
  2. Randomly assign participants to:
    • Treatment group: Receive a pet (perhaps provided by the study)
    • Control group: Remain on a waiting list to receive a pet after the study
  3. Measure depression levels using standardized instruments at baseline and at regular intervals (e.g., 3, 6, 12 months)
  4. Compare changes in depression between groups

Challenges: Blinding is impossible (you know if you have a pet), and there are ethical considerations about providing pets to people and what happens after the study.

Problem 5: A technology company wants to test whether a new website design increases purchases. They have 10,000 daily visitors. Design a randomized controlled experiment to test the new design.

Answer:

Experimental design: A/B Test

Step 1: Define groups

  • Treatment group: Sees the new website design
  • Control group: Sees the current website design

Step 2: Random assignment. When visitors arrive at the website, a computer randomly assigns them to see either version (e.g., 50% probability each). This is typically done using cookies or session IDs so each visitor consistently sees the same version.

Step 3: Determine sample size. Run the experiment until you have enough visitors in each group to detect a meaningful difference. Statistical power analysis can determine the required sample size based on the minimum effect you want to detect.

Step 4: Measure outcomes. Track the conversion rate (percentage who make a purchase) for each group. You might also track secondary outcomes: time on site, pages viewed, items added to cart.

Step 5: Ensure validity

  • Do not change anything else about the website during the test
  • Run the test long enough to capture variation across days of the week
  • Use identical products, prices, and other elements between versions

Step 6: Analyze results. Compare purchase rates between groups. Statistical tests can determine whether any observed difference is larger than would be expected by chance.

Why this design works: Random assignment ensures that any differences in who visits at what time are balanced between groups. If the new design group makes more purchases, you can conclude the design caused the increase, not some other factor.
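Step 6 can be sketched with a two-proportion z-test using only the standard library. The visit and purchase counts below are invented for illustration.

```python
import math

def two_proportion_z(conversions_a, n_a, conversions_b, n_b):
    """z statistic comparing two conversion rates, using a pooled standard error."""
    p_a, p_b = conversions_a / n_a, conversions_b / n_b
    pooled = (conversions_a + conversions_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Invented counts: control converts 300 of 10,000 visitors; new design 360 of 10,000
z = two_proportion_z(300, 10_000, 360, 10_000)
print(f"z = {z:.2f}")  # |z| > 1.96 suggests a difference unlikely to be chance alone at the 5% level
```

The same comparison is often run as a chi-squared test, and in practice the required sample size from Step 3 would be fixed by a power analysis before looking at any results, to avoid stopping the test the moment the difference looks significant.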

Problem 6: Evaluate the following study design and identify at least three problems:

“To study whether a new teaching method improves math scores, a school lets teachers volunteer to use the new method. The volunteering teachers implement the method in their classes. At the end of the year, we compare test scores of students in the new-method classes to students in traditional classes.”

Answer:

Problems with this study design:

1. No random assignment of teachers. Teachers self-selected into the treatment group. Teachers who volunteer for new methods might be more motivated, innovative, or effective generally. Better outcomes could be due to having better teachers, not the teaching method itself.

2. No random assignment of students. Students were not randomly assigned to classes. If the volunteering teachers are known to be good, motivated students might seek their classes. Differences in outcomes could reflect student differences, not the teaching method.

3. Confounding by teacher characteristics. Teacher enthusiasm, experience, skill, and motivation are confounded with the teaching method. You cannot separate the effect of the method from the effect of having an enthusiastic, volunteering teacher.

4. No standardization. Different teachers might implement the “new method” differently. Without standardized implementation, you are not really testing one method; you are testing multiple variations.

5. No blinding. Teachers know which method they are using, which could affect their effort and enthusiasm. Students might know they are getting the “new” method, which could affect their motivation (Hawthorne effect).

6. Potential selection bias in schools/classes. If only certain types of schools or classes participate, results may not generalize.

Better design: Randomly assign teachers to use either the new method or traditional methods (or randomly assign classrooms/schools to conditions). Provide standardized training to ensure consistent implementation. Use the same assessments for all students. This would allow causal conclusions about the teaching method’s effect.

Summary

  • Sampling method determines validity: A large biased sample gives worse results than a small unbiased sample. How you select participants matters more than how many you select.

  • Simple random sampling (SRS) gives every possible sample an equal chance of selection. It is the foundation of unbiased sampling but requires a complete list of the population.

  • Stratified sampling divides the population into subgroups and samples from each, ensuring all groups are represented. It is more precise than SRS when strata differ on the variable of interest.

  • Cluster sampling randomly selects groups and samples within them. It is practical for spread-out populations but less precise if clusters differ from each other.

  • Systematic sampling selects every $k$th unit after a random start. It is easy to implement but can be biased if the list has periodic patterns.

  • Convenience sampling studies whoever is easiest to reach. It is almost always biased and should be avoided for serious research.

  • Three main types of bias are selection bias (who gets in the sample), response bias (are responses accurate), and nonresponse bias (do non-responders differ).

  • Observational studies observe without intervening. They can show associations but cannot establish cause and effect due to potential confounding variables.

  • Experiments impose treatments on participants. With random assignment, they can establish causation because groups are comparable except for the treatment.

  • Confounding variables are associated with both the explanatory and response variables, making it impossible to tell which causes the effect. Random assignment eliminates confounding.

  • Randomized controlled experiments are the gold standard for establishing causation. Key features include random assignment, control groups, and often blinding.

  • Placebo effects are real improvements from belief in treatment. Blinding (single, double, or triple) prevents placebo effects and researcher bias from affecting results.

  • Understanding study design helps you evaluate data-based claims critically. Always ask: How was the sample selected? What type of study was it? What alternative explanations exist?