Week 3
Surveys

Soci—316

Sakeef M. Karim
Amherst College

SOCIAL RESEARCH

Unit I Begins–February 10th

Some Reminders

Research Memo Deadline

Deadline for Memo on Research Interests

Memos are due by 8:00 PM on Friday, February 13th.

Some Reminders

Office Hours

Why Quantify Society?

Drawing Inferences

A Random Question

Was your first impression of someone ever “off”?

Drawing Inferences

First impressions often change as we gather more information—i.e., data points—allowing us to fine-tune our inferences about how person i might behave in different situations.

Just as in statistical inference, more data helps us move from crude hypotheses to well-defined expectations about how i might behave within a “margin of error.”

Drawing Inferences

Here’s a general rule of thumb:

In quantitative sociological research, we often strive to generalize our findings—not just to draw inferences about person i, but to make tenable claims about U, a broader target population. To wit, we want our insights to tell us something meaningful about a large set of units—people, countries, firms, documents, inter alia.

Drawing Inferences

We’ll Return to This Example in a Second

Drawing Inferences

Two Quick Questions

How can surveys help us draw inferences?
Are all surveys designed to capture population-level characteristics?

Sampling

The Logic of Sampling

In the social sciences, we often want to know the characteristics of a population of interest. Yet collecting data from every individual in the target population may be prohibitively expensive or simply not feasible … In survey research, we collect data from a subset of observations in order to understand the target population as a whole.

(Llaudet and Imai 2023:52, EMPHASIS ADDED)

The Logic of Sampling

Back to This Figure

The Logic of Sampling

Total Population, N


library(tidyverse)

# Generating a population with one million people

set.seed(2026)

population <- tibble(x = rnorm(1e6, mean = 0, sd = 27),
                     sample = "Overall") |> 
              mutate(x = scales::rescale(x, to = c(0, 1)))

ggplot(population, 
       mapping = aes(x = x)) +
geom_density(alpha = 0.75, fill = "#FFFFB3") +
theme_minimal()

The Logic of Sampling

n = 20


set.seed(2026)

ggplot(population, 
       mapping = aes(x = x, fill = sample)) +
geom_density(alpha = 0.75) +
geom_density(colour = "white",
             alpha = 0.6,
             data = population |> 
                    # Toggle these numbers on your own time! Try n = 10, 100, 1000, 5000
                    slice_sample(n = 20) |> 
                    mutate(sample = "n = 20")) +
theme_minimal() +
scale_fill_brewer(palette = "Set3")

The Logic of Sampling

n = 1000


set.seed(2026)

ggplot(population, 
       mapping = aes(x = x, fill = sample)) +
geom_density(alpha = 0.75) +
geom_density(colour = "white",
             alpha = 0.6,
             data = population |> 
                    # Toggle these numbers on your own time! Try n = 10, 100, 1000, 5000
                    slice_sample(n = 1000) |> 
                    mutate(sample = "n = 1000")) +
theme_minimal() +
scale_fill_brewer(palette = "Set3")

The Logic of Sampling

n = 5000


set.seed(2026)

ggplot(population, 
       mapping = aes(x = x, fill = sample)) +
geom_density(alpha = 0.75) +
geom_density(colour = "white",
             alpha = 0.6,
             data = population |> 
                    # Toggle these numbers on your own time! Try n = 10, 100, 1000, 5000
                    slice_sample(n = 5000) |> 
                    mutate(sample = "n = 5000")) +
theme_minimal() +
scale_fill_brewer(palette = "Set3")
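The three figures suggest that larger samples trace the population's shape more closely. One rough way to put a number on this is to draw many samples of each size and see how much their means vary from draw to draw. A minimal base-R sketch, assuming a population built like the one above and 300 repeated draws per sample size (an arbitrary choice), compares that spread with the usual approximation for the standard error of a mean, σ/√n:

```r
# How much do sample means vary from draw to draw?
# Assumptions: a population built like the one above (one million normal
# draws rescaled to [0, 1]) and 300 repeated samples per sample size.
set.seed(2026)
x <- rnorm(1e6, mean = 0, sd = 27)
pop_x <- (x - min(x)) / (max(x) - min(x))  # same rescaling as scales::rescale()

sample_sizes <- c(20, 1000, 5000)

# Empirical standard error: SD of the means of 300 simple random samples
simulated_se <- sapply(sample_sizes, function(size) {
  sd(replicate(300, mean(sample(pop_x, size = size))))
})

# Textbook approximation for the standard error of a mean: sigma / sqrt(n)
theoretical_se <- sd(pop_x) / sqrt(sample_sizes)

data.frame(n = sample_sizes,
           simulated_se = round(simulated_se, 5),
           theoretical_se = round(theoretical_se, 5))
```

Because the standard error shrinks with √n rather than n, quadrupling the sample size only halves it, so gains in precision taper off as samples grow.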

Probabilistic Sampling

A Cautionary Tale

Image can be retrieved here.

A Cautionary Tale

Full Page

Probability Samples

[W]hen we use random selection to determine the members of our sample, there is no systematic difference between the sample and the population from which it is drawn. Any differences are strictly due to random chance alone … Samples that are based on random selection are called probability samples. More precisely, a probability sample is one in which (a) random chance is used to select participants for the sample, and (b) each individual has a probability of being selected that can be calculated.

(Carr et al. 2020:158, EMPHASIS ADDED)

Probability Samples

Of course, random sampling alone is not a panacea. Other considerations include—but are not limited to—our sample size, n, and target population, U.

Consider the following list of potential respondents:

 [1] "Quimby, Kyle"         "Tierney, Braden"      "Owens, George"       
 [4] "Rio, Corinna"         "Learned, Gerad"       "Sullivan, Kevin"     
 [7] "Schroeder, Brandon"   "Cooper, Sarah"        "Taylor, Samantha"    
[10] "Mills, Colleen"       "Dye, Bryan"           "Bartholomew, Michael"
[13] "Price, Brooke"        "Hartz, Casey"         "Noxon, Kayla"        
[16] "Mccombs, Christine"   "Carter, Jonathon"     "Drassen, Taylor"     
[19] "Fraley, Alex"         "Musk, Elon"

Let’s say we want to talk to two of our respondents. Through random sampling, we might end up with this duo:


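set.seed(2026)
# 19 randomly generated names plus one (very atypical) respondent
respondents <- c(randomNames::randomNames(19, ethnicity = 5), "Musk, Elon")
# Draw a simple random sample of two respondents
sample(respondents, size = 2)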

[1] "Musk, Elon"      "Drassen, Taylor"

Probability Samples

A Question

What is a sampling frame?

A simple random sample has two key features. First, each individual has the same probability of being selected into the sample. Second—and this is the tricky part—each pair of individuals has the same probability of being selected.

(Carr et al. 2020:164, EMPHASIS ADDED)


Panel C in Figure 6.2 in Carr et al. (2020)
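The “tricky part” of this definition is easiest to see by contrast. The sketch below uses a hypothetical frame of four people and compares a simple random sample of two with a design that first splits the frame into two fixed pairs and then picks one pair at random. In both designs every individual has a 1-in-2 chance of selection, but only the simple random sample gives every pair the same chance. The frame, the fixed pairing, and the 20,000 simulated draws are assumptions for illustration:

```r
# Contrast two designs that both give each person a 1-in-2 chance of selection.
# Assumptions: a hypothetical frame of four people and 20,000 simulated draws.
set.seed(2026)
frame <- c("A", "B", "C", "D")
reps <- 20000

# Design 1: simple random sample of size 2 (record the pair, alphabetized)
srs_pairs <- replicate(reps, paste(sort(sample(frame, size = 2)), collapse = "-"))

# Design 2: frame split into fixed pairs {A, B} and {C, D}; one pair drawn at random
grouped_pairs <- sample(c("A-B", "C-D"), size = reps, replace = TRUE)

# Each individual (here, A) is selected about half the time under both designs
c(srs = mean(grepl("A", srs_pairs)), grouped = mean(grepl("A", grouped_pairs)))

# But only the simple random sample gives all six possible pairs the same chance
srs_share <- prop.table(table(srs_pairs))
grouped_share <- prop.table(table(grouped_pairs))
round(srs_share, 3)      # six pairs, each near 1/6
round(grouped_share, 3)  # two pairs, each near 1/2; pairs like "A-C" never occur
```

Design 2 behaves like selecting one cluster out of two, which is one reason a cluster sample is not a simple random sample even when every individual's selection probability is the same.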

In stratified sampling, the population is divided into groups (called strata), and the researcher selects some members of every group.

(Carr et al. 2020:166, EMPHASIS ADDED)


Panel B in Figure 6.2 in Carr et al. (2020)
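A minimal base-R sketch of stratified sampling, assuming a hypothetical population of 400 students with class year as the strata and an equal allocation of 10 students per stratum (proportional allocation is another common choice). The defining property is that every stratum is represented in the sample:

```r
# Stratified sampling: divide the population into strata, then sample within each.
# Assumptions: a hypothetical population of 400 students, class year as the
# strata, and an equal allocation of 10 students per stratum.
set.seed(2026)
students <- data.frame(
  id = 1:400,
  class_year = rep(c("First-year", "Sophomore", "Junior", "Senior"),
                   times = c(120, 110, 90, 80))
)

# Split into strata and draw 10 students at random from every stratum
strata <- split(students, students$class_year)
stratified_sample <- do.call(rbind, lapply(strata, function(stratum) {
  stratum[sample(nrow(stratum), size = 10), ]
}))

# Every class year appears, with exactly 10 students each
table(stratified_sample$class_year)
```

With the tidyverse loaded, `students |> slice_sample(n = 10, by = class_year)` should do the same job, mirroring the `slice_sample(..., by = island)` call in the weighting example.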

In cluster sampling, the target population is first divided into groups, called clusters. Some of these clusters are selected at random. Then, some individuals are selected at random from within each cluster.

(Carr et al. 2020:165, EMPHASIS ADDED)


Panel A in Figure 6.2 in Carr et al. (2020)
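Unlike a stratified sample, a two-stage cluster sample selects only some clusters, so most clusters contribute no one. A minimal sketch, assuming a hypothetical population of 400 students living in 20 residence halls (the clusters), with 4 halls drawn at the first stage and 5 students per hall at the second:

```r
# Two-stage cluster sampling: sample clusters first, then people within them.
# Assumptions: a hypothetical population of 400 students in 20 residence halls
# (the clusters), 4 halls drawn at stage one, 5 students per hall at stage two.
set.seed(2026)
students <- data.frame(
  id = 1:400,
  hall = rep(paste("Hall", 1:20), each = 20)
)

# Stage 1: select 4 of the 20 clusters at random
chosen_halls <- sample(unique(students$hall), size = 4)

# Stage 2: select 5 students at random within each chosen cluster
cluster_sample <- do.call(rbind, lapply(chosen_halls, function(h) {
  in_hall <- students[students$hall == h, ]
  in_hall[sample(nrow(in_hall), size = 5), ]
}))

# Only the 4 chosen halls appear, with 5 students each
table(cluster_sample$hall)
```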

Probability Sampling Techniques

Simple Random Sampling

Probability Sampling Techniques

Stratified Sampling

Probability Sampling Techniques

Cluster Sampling

A Note on Weighting

The Intuition
A Simple Example

In probability sampling, it is not a problem if some individuals have a higher likelihood of being in the sample than others, as long as we know what those probabilities are. If Person A is x times more likely to be in our sample than Person B, then we give Person A 1/x times as much weight as Person B when computing our estimates.

(Carr et al. 2020:168, EMPHASIS ADDED)
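To see the arithmetic behind this quote, consider a stylized case with invented numbers: a population of 200 people in group A and 800 in group B, where A members are 4 times as likely to be sampled (selection probabilities of 0.20 versus 0.05). The sketch below builds the expected sample, gives each respondent a weight of 1 / (selection probability), so A respondents carry 1/4 the weight of B respondents, and compares the unweighted and weighted means with the population mean:

```r
# Inverse-probability (design) weights with invented, deterministic numbers.
toy_pop <- data.frame(
  group    = c("A", "B"),
  size     = c(200, 800),    # people in the target population
  outcome  = c(10, 5),       # everyone in a group shares this outcome value
  p_select = c(0.20, 0.05)   # known selection probabilities: A is 4x as likely
)

# Expected sample: 40 people from A and 40 from B
n_sampled <- round(toy_pop$size * toy_pop$p_select)
toy_sample <- toy_pop[rep(seq_len(nrow(toy_pop)), times = n_sampled),
                      c("group", "outcome", "p_select")]

# Weight = 1 / selection probability (A gets 5, B gets 20)
toy_sample$weight <- 1 / toy_sample$p_select

unweighted <- mean(toy_sample$outcome)                              # 7.5
weighted   <- weighted.mean(toy_sample$outcome, toy_sample$weight)  # 6
truth      <- weighted.mean(toy_pop$outcome, toy_pop$size)          # 6

c(unweighted = unweighted, weighted = weighted, population = truth)
```

The unweighted mean overstates the population mean because group A, whose outcomes are higher, is overrepresented; the weights undo that imbalance. The post-stratification weights in the penguin example work in the same spirit, except that they are built from known population shares rather than known selection probabilities.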

Let’s say that s_share captures the distribution of islands in our sample, s, while u_share represents the distribution of islands in our target population, U.

     island   s_share   u_share
1    Biscoe 0.5555556 0.4883721
2     Dream 0.3333333 0.3604651
3 Torgersen 0.1111111 0.1511628

Now, here’s the weighted distribution of islands in s after applying post-stratification weights:


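set.seed(2026)

# Assumes the tidyverse is loaded (see above) and that `penguins` is available
# (it ships with base R >= 4.5.0; on older versions, run library(palmerpenguins))

# Here's a "dummy" stratified sample of `penguins` (roughly 3% of each island):

dummy_sample <- penguins |> 
                slice_sample(prop = 0.03, by = island)


# We know the distribution of islands in the target population, U
# (here, the full `penguins` data):

u_share <- penguins |> count(island) |> 
                       mutate(u_share = n/sum(n)) |> 
                       select(-n)

# Here's how we can generate simple post-stratification weights
# (each island's population share divided by its sample share):

weights <- dummy_sample |> count(island) |> 
                           mutate(s_share = n/sum(n)) |> 
                           left_join(u_share, by = "island") |> 
                           mutate(weight = u_share/s_share) |> 
                           select(island, weight)

# And here's how we can apply those weights:

dummy_sample |> left_join(weights, by = "island") |> 
                count(island, wt = weight) |> 
                mutate(s_share = n/sum(n)) |> 
                select(-n)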

     island   s_share
1    Biscoe 0.4883721
2     Dream 0.3604651
3 Torgersen 0.1511628

Non-Representative Samples

Another Question

Are non-representative samples ever desirable?

A Group Exercise

Recruiting A Sample

Move around the classroom and form groups of two. You and your teammate are fielding a survey to explore how phenomenon x varies by y (e.g., major) at Amherst College. You have three tasks:

1. Clearly state your research question(s).
2. Clearly define your target population(s).
3. Clearly describe how you’d recruit respondents from Amherst’s student population to arrive at s, your sample. Specifically, how could you draw on:
   - A simple random sample?
   - A stratified sample (specify the strata)?
   - A cluster sample (specify the clusters)?

Survey Design–February 12th

A Friendly Reminder

Research Memo Deadline

Deadline for Memo on Research Interests

Memos are due tomorrow at 8:00 PM.

Formats, Delivery, Error

Formats

A Basic Distinction

Formats

Adapted from Table 7.1 in Carr et al. (2020)
Characteristic	Cross-Sectional Surveys	Panel Surveys
Cost	Relatively low, as respondents are contacted only once	Relatively high, as respondents are contacted over time
Ease of Administration	Relatively easy	Relatively difficult
Causal Inference	Cannot ascertain causality clearly	Better suited to infer causality due to repeated measures
Sources of Bias	Typically exclude those difficult to reach	Same biases as cross-sectional surveys plus selective attrition
Ability to Document Change	Cannot assess within-person change	Well suited to assess within-person change over time
Attention to Social History	Cannot disentangle age, period, and cohort effects	Can partially disentangle age, period, and cohort effects if multiple cohorts included

Formats

Full Page

Modes of Delivery

Adapted from Table 7.2 in Carr et al. (2020)
Attribute	Face-to-Face Interview	Mail or Self-Administered Questionnaire	Telephone Interview	Online Surveys
Cost	High	Low	Moderate	Low
Response Rate	High	Low	High	Moderate
Researcher Control Over Interview	High	Low	Moderate	Moderate
Interviewer Effects	High	Low	Moderate	Low

Sources of “Error”

Carr et al. (2020) discuss four major types of error that surveys are susceptible to:

Nonresponse Error
Measurement Error
Coverage Error
Sampling Error

Make sure you know what these errors refer to.
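Two of these errors behave very differently as samples grow. The sketch below, built entirely on hypothetical assumptions, compares sampling error with coverage error: 20% of a simulated population of 100,000 people is missing from the sampling frame (imagine people without a listed phone number), and those people score lower on the outcome. Larger samples shrink sampling error, but they do nothing about the bias created by an incomplete frame:

```r
# Sampling error vs. coverage error in a simulated population.
# Assumptions (hypothetical): 20% of people are absent from the sampling frame,
# and their average outcome is 40 rather than 50.
set.seed(2026)
N <- 1e5
in_frame <- runif(N) > 0.20
outcome <- rnorm(N, mean = ifelse(in_frame, 50, 40), sd = 10)
true_mean <- mean(outcome)
frame_outcome <- outcome[in_frame]

# Estimation error (estimate minus true mean) at several sample sizes
sizes <- c(100, 1000, 10000)
errors <- t(sapply(sizes, function(size) {
  c(n = size,
    full_frame = mean(sample(outcome, size)) - true_mean,             # sampling error only
    incomplete_frame = mean(sample(frame_outcome, size)) - true_mean)  # plus coverage error
}))

round(errors, 2)
```

Nonresponse error can produce the same pattern if the people who decline to participate differ systematically from those who respond.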

A Few Question Types

Closed-Ended

What is your relationship status?


Rating Scales

How much do you identify as “woke” or “anti-woke”?


Open-Ended

What do you understand by “wokeness” or “being woke”?

Please write at least one complete sentence.



Composite Measures

A Stylized Example

Composite Measures

A Stylized Example

Three items, x₁, x₂, and x₃, are combined into two aggregate scores: a Mean Score and a Summative Score.
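Both aggregate scores are simple to compute once responses sit in a data frame. A minimal sketch with invented answers from three hypothetical respondents to three items, each rated on a 1–5 scale (the items, scale, and values are assumptions):

```r
# Invented responses: one row per respondent, one column per item (1-5 scale)
responses <- data.frame(
  x1 = c(5, 2, 4),
  x2 = c(4, 1, 3),
  x3 = c(5, 3, 4)
)
items <- c("x1", "x2", "x3")

# Summative score: add up the item responses (possible range 3 to 15)
responses$summative_score <- rowSums(responses[, items])

# Mean score: average the item responses (stays on the original 1-5 metric)
responses$mean_score <- rowMeans(responses[, items])

responses
```

Because the mean score stays on the items' own 1–5 metric, it is often easier to interpret than the summative score; when everyone answers every item, the two rank respondents identically.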

Another Group Exercise

A Bad Survey on Qualtrics

Get into your groups from Tuesday. Then, complete the following tasks.

1. Discuss and review the characteristics of high-quality survey questions (see Carr et al. 2020:213–18).
2. Design a set of 5–10 low-quality (i.e., bad) questions.
3. Then, fire up Qualtrics and create your bad survey.

Be sure to:
- Use different question types and formats.
- Add display logic or skip logic to select questions.

You’ll present your bad survey on Tuesday.

For some guidance on how to use Qualtrics, click here.

See You Tuesday

References

Carr, Deborah S., Elizabeth Heger Boyle, Benjamin Cornwell, Shelley J. Correll, Robert Crosnoe, et al. 2020. The Art and Science of Social Research. Second Edition. New York: W. W. Norton & Company, Inc.

Llaudet, Elena, and Kōsuke Imai. 2023. Data Analysis for Social Science: A Friendly and Practical Introduction. Princeton, NJ: Princeton University Press.
