A 27-variable extract from the 2024 General Social Survey (GSS), one of the longest-running sociological surveys in the United States (fielded annually or biennially since 1972). All 3,309 respondents from the 2024 cross-section are included.
gss_2024
A data frame with 3,309 rows and 27 variables:
Variance primary sampling unit. Use as the cluster ID for variance estimation.
Variance stratum. Use as the stratification variable.
Person post-stratification weight. Standard analysis weight.
Person post-stratification weight adjusted for differential non-response. Preferred when non-response bias is a concern.
Respondent ID. Unique case identifier.
Survey year (all 2024 in this extract).
Ballot form (A, B, C, or D). The GSS uses a
split-ballot design; not all questions appear on every ballot.
Inapplicable items are coded -100.
Age in years (89 = 89 or older).
Sex: 1 = male, 2 = female.
Race: 1 = white, 2 = black, 3 = other.
Hispanic origin: 1 = not Hispanic; 2–50 = specific
Hispanic origin.
Highest year of school completed (0–20 years).
Highest degree: 0 = less than HS, 1 = high school,
2 = associate, 3 = bachelor's, 4 = graduate.
Total family income (26 categories from < $1,000 to $170,000+).
Marital status: 1 = married, 2 = widowed,
3 = divorced, 4 = separated, 5 = never married.
Labor force status: 1 = full time, 2 = part time,
3 = temporarily not working, 4 = unemployed, 5 = retired,
6 = in school, 7 = keeping house, 8 = other.
Hours worked last week (for employed respondents only).
Number of adults in household (8 = 8 or more).
Party identification: 0 = strong Democrat,
3 = Independent, 6 = strong Republican, 7 = other party.
Political views: 1 = extremely liberal,
7 = extremely conservative.
General happiness: 1 = very happy, 2 = pretty happy,
3 = not too happy.
Self-rated health: 1 = excellent, 2 = good,
3 = fair, 4 = poor.
Social trust: 1 = most people can be trusted,
2 = can't be too careful, 3 = depends.
Government spending on welfare: 1 = too little,
2 = about right, 3 = too much.
Abortion for any reason: 1 = yes, 2 = no.
Religious service attendance: 0 = never,
8 = several times a week.
Religious preference: 1 = Protestant, 2 = Catholic,
3 = Jewish, 4 = none, and others.
Survey design: Stratified multi-stage cluster sample. Use Taylor series linearization for variance estimation:
svy <- as_survey(
  gss_2024,
  ids = vpsu,
  strata = vstrat,
  weights = wtssps, # or wtssnrps for the non-response-adjusted weight
  nest = TRUE
)
Missing value codes: The GSS uses a consistent system of negative integer codes for missing data across all variables:
| Code | Meaning |
|------|---------|
| -100 | Inapplicable (question not asked of this respondent) |
| -99 | No answer |
| -98 | Don't know |
| -97 | Skipped on web |
| -90 | Refused |
These codes are stored as value labels on every column (check
attr(gss_2024$happy, "labels")). Recode them to NA before analysis.
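The recoding step can be sketched in base R as follows. This is a minimal illustration, not the package's own method: `toy` is an invented stand-in for a couple of `gss_2024` columns, and `recode_gss_missing()` is a hypothetical helper name. Only the five codes come from the table above.

```r
# Toy stand-in for two gss_2024 columns (values invented for illustration)
toy <- data.frame(
  happy = c(1, 2, -100, 3, -99),
  age   = c(34, -98, 89, 52, -90)
)

# GSS missing-value codes as listed in the table above
gss_missing <- c(-100, -99, -98, -97, -90)

# Hypothetical helper: set the missing codes to NA.
# Subassignment on a base vector keeps its "label"/"labels" attributes.
recode_gss_missing <- function(x, codes = gss_missing) {
  x[x %in% codes] <- NA
  x
}

# Apply to every column while keeping the data frame structure
toy[] <- lapply(toy, recode_gss_missing)

colSums(is.na(toy))
```

On the real data the same pattern would be `gss_2024[] <- lapply(gss_2024, recode_gss_missing)`, assuming every column is numeric or labelled-numeric. Be careful not to recode the design variables or weights if they can legitimately take negative values.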
Split-ballot design: The ballot variable indicates which question
module a respondent received. Variables asked only on some ballots will
have -100 (Inapplicable) for respondents on other ballots.
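One way to see which ballots carried a given question is to compute the share of -100 codes per ballot form. The sketch below uses an invented toy data frame; `item` is a hypothetical column name, and the ballot values are assumed to be stored as the letters A to D, which may differ from the actual coding in `gss_2024`.

```r
# Toy data: ballot form and one split-ballot item (values invented)
toy <- data.frame(
  ballot = c("A", "A", "B", "B", "C", "C", "D", "D"),
  item   = c(1, 2, -100, -100, 1, 3, 2, -100)
)

# Share of respondents coded Inapplicable (-100) within each ballot.
# A share of 1 suggests the item was not asked on that ballot.
inap_by_ballot <- tapply(toy$item == -100, toy$ballot, mean)

inap_by_ballot
```

In this toy example ballot B is entirely inapplicable, so any analysis of `item` should be restricted to the ballots where it was fielded.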
Metadata:
All columns carry variable labels and value labels as R attributes, taken from
the original SPSS file. They are extracted automatically into surveycore's
metadata system when you call as_survey().
Variable labels ("label" attribute): A human-readable description of
each column. Example: attr(gss_2024$happy, "label") returns
"GENERAL HAPPINESS".
Value labels ("labels" attribute): A named numeric vector mapping
each code to its meaning, including all missing-value codes. Example:
attr(gss_2024$happy, "labels") returns entries for Very happy,
Pretty happy, Not too happy, and the negative missing codes.
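The value labels can also be used to turn codes into a labelled factor. The following is a rough sketch in base R on an invented vector that mimics the attribute layout described above. The label strings and the "No answer" entry are illustrative, and treating all positive codes as substantive is an assumption that would not suit variables with a valid code of 0.

```r
# Toy vector mimicking the attribute layout of gss_2024$happy (values invented)
happy <- c(1, 2, 3, 2, -99)
attr(happy, "label") <- "GENERAL HAPPINESS"
attr(happy, "labels") <- c(
  "Very happy"    = 1,
  "Pretty happy"  = 2,
  "Not too happy" = 3,
  "No answer"     = -99
)

# Keep only substantive (positive) codes; negative codes are missing values
labs <- attr(happy, "labels")
substantive <- labs[labs > 0]

# Codes absent from `levels` (here -99) become NA in the factor
happy_f <- factor(
  happy,
  levels = unname(substantive),
  labels = names(substantive)
)

table(happy_f, useNA = "ifany")
```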
# Load the package (assumed here to be surveycore, which provides as_survey())
library(surveycore)
# Variables in the dataset
names(gss_2024)
# Create survey design
svy <- as_survey(
  gss_2024,
  ids = vpsu,
  strata = vstrat,
  weights = wtssps,
  nest = TRUE
)
# Inspect variable label
attr(gss_2024$happy, "label")
# Inspect value labels (includes GSS missing-value codes)
attr(gss_2024$happy, "labels")
# Split-ballot: how many respondents per ballot form?
table(gss_2024$ballot)