A curated subset of rows and columns from the Consumer Expenditure (CE) dataset that is provided in the [rpms](https://CRAN.R-project.org/package=rpms) package by Daniell Toth. This example dataset is designed for demonstration purposes within this package. Please refrain from using this dataset for inferential purposes.
svytestCE
A data frame with _n_ rows and _m_ variables:
Consumer unit identifying variable, constructed using the first seven digits of a unique identifier.
Cluster Identifier for all clusters (constructed using PSU, REGION, STATE, and POPSIZE).
Month for which the data were collected.
Final sample weight used to make population inferences.
State FIPS code indicating the location of the consumer unit.
Region code: 1 = Northeast, 2 = Midwest, 3 = South, 4 = West.
Indicator of urban (1) versus rural (2) residence status.
Population size class of the PSU, ranging from 1 (largest) to 5 (smallest).
Housing tenure: 1 = Owned with mortgage; 2 = Owned without mortgage; 3 = Owned (mortgage not reported); 4 = Rented; 5 = Occupied without cash rent; 6 = Student housing.
Number of rooms (including finished living areas but excluding bathrooms).
Number of bathrooms in the consumer unit.
Number of bedrooms in the consumer unit.
Number of owned vehicles.
Household type based on the relationship of members to the reference person; for example, 1 = Married Couple only, 2 = Married Couple with children (oldest < 6 years), 3 = Married Couple with children (oldest 6-17 years), etc.
Number of members in the consumer unit (family size).
Count of persons less than 18 years old in the consumer unit.
Count of persons older than 64 years in the consumer unit.
Number of earners in the consumer unit.
Age of the primary earner.
Education level of the primary earner, coded as 1 = None, 2 = 1st-8th Grade, 3 = Some high school, 4 = High school, 5 = Some college, 6 = AA degree, 7 = Bachelor's degree, 8 = Advanced degree.
Gender of the primary earner (F = Female, M = Male).
Marital status of the primary earner (1 = Married, 2 = Widowed, 3 = Divorced, 4 = Separated, 5 = Never Married).
Race of the primary earner (e.g., 1 = White, 2 = Black, 3 = Native American, 4 = Asian, 5 = Pacific Islander, 6 = Multi-race).
Indicator of Hispanic, Latino, or Spanish origin (Y for yes, N for no).
Indicator if the primary earner is a member of the armed forces (Y/N).
Current college enrollment status for the primary earner (Full for full time, Part for part time, No for not enrolled).
Type of employment for the primary earner: 1 = Full time all year, 2 = Part time all year, 3 = Full time part-year, 4 = Part time part-year.
Occupational code representing the primary job of the earner.
Type of employment: coded as 1 = Employee of a private company, 2 = Federal government employee, 3 = State government employee, 4 = Local government employee, 5 = Self-employed, 6 = Working without pay in a family business.
Amount of consumer unit income before taxes in the past 12 months.
Wage or salary income received in the past 12 months, before deductions.
Income received from Social Security and Railroad Retirement in the past 12 months.
Total expenditures reported for the current quarter.
Total taxes paid (estimated) in the current period.
Total expenditures for housing in the current quarter.
Expenditures for health care during the current quarter.
Expenditures on food during the current quarter.
This example dataset is a subset extracted from the complete CE dataset used by the rpms package. It is intended to illustrate how to work with survey data in the context of recursive partitioning. The original CE data contain 68,415 observations on 47 variables; this example contains a smaller selection for ease of demonstration. Several columns were removed because they were mostly missing, redundant, or not relevant to the examples. Rows were filtered to keep only those with strictly positive values for the salary, expenditure, and tax variables. Weights were not recalibrated following these changes.
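The row filter described above can be illustrated with a minimal sketch. It is not the code used to build this dataset: the toy data frame and its column names (`salary`, `expenditure`, `taxes`, `weight`) are hypothetical placeholders, and dropping rows with missing values is an assumption made only for this illustration.

```r
# Toy stand-in for the full CE data; column names are hypothetical placeholders.
ce_toy <- data.frame(
  salary      = c(52000, 0, 31000, 47000, NA),
  expenditure = c(9100, 4000, 0, 7600, 5200),
  taxes       = c(6100, 300, 1200, -50, 800),
  weight      = c(1500.2, 2300.8, 1800.0, 2100.5, 1950.3)
)

# Keep rows where all three amounts are strictly positive.
# Comparisons involving NA yield NA; which() drops those rows too
# (an assumption for this sketch, not a documented step).
keep <- which(ce_toy$salary > 0 & ce_toy$expenditure > 0 & ce_toy$taxes > 0)
ce_subset <- ce_toy[keep, ]

# Weights are carried over unchanged, so their total shrinks with the rows.
sum(ce_toy$weight)
sum(ce_subset$weight)
```

Because the weights are left as they were, the weight total of the filtered rows no longer matches the total for the full data, which is one reason the subset is unsuitable for population inference.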
See rpms for an overview of the functions provided in the original package.