liver (version 1.26)

adult: adult data set

Description

The adult dataset was collected from the US Census Bureau. The primary task is to predict whether a given adult makes more than $50K a year, based on attributes such as education and hours of work per week. The target feature is income, a factor with levels "<=50K" and ">50K"; the remaining 14 variables are predictors.
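
The split between target and predictors described above can be inspected directly. The following is a minimal sketch, assuming the liver package is installed; the helper class_proportions() is a hypothetical name introduced here for illustration and is not part of liver.

```r
# Hypothetical helper (not part of liver): class proportions of a factor.
class_proportions <- function(y) {
  prop.table(table(y))
}

# Assumes the liver package is installed; the block is skipped otherwise.
if (requireNamespace("liver", quietly = TRUE)) {
  data(adult, package = "liver")

  # Share of "<=50K" versus ">50K" in the target feature.
  print(class_proportions(adult$income))

  # Every column other than income is treated as a predictor.
  predictors <- setdiff(names(adult), "income")
  print(length(predictors))
}
```

Checking the class proportions first is useful because an imbalanced target affects how classification accuracy should be read.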

Usage

data(adult)

Format

The adult dataset, as a data frame, contains 48598 rows and 15 columns (variables/features). The 15 variables are:

  • age: age in years.

  • workclass: a factor with 6 levels.

  • demogweight: the demographics to describe a person.

  • education: a factor with 16 levels.

  • education.num: number of years of education.

  • marital.status: a factor with 5 levels.

  • occupation: a factor with 15 levels.

  • relationship: a factor with 6 levels.

  • race: a factor with 5 levels.

  • gender: a factor with levels "Female" and "Male".

  • capital.gain: capital gains.

  • capital.loss: capital losses.

  • hours.per.week: number of hours of work per week.

  • native.country: a factor with 42 levels.

  • income: yearly income as a factor with levels "<=50K" and ">50K".
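
The dimensions and factor level counts listed above can be compared with the loaded data. This is a minimal sketch, assuming the liver package is installed; factor_level_counts() is a hypothetical helper defined here for illustration, not a function provided by liver.

```r
# Hypothetical helper (not part of liver): number of levels per factor column.
factor_level_counts <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  vapply(df[is_fac], nlevels, integer(1))
}

# Assumes the liver package is installed; the block is skipped otherwise.
if (requireNamespace("liver", quietly = TRUE)) {
  data(adult, package = "liver")

  # Number of rows and columns of the data frame.
  print(dim(adult))

  # Level counts per factor, to compare with the list above.
  print(factor_level_counts(adult))
}
```

The helper works on any data frame, so the same check can be reused for the other data sets in the package.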

Details

For more information related to the dataset see the UCI Machine Learning Repository:
http://www.cs.toronto.edu/~delve/data/adult/desc.html
http://www.cs.toronto.edu/~delve/data/adult/adultDetail.html

References

Kohavi, R. (1996). Scaling up the accuracy of naive-bayes classifiers: A decision-tree hybrid. Kdd.

Reza Mohammadi (2025). Data Science Foundations and Machine Learning with R: From Data to Decisions. https://book-data-science-r.netlify.app.

See Also

bank, churn, churnCredit, churnTel, risk, cereal, advertising, marketing, drug, house, housePrice, redWines, whiteWines, insurance, caravan, fertilizer, corona

Examples

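library(liver)  # attach the package that provides the adult data set
data(adult)     # load the data set into the workspace
str(adult)      # show each variable's type and first few values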
