The adult dataset was collected from the US Census Bureau, and the primary task is to predict whether a given adult makes more than $50K a year based on attributes such as education and hours of work per week. The target feature is income, a factor with levels "<=50K" and ">50K"; the remaining 14 variables are predictors.
data(adult)
The adult dataset, as a data frame, contains \(48598\) rows and \(15\) columns (variables/features). The \(15\) variables are:
age: age in years.
workclass: a factor with 6 levels.
demogweight: the demographics to describe a person.
education: a factor with 16 levels.
education.num: number of years of education.
marital.status: a factor with 5 levels.
occupation: a factor with 15 levels.
relationship: a factor with 6 levels.
race: a factor with 5 levels.
gender: a factor with levels "Female","Male".
capital.gain: capital gains.
capital.loss: capital losses.
hours.per.week: number of hours of work per week.
native.country: a factor with 42 levels.
income: yearly income as a factor with levels "<=50K" and ">50K".
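The variables listed above can be checked directly once the data are loaded. The following is a minimal sketch, assuming the dataset ships with the liver package (the package name is an assumption; it is not stated on this page). The logistic regression at the end is only an illustrative baseline for the prediction task, not a recommended model.

```r
# Assumption: 'adult' is provided by the 'liver' package.
library(liver)

data(adult)

# Compare dimensions and variable types with the description above.
dim(adult)
str(adult)

# Distribution of the target feature.
table(adult$income)
prop.table(table(adult$income))

# Illustrative baseline: logistic regression on three numeric predictors.
# With a factor response, glm() models the probability of the second
# level, here ">50K".
fit <- glm(income ~ age + education.num + hours.per.week,
           data = adult, family = binomial)
summary(fit)
```

Any of the remaining predictors can be added to the formula in the same way; factor variables such as workclass are expanded into indicator columns automatically.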
For more information about the dataset, see the UCI Machine Learning Repository and the following pages:
http://www.cs.toronto.edu/~delve/data/adult/desc.html
http://www.cs.toronto.edu/~delve/data/adult/adultDetail.html
Kohavi, R. (1996). Scaling up the accuracy of naive-Bayes classifiers: A decision-tree hybrid. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96).
risk, churn, churnTel, bank, advertising, marketing, insurance, cereal, housePrice, house