The insurance dataset contains \(7\) features and \(1338\) records. The target feature is charge, and the remaining 6 variables are predictors. The dataset is simulated on the basis of demographic statistics from the US Census Bureau.
data(insurance)
The insurance dataset, as a data frame, contains \(1338\) rows (customers) and \(7\) columns (variables/features). The \(7\) variables are:
age: age of primary beneficiary.
bmi: body mass index, an objective index of body weight (kg / m ^ 2) computed as weight divided by height squared; it indicates whether weight is relatively high or low for a given height. The ideal range is 18.5 to 24.9.
children: Number of children covered by health insurance / Number of dependents.
smoker: smoking status, a factor with 2 levels: yes, no.
gender: insurance contractor gender, female, male.
region: the beneficiary's residential area in the US, northeast, southeast, southwest, northwest.
charge: individual medical costs billed by health insurance.
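The split into one target (charge) and six predictors can be sketched as a linear regression. This is only an illustration, not a recommended model. It assumes the dataset is provided by the liver package, which this page does not name, and it does nothing if that package is not installed.

```r
# Assumption: the dataset is provided by the 'liver' package (not named on this page).
if (requireNamespace("liver", quietly = TRUE)) {
  data(insurance, package = "liver")

  # Documented shape: 1338 rows and 7 columns.
  print(dim(insurance))

  # 'charge' is the target; '.' stands for the remaining six predictors.
  fit <- lm(charge ~ ., data = insurance)
  print(summary(fit))
}
```

The formula `charge ~ .` keeps the example short; listing the predictors explicitly (age, bmi, children, smoker, gender, region) would be equivalent here.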
For more information related to the dataset see:
https://www.kaggle.com/mirichoi0218/insurance
Brett Lantz (2019). Machine Learning with R: Expert techniques for predictive modeling. Packt Publishing Ltd.
Reza Mohammadi (2025). Data Science Foundations and Machine Learning with R: From Data to Decisions. https://book-data-science-r.netlify.app.
bank, churn, churnCredit, churnTel, adult, risk, cereal, advertising, marketing, drug, house, housePrice, redWines, whiteWines, caravan, fertilizer, corona
data(insurance)
str(insurance)
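A few further inspections may be useful after loading the data. The sketch below relies only on the variables described above (charge, smoker, bmi) and again assumes, without confirmation from this page, that the dataset comes from the liver package.

```r
# Assumption: the dataset is provided by the 'liver' package (not named on this page).
if (requireNamespace("liver", quietly = TRUE)) {
  data(insurance, package = "liver")

  # Distribution of the target variable.
  print(summary(insurance$charge))

  # Counts per level of the two-level smoker factor.
  print(table(insurance$smoker))

  # Share of customers whose BMI lies in the 18.5 to 24.9 range.
  in_range <- insurance$bmi >= 18.5 & insurance$bmi <= 24.9
  print(mean(in_range))
}
```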