A dataset containing annual spending information for clients of a wholesale distributor, along with each customer's sales channel and geographic region. The dataset can be used to illustrate customer segmentation, clustering, exploratory data analysis, and unsupervised learning methods in R.
data(wholesale_customers)
A data frame with 440 observations and 8 variables:
Annual spending on fresh products (in monetary units).
Annual spending on milk products (in monetary units).
Annual spending on grocery products (in monetary units).
Annual spending on frozen products (in monetary units).
Annual spending on detergents and paper products (in monetary units).
Annual spending on delicatessen products (in monetary units).
Customer sales channel: "Horeca" or "Retail".
Customer region: "Lisbon", "Oporto", or "Other".
This dataset was obtained from the UCI Machine Learning Repository and renamed
wholesale_customers for inclusion in the liver package. It refers
to clients of a wholesale distributor and records their annual spending in
several product categories. The dataset is well suited for illustrating methods
for clustering, customer profiling, and multivariate data exploration.
In clustering applications, the numerical spending variables are typically used
to define the clusters, while channel and region can be used
afterward to help interpret the resulting customer groups.
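The workflow described above can be sketched as follows. This is a minimal illustration, not a recommended analysis: it assumes the liver package is installed, that the six spending variables are the only numeric columns in the data frame, and that the two categorical columns are named channel and region as in the text above. The choice of k-means, the log transform, and three clusters are all arbitrary assumptions made for the example.

```r
# Sketch only: k-means on the spending variables, interpreted with
# channel and region afterward. Assumes the liver package is installed.
library(liver)
data(wholesale_customers)

# Pick the numeric (spending) columns without hard-coding their names.
is_num <- sapply(wholesale_customers, is.numeric)

# Spending is typically right-skewed, so log-transform and standardize
# before computing distances (an illustrative choice, not a requirement).
spending <- scale(log1p(wholesale_customers[, is_num]))

# Three clusters is an arbitrary choice for illustration.
set.seed(42)
km <- kmeans(spending, centers = 3, nstart = 25)

# Channel and region are not used to build the clusters; they are
# cross-tabulated afterward to help describe each group.
table(cluster = km$cluster, channel = wholesale_customers$channel)
table(cluster = km$cluster, region = wholesale_customers$region)
```

Because channel and region are held out of the clustering step, any association that shows up in these tables reflects structure in the spending variables alone.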
B. Jaya Lakshmi, K. B. Madhuri, and M. Shashi (2017). An Efficient Algorithm for Density Based Subspace Clustering with Dynamic Parameter Setting. International Journal of Information Technology and Computer Science, 9(6), 27-33. doi:10.5815/ijitcs.2017.06.04
Reza Mohammadi (2025). Data Science Foundations and Machine Learning with R: From Data to Decisions. https://book-data-science-r.netlify.app.
mortgage, bank, churn_mlc, churn, churn_tel, adult, cereal, advertising, marketing, drug, house, house_price, red_wines, white_wines, insurance, caravan, loan
data(wholesale_customers)
str(wholesale_customers)
summary(wholesale_customers)