The dataset is a sample of 440 customers characterized
by 6 continuous variables, giving the annual spending related to different
types of goods. The variables are Fresh, Milk, Grocery, Frozen,
Detergents_Paper, and Delicassen.
Two more variables are categorical and provide information on the
customer channel (Channel, with 2 levels: Horeca, i.e., Hotel/Restaurant/Cafe, and
Retail) and the region (Region, with 3 levels: Lisbon, Oporto, Other). The
categorical variables should not play
an active role in the clustering process, but they can be used ex-post to aid cluster
interpretation.
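
As a rough illustration of this workflow, the Python sketch below keeps Channel and Region out of the clustering step and uses them only afterwards, in a cross-tabulation against the cluster labels. The six records are hypothetical toy values, not rows from the actual dataset. The helper names (split_columns, two_means) and the small two-cluster k-means are assumptions made for the example; they stand in for whatever clustering method is actually applied.

```python
from collections import Counter

CONTINUOUS = ["Fresh", "Milk", "Grocery", "Frozen", "Detergents_Paper", "Delicassen"]
CATEGORICAL = ["Channel", "Region"]

# Hypothetical toy records for illustration only (NOT taken from the real data).
_rows = [
    (12000, 1500, 2000, 3000, 300, 800, "Horeca", "Lisbon"),
    (15000, 1200, 1800, 4000, 250, 900, "Horeca", "Oporto"),
    (11000, 1800, 2500, 2500, 400, 700, "Horeca", "Other"),
    (3000, 9000, 14000, 800, 6000, 1200, "Retail", "Lisbon"),
    (2500, 10000, 16000, 600, 7000, 1500, "Retail", "Other"),
    (4000, 8000, 12000, 1000, 5000, 1000, "Retail", "Oporto"),
]
customers = [dict(zip(CONTINUOUS + CATEGORICAL, row)) for row in _rows]


def split_columns(rows):
    """Separate the continuous features (used for clustering) from the
    categorical variables (held out for ex-post interpretation)."""
    features = [[float(r[c]) for c in CONTINUOUS] for r in rows]
    categories = [{c: r[c] for c in CATEGORICAL} for r in rows]
    return features, categories


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, n_iter=20):
    """Tiny k-means with k=2, seeded with the first and last point.
    A stand-in for the real clustering method, chosen only for brevity."""
    centers = [list(points[0]), list(points[-1])]
    assign = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        assign = [min((0, 1), key=lambda k: sq_dist(p, centers[k])) for p in points]
        # Update step: each center becomes the mean of its members.
        for k in (0, 1):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return assign


# Cluster on the six continuous variables only.
features, categories = split_columns(customers)
clusters = two_means(features)

# Ex-post interpretation: cross-tabulate cluster labels against Channel.
crosstab = Counter((c, cat["Channel"]) for c, cat in zip(clusters, categories))
for (cluster, channel), count in sorted(crosstab.items()):
    print(f"cluster {cluster}  {channel:<7} {count}")
```

The same cross-tabulation can be repeated with Region in place of Channel. Because neither categorical variable enters the distance computation, any agreement between clusters and channels reflects structure in the spending variables alone.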