The objective is to identify each of a number of benign or malignant classes.
Samples arrive periodically as Dr. Wolberg reports his clinical cases. The
database therefore reflects this chronological grouping of the data. This
grouping information appears immediately below, having been removed from
the data itself. Each variable except for the first was converted into 11
primitive numerical attributes with values ranging from 0 through 10. There
are 16 missing attribute values.
Data frame (tibble) with 675 observations on 10 variables: a factor Id,
8 numeric variables, and the target class:
Id, Sample code number
Cl.thickness, Clump thickness
Cell.size, Uniformity of cell size
Cell.shape, Uniformity of cell shape
Marg.adhesion, Marginal adhesion
Epith.c.size, Single epithelial cell size
Bare.nuclei, Bare nuclei
Bl.cromatin, Bland chromatin
Normal.nucleoli, Normal Nucleoli
Class, Class
Reproducing this dataset:
library("mlbench")
d <- mlbench::BreastCancer
d <- d[!duplicated(d), ]
d <- d[complete.cases(d), ]
mat <- as.matrix(d[ , 2:9])
mat <- apply(mat, 2, as.numeric)
breastcancer <- dplyr::as_tibble(data.frame(Id = d$Id, mat, Class = d$Class))
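
The cleaning steps above can be illustrated on a small stand-in data frame. This is only a sketch: the data frame `toy`, its column `x`, and all of its values are made up for illustration and are not part of the dataset. It shows why the factor columns are passed through as.matrix() before as.numeric(): the matrix holds the recorded labels as character strings, whereas calling as.numeric() directly on a factor returns its internal level codes.

```r
# Toy stand-in for mlbench::BreastCancer (hypothetical values, illustration only).
toy <- data.frame(
  Id    = c("a", "a", "b", "c"),
  x     = factor(c("10", "10", "3", NA), levels = c("3", "10")),
  Class = factor(c("benign", "benign", "malignant", "benign")),
  stringsAsFactors = FALSE
)

toy <- toy[!duplicated(toy), ]      # drops the repeated first row
toy <- toy[complete.cases(toy), ]   # drops the row with the missing value

# as.matrix() converts factor columns to their character labels, so
# as.numeric() recovers the recorded values rather than the level codes.
mat <- apply(as.matrix(toy[, "x", drop = FALSE]), 2, as.numeric)

# Calling as.numeric() directly on the factor yields the level codes instead.
codes <- as.numeric(toy$x)
```

Here `mat` holds the recorded values 10 and 3, while `codes` holds the level codes 2 and 1, which is why the reproduction code converts via a matrix rather than column by column.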