mlr (version 2.3)

summarizeColumns: Summarize columns of data.frame or task.

Description

Summarizes a data.frame, somewhat differently from R's standard summary function. The function is mainly useful as a basic EDA tool on data.frames before they are converted to tasks, but it can be used on tasks as well.

Columns can be of type numeric, integer, logical, factor, or character. Characters and logicals will be treated as factors.
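As a rough illustration of what treating characters and logicals as factors means, the base R sketch below converts such columns with as.factor and leaves the others alone. The data frame df and the helper as_categorical are hypothetical names made up for this example; this is not mlr's internal code, only an approximation of the behaviour described above.

```r
# Hypothetical example data with one column of each supported type.
df <- data.frame(
  num = c(1.5, 2.5, NA),
  int = c(1L, 2L, 3L),
  lgl = c(TRUE, FALSE, TRUE),
  fac = factor(c("a", "b", "a")),
  chr = c("x", "y", "x"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert character and logical columns to factors,
# leaving numeric, integer, and factor columns unchanged.
as_categorical <- function(d) {
  to_convert <- vapply(
    d,
    function(x) is.character(x) || is.logical(x),
    logical(1)
  )
  d[to_convert] <- lapply(d[to_convert], as.factor)
  d
}

converted <- as_categorical(df)

# First class of each column after conversion.
col_classes <- vapply(converted, function(x) class(x)[1], character(1))
print(col_classes)
```

After the conversion, the lgl and chr columns can be handled with the same code path as fac, which is presumably why the summary reports them as categorical.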

Usage

summarizeColumns(obj)

Arguments

obj: A data.frame or task to summarize.

Value

[data.frame]. With columns:

name: Name of column.
type: Data type of column.
na: Number of NAs in column.
disp: Measure of dispersion. For numerics and integers sd is used, for categorical columns the qualitative variation.
mean: Mean value of column, NA for categorical columns.
median: Median value of column, NA for categorical columns.
mad: MAD of column, NA for categorical columns.
min: Minimal value of column, for categorical columns the size of the smallest category.
max: Maximal value of column, for categorical columns the size of the largest category.
nlevs: For categorical columns, the number of factor levels, NA else.
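To make the meaning of these result columns concrete, the sketch below computes rough base R counterparts for one numeric and one categorical column of iris. The variable names numeric_stats and categorical_stats are hypothetical. It is also an assumption that the mad entry corresponds to base R's mad() with its default scaling constant, and the qualitative variation used for disp on categorical columns is left out because its exact formula is not stated here.

```r
# Numeric column: base R counterparts of the reported statistics.
x <- iris$Sepal.Length
numeric_stats <- c(
  na     = sum(is.na(x)),  # number of missing values
  disp   = sd(x),          # dispersion for numerics: standard deviation
  mean   = mean(x),
  median = median(x),
  mad    = mad(x),         # assumption: default constant of stats::mad
  min    = min(x),
  max    = max(x)
)

# Categorical column: min and max refer to category sizes, not values.
f <- iris$Species
level_sizes <- table(f)
categorical_stats <- c(
  na    = sum(is.na(f)),
  min   = min(level_sizes),  # size of the smallest category
  max   = max(level_sizes),  # size of the largest category
  nlevs = nlevels(f)         # number of factor levels
)

print(numeric_stats)
print(categorical_stats)
```

Note that for categorical columns min and max are counts, so they are not comparable with the min and max of numeric columns in the same result.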

See Also

Other eda_and_preprocess: capLargeValues; createDummyFeatures; dropFeatures; mergeSmallFactorLevels; normalizeFeatures; removeConstantFeatures

Examples

summarizeColumns(iris)
