While this function is a summary method for objects of the colorDF
class, it can also be applied to any other data frame-like object.
The summary table has five columns and as many rows as there are columns
in the summarized data frame (or elements in a list). The first four columns contain, respectively,
the column name, the column class (abbreviated as in tibbles),
the number of unique values,
and the number of missing values (NAs). The contents of the fifth column depend on the
column class and column type as follows:
- first, any lists are unlisted;
- numeric columns (including integers) are summarized (see below);
- for character vectors and factors, if all values are unique or missing (NA), then this
  is stated explicitly;
- otherwise, for character vectors and factors, the values are listed, starting with the most
  frequent. The list is shortened to fit the screen.
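The first four columns and the frequency-ordered value listing can be approximated in base R. The following is a minimal sketch, not the package's own code: the helper names basic_summary and value_counts are hypothetical, the class is not abbreviated as in tibbles, and excluding NA from the unique count is an assumption.

```r
# Hypothetical helper (not part of colorDF): name, class, number of unique
# values and number of NAs for each column of a data frame with atomic columns.
basic_summary <- function(df) {
  data.frame(
    Col    = names(df),
    Class  = vapply(df, function(x) class(x)[1], character(1), USE.NAMES = FALSE),
    # Assumption: NA is not counted as a unique value.
    Unique = vapply(df, function(x) length(unique(x[!is.na(x)])), integer(1),
                    USE.NAMES = FALSE),
    NAs    = vapply(df, function(x) sum(is.na(x)), integer(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: list the values of a character vector or factor,
# most frequent first, truncated to a given width.
value_counts <- function(x, width = getOption("width")) {
  x <- as.character(x[!is.na(x)])
  # anyDuplicated() returns 0 when no value occurs more than once.
  if (!anyDuplicated(x)) return("All values unique")
  tab <- sort(table(x), decreasing = TRUE)
  out <- paste0(names(tab), ": ", as.integer(tab), collapse = ", ")
  if (nchar(out) > width) out <- paste0(substr(out, 1, width - 3), "...")
  out
}

print(basic_summary(iris))
print(value_counts(c("a", "b", "b", "c", "b", "a", NA)))
```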
For numeric columns, by default the quantiles 0 (minimum), .25, .50 (median), .75
and 1 (maximum) are shown. The following alternatives can be specified using
the option numformat:
- "mean": mean +- standard deviation
- "graphics": a graphical summary. Note that all numerical columns are
  scaled with the same parameter, so this option makes sense only if the
  numerical columns are comparable. The graphics summary looks like
  this: ---| + |---- and corresponds to a regular box plot, indicating the
  extremes and the three quartiles (- ... - indicates the data range, |...| the
  interquartile range and '+' stands for the median).
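The three numeric formats can be illustrated with a short base R sketch. It is an approximation under stated assumptions, not the package's implementation: the helpers five_num, mean_sd and text_boxplot are hypothetical names, the output width of 20 characters is arbitrary, and the shared scale is taken to be the overall range of all numeric columns.

```r
# Hypothetical helper: the default summary, quantiles 0, .25, .5, .75 and 1.
five_num <- function(x) {
  quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE, names = FALSE)
}

# Hypothetical helper: "mean +- standard deviation" as a string.
mean_sd <- function(x) {
  sprintf("%.2f +- %.2f", mean(x, na.rm = TRUE), sd(x, na.rm = TRUE))
}

# Hypothetical helper: a text box plot drawn on a shared scale [lo, hi].
# Assumes hi > lo; '-' marks the data range, '|' the first and third
# quartiles, '+' the median.
text_boxplot <- function(x, lo, hi, width = 20) {
  q <- five_num(x)
  # Map each quantile to a character position between 1 and width.
  pos <- round((q - lo) / (hi - lo) * (width - 1)) + 1
  out <- rep(" ", width)
  out[pos[1]:pos[5]] <- "-"
  out[pos[2]:pos[4]] <- " "
  out[c(pos[2], pos[4])] <- "|"
  out[pos[3]] <- "+"
  paste(out, collapse = "")
}

num <- iris[1:4]
rng <- range(unlist(num), na.rm = TRUE)   # one scale for all columns
for (nm in names(num)) {
  cat(sprintf("%-13s %s  %s\n", nm, text_boxplot(num[[nm]], rng[1], rng[2]),
              mean_sd(num[[nm]])))
}
```

Because every column is drawn against the same range, a column with much smaller values collapses into a few characters at the left edge, which is why the graphical format is only informative for comparable columns.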
summary_colorDF is the exported version of this function; it facilitates
usage in cases where converting an object to a colorDF is not desirable.
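A possible usage sketch follows. It assumes the colorDF package is installed and that numformat is passed as a named argument with the values described above; the exact signature should be checked in the package help (?summary_colorDF).

```r
# Runs only if the colorDF package is available.
if (requireNamespace("colorDF", quietly = TRUE)) {
  library(colorDF)

  # Summarize a plain data frame without converting it to a colorDF first.
  print(summary_colorDF(iris))

  # Alternative numeric formats, as described above (assumed argument values).
  print(summary_colorDF(iris, numformat = "mean"))
  print(summary_colorDF(iris, numformat = "graphics"))
}
```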