summary_colorDF: Meaningful summary of lists and data frames

Description

Meaningful, row-wise summary function for lists and data frames

Usage

summary_colorDF(
  object,
  numformat = "quantiles",
  digits = 3,
  width = getOption("width")
)

# S3 method for colorDF
summary(object, ...)

Value

A colorful data frame of class colorDF containing useful information on a dataframe-like object.

Arguments

object: a data frame (possibly a color data frame)
numformat: format of the summary for numerical values. One of "quantiles" (default), "mean" or "graphics"
digits: number of significant digits to show (default: 3)
width: width of the summary table in characters
...: passed to summary_colorDF

Details

While this function is a summary method for objects of the colorDF class, it can also be applied to any other data frame-like object.

The summary table has five columns and as many rows as there are columns in the summarized data frame (or elements in a list). The first four columns contain, respectively, the column name, the column class (abbreviated as in tibbles), the number of unique values and the number of missing values (NAs). The contents of the fifth column depend on the column class and column type as follows:

first, any lists are unlisted
numeric columns (including integers) are summarized (see below)
for character vectors and factors, if all values are unique or missing (NA) then this is stated explicitly
otherwise, for character vectors and factors, the values will be listed, starting with the most frequent. The list will be shortened to fit the screen.
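
The first four columns and the frequency ordering of categorical values can be approximated in base R. The sketch below is an illustration only: describe_columns is a hypothetical helper, not part of colorDF, and it assumes that missing values are not counted as unique values. The package's actual implementation may differ.

```r
# Hypothetical helper (not part of colorDF): rebuilds the name, class,
# unique-count and NA-count columns of the summary with base R only.
describe_columns <- function(df) {
  # List columns are unlisted first, as described above
  flat <- lapply(df, function(x) if (is.list(x)) unlist(x) else x)
  data.frame(
    Col    = names(df),
    Class  = vapply(flat, function(x) class(x)[1], character(1)),
    # Assumption: NA is not counted as a unique value
    Unique = vapply(flat, function(x) length(unique(x[!is.na(x)])), integer(1)),
    NAs    = vapply(flat, function(x) sum(is.na(x)), integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

res <- describe_columns(iris)
print(res)

# Values of a character vector, most frequent first
x <- c("b", "a", "b", "c", "b", "a")
by_freq <- sort(table(x), decreasing = TRUE)
print(names(by_freq))
```

Calling summary_colorDF(iris) shows the same kind of information, with the fifth column added and the result formatted as a colorDF.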

For numeric columns, by default the quantiles 0 (minimum), .25, .50 (median), .75 and 1 (maximum) are shown. The following alternatives can be specified using the numformat argument:

"mean": mean +- standard deviation
"graphics": a graphical summary. Note that all numerical columns will be scaled with the same parameter, so this option makes sense only if the numerical columns are comparable. The graphics summary looks like this: ---| + |---- and corresponds to a regular box plot, indicating the extremes and the three quartiles (- ... - indicates the data range, |...| the interquartile range and '+' stands for the median).
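
The three numeric formats can be illustrated with base R. This is a rough sketch rather than the package's code: ascii_box is a hypothetical helper, the 30-character width is an arbitrary choice, and the common scale is assumed to be the overall range of all numeric columns.

```r
x <- iris$Sepal.Length

# "quantiles": minimum, the three quartiles and maximum
q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
print(signif(q, 3))

# "mean": mean +- standard deviation, to 3 significant digits
mean_sd <- sprintf("%s +- %s",
                   signif(mean(x, na.rm = TRUE), 3),
                   signif(sd(x, na.rm = TRUE), 3))
print(mean_sd)

# "graphics": hypothetical text box plot drawn on a shared scale `lim`
ascii_box <- function(x, lim, width = 30) {
  # Map a value to a character position between 1 and `width`
  pos <- function(v) 1 + round((v - lim[1]) / diff(lim) * (width - 1))
  p <- pos(quantile(x, c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE))
  chars <- rep(" ", width)
  chars[p[1]:p[5]] <- "-"         # data range
  chars[p[2]:p[4]] <- " "         # inside of the box
  chars[c(p[2], p[4])] <- "|"     # first and third quartile
  chars[p[3]] <- "+"              # median
  paste(chars, collapse = "")
}

num <- iris[1:4]
# One range for all columns, so the plots are directly comparable
lim <- range(unlist(num), na.rm = TRUE)
boxes <- vapply(num, ascii_box, character(1), lim = lim)
print(boxes)
```

Because every column is drawn against the same range, a column with much smaller values (such as Petal.Width here) is squeezed into a few characters, which is why the graphical format is only informative when the columns are on comparable scales.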

summary_colorDF is the exported version of this function to facilitate usage in cases when converting an object to a colorDF is not desirable.

Examples

summary(colorDF(iris))
summary_colorDF(iris)
summary_colorDF(iris, numformat="g")
if(require(dplyr) && require(tidyr)) {
  starwars %>% summary_colorDF

  ## A summary of iris data by species
  iris %>% 
    mutate(row=rep(1:50, 3)) %>% 
    gather(key="parameter", value="Size", 1:4)  %>%
    mutate(pa.sp=paste(parameter, Species, sep=".")) %>% 
    select(row, pa.sp, Size) %>% 
    spread(key=pa.sp, value=Size) %>% 
    select(-row) %>%
    summary_colorDF(numformat="g")
}