
cheese (version 0.0.3)

descriptives: Compute descriptive statistics on columns of a data frame.

Description

Computes a number of common descriptive statistics for different types of data. The user can supply any number of additional functions, and can choose which data types each set of functions (including the defaults) is applied to.

Usage

descriptives(
    data,
    f_all = NULL,
    f_numeric = NULL,
    numeric_types = "numeric",
    f_categorical = NULL,
    categorical_types = "factor",
    f_other = NULL,
    na.rm = TRUE,
    useNA = c("ifany", "no", "always"),
    round = 2
)

Arguments

data

A data.frame. Can also be a list.

f_all

Functions to apply to all columns. Should return a scalar. See "Details" for information computed by default.

f_numeric

Functions to apply to columns conforming to numeric_types. Should return a scalar. See "Details" for information computed by default.

numeric_types

Character vector of data types that should be evaluated with f_numeric.

f_categorical

Functions to apply to columns conforming to categorical_types. Should return a named vector whose names correspond to the levels. See "Details" for information computed by default.

categorical_types

Character vector of data types that should be evaluated with f_categorical.

f_other

Functions to apply to the remaining columns, i.e. those matching neither numeric_types nor categorical_types.

na.rm

Logical argument supplied to f_numeric. Defaults to TRUE.

useNA

Supplied to f_categorical. See ?base::table for details. Defaults to "ifany".

round

Number of decimal places to round numeric results to. Defaults to 2.
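The type arguments decide which function set each column receives. As a rough illustration, the sketch below uses a hypothetical helper, route_columns() (not part of cheese), and assumes routing works by matching class(col) against the type vectors, with numeric_types checked first; the package's actual matching rules may differ. Its three calls mirror the default call and examples 2 and 3 under "Examples".

```r
# Hypothetical helper (not part of cheese): label each column with the
# function set it would receive, assuming a column is routed by matching
# its class() against numeric_types first, then categorical_types.
route_columns <- function(data, numeric_types = "numeric",
                          categorical_types = "factor") {
  vapply(data, function(col) {
    cls <- class(col)
    if (any(cls %in% numeric_types)) {
      "numeric"
    } else if (any(cls %in% categorical_types)) {
      "categorical"
    } else {
      "other"
    }
  }, character(1))
}

df <- data.frame(
  age = c(50, 61),
  sex = factor(c("F", "M")),
  smoker = c(TRUE, FALSE)
)

route_columns(df)                                            # defaults
route_columns(df, categorical_types = c("logical", "factor")) # logicals as categorical
route_columns(df, numeric_types = NULL)                      # numerics fall to "other"
```

Under this assumption, setting numeric_types = NULL means no class can match, so numeric columns drop through to the "other" set.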

Value

A long tibble with columns .variable (the variable name), .key (the statistic or attribute), .value (numeric results), .label (non-numeric results), and .combo (a convenient combination of .value and .label, coerced to a character vector). If categorical variables are present, two additional columns are included: .level (the factor level) and .order (to retain the order of the levels).
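To picture the long layout, here is a hand-built mock for a single numeric column. The values are computed with base R on a toy vector, not returned by descriptives(), and the rule used to fill .combo (the value when present, otherwise the label) is an assumption based on the description above. The real result has more rows (one per statistic) and, for categorical columns, the extra .level and .order columns.

```r
# Hand-built mock of the long layout for one numeric column x.
# One row per statistic; numeric results go in .value, non-numeric
# results in .label. Values are computed here, not by descriptives().
x <- c(2, 4, 6)
long <- data.frame(
  .variable = "x",
  .key = c("mean", "sd", "class"),
  .value = c(round(mean(x), 2), round(sd(x), 2), NA),
  .label = c(NA, NA, class(x)),
  stringsAsFactors = FALSE
)

# Assumed .combo rule: use .value when present, otherwise .label,
# always as character
long$.combo <- ifelse(is.na(long$.value), long$.label, as.character(long$.value))

long
subset(long, .key == "mean")  # one statistic for every variable
```

Keeping every statistic in one column like this is what lets results for many variables be stacked and filtered with ordinary row operations.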

Details

The min, max, median, iqr, mean, and sd are computed automatically for numeric data, and table and prop.table*100 for categorical data. The sample size, number of missing values, number of non-missing values, number of unique values, and class are computed automatically for all columns.
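The defaults listed above correspond to familiar base R calls. The following is a sketch on toy data, not the package's own code: the use of IQR() (quantile type 7) and counting NA as one distinct value (as dplyr::n_distinct() does by default) are assumptions about how descriptives() computes them.

```r
# Numeric defaults, with missing values removed (na.rm = TRUE)
x <- c(3, 7, 1, NA, 5)
num_stats <- c(
  min    = min(x, na.rm = TRUE),
  max    = max(x, na.rm = TRUE),
  median = median(x, na.rm = TRUE),
  iqr    = IQR(x, na.rm = TRUE),
  mean   = mean(x, na.rm = TRUE),
  sd     = sd(x, na.rm = TRUE)
)
round(num_stats, 2)  # rounded as with round = 2

# Categorical defaults: counts and percentages per level (useNA = "ifany")
f <- factor(c("yes", "no", "yes", NA))
cat_counts <- table(f, useNA = "ifany")
cat_pct <- prop.table(cat_counts) * 100
cat_counts
cat_pct

# Statistics computed for every column; here NA counts as a distinct value
all_stats <- list(
  n         = length(x),
  missing   = sum(is.na(x)),
  available = sum(!is.na(x)),
  unique    = length(unique(x)),
  class     = class(x)
)
str(all_stats)
```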

Examples

# NOT RUN {
library(cheese)     # provides descriptives() and the heart_disease data
library(tidyverse)  # provides the %>% pipe

#1) Default
heart_disease %>%
    descriptives()

#2) Allow logicals as categorical
heart_disease %>%
    descriptives(
        categorical_types = c("logical", "factor")
    )

#3) Treat numeric columns as "other" (f_numeric is not applied to them)
heart_disease %>%
    descriptives(
        numeric_types = NULL
    )

#4) Compute a custom function
heart_disease %>%
    descriptives(
        f_numeric = 
            list(
                cv = function(x, na.rm) sd(x, na.rm = na.rm)/mean(x, na.rm = na.rm)
            )
    )

# }
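Custom functions only need the shapes described under "Arguments". The sketch below defines two hypothetical functions, se() and pct_by_level() (neither is part of cheese). se() follows the cv example above by accepting na.rm and returning a scalar. pct_by_level() returns a vector named by level; it accepts useNA with a default, plus `...`, on the assumption that descriptives() may or may not pass extra arguments to f_categorical functions.

```r
# Hypothetical f_numeric function: standard error of the mean.
# Accepts na.rm like the cv example and returns a single number.
se <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

# Hypothetical f_categorical function: percentage of each level.
# Returns a numeric vector named by level (plus NA if present and useNA allows).
pct_by_level <- function(x, useNA = "ifany", ...) {
  counts <- table(x, useNA = useNA)
  stats::setNames(as.numeric(prop.table(counts) * 100), names(counts))
}

se(c(2, 4, NA))                 # 1
se(c(2, 4, NA), na.rm = FALSE)  # NA
pct_by_level(factor(c("a", "b", "b", "c")))  # a = 25, b = 50, c = 25
```

Following the list form of example 4, these would presumably be supplied as descriptives(f_numeric = list(se = se), f_categorical = list(pct = pct_by_level)), though that call has not been checked against the package.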
