
tidystats (version 0.5)

describe_data: Calculate common descriptive statistics

Description

describe_data returns a set of common descriptive statistics (e.g., n, mean, sd) for numeric variables.

Usage

describe_data(data, column, na.rm = TRUE, short = FALSE)

Arguments

data

A data frame.

column

The unquoted name of a numeric column in the data frame.

na.rm

Logical. Should missing values (including NaN) be excluded when calculating the descriptives? The default is TRUE.

short

Logical. Should only a subset of descriptives be reported? If set to TRUE, only the N, M, and SD will be returned. The default is FALSE.

Details

The data can be grouped using dplyr::group_by so that descriptives will be calculated for each group level.
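
As a rough illustration of what per-group descriptives amount to, the sketch below computes N, M, and SD by hand with dplyr::summarise. It is not the tidystats implementation. The built-in iris data frame stands in for real data, and the output column names are chosen here only to mirror the short = TRUE description.

```r
library(dplyr)

# Hand-rolled per-group descriptives (illustrative only, not tidystats code)
by_species <- iris %>%
  group_by(Species) %>%
  summarise(
    N  = sum(!is.na(Sepal.Length)),         # number of non-missing observations
    M  = mean(Sepal.Length, na.rm = TRUE),  # mean
    SD = sd(Sepal.Length, na.rm = TRUE)     # standard deviation
  )

by_species
```

Each level of the grouping variable yields one row, which is the shape to expect when describe_data is called on grouped data.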

When na.rm is set to FALSE, a percentage column will be added to the output that contains the percentage of non-missing data.

Skew and kurtosis are based on the skewness and kurtosis functions of the moments package (Komsta & Novomestky, 2015).
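
For readers without the moments package at hand, the following base R sketch shows the moment-based definitions that, as far as I am aware, those functions use: skewness as the third central moment divided by the second central moment raised to 3/2, and kurtosis as the non-excess (Pearson) ratio, which is about 3 for normally distributed data. The helper names skew_sketch and kurtosis_sketch are hypothetical and are not part of tidystats or moments.

```r
# Moment-based skewness and kurtosis (illustrative helpers, not tidystats code)
skew_sketch <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]  # is.na() is also TRUE for NaN
  n <- length(x)
  d <- x - mean(x)              # deviations from the mean
  (sum(d^3) / n) / (sum(d^2) / n)^(3 / 2)
}

kurtosis_sketch <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  n <- length(x)
  d <- x - mean(x)
  n * sum(d^4) / sum(d^2)^2     # non-excess kurtosis
}

x <- c(1, 2, 3, 4, 5, NA)
skew_sketch(x)      # symmetric data, so the skewness is 0
kurtosis_sketch(x)  # 5 * 34 / 10^2 = 1.7
```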

By default (na.rm = TRUE), percentages are calculated from the total number of non-missing observations. When na.rm is set to FALSE, percentages are based on the total number of missing and non-missing observations.

Examples

# NOT RUN {
# Load the dplyr package for access to the %>% operator and group_by()
library(dplyr)

# Inspect descriptives of the response column from the 'quote_source' data
# frame included in tidystats
describe_data(quote_source, response)

# Repeat the above, now for each level of the source column
quote_source %>%
  group_by(source) %>%
  describe_data(response)

# Only inspect the total N, mean, and standard deviation
quote_source %>%
  group_by(source) %>%
  describe_data(response, short = TRUE)

# }
