utils_stats: Useful functions for computing descriptive statistics

Description

The following functions quickly compute descriptive statistics for each level of a factor or combination of factors.
- cv_by() For computing coefficient of variation.
- max_by() For computing maximum values.
- means_by() For computing arithmetic means.
- min_by() For computing minimum values.
- n_by() For getting the length.
- sd_by() For computing sample standard deviation.
- sem_by() For computing standard error of the mean.
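
As a rough illustration of what a grouped statistic looks like, the base R sketch below computes a mean and a coefficient of variation for each level of a grouping variable. The data frame `toy` and the helper `cv_pct` are hypothetical names used only here, and the percentage convention for the coefficient of variation (sd / mean * 100) is an assumption. It is not the implementation used by the *_by() functions.

```r
# Hypothetical toy data: two groups, one numeric variable
toy <- data.frame(
  ENV = c("E1", "E1", "E1", "E2", "E2", "E2"),
  PH  = c(2.0, 2.2, 2.4, 3.0, 3.5, 4.0)
)

# Coefficient of variation as a percentage (assumed convention)
cv_pct <- function(x, na.rm = FALSE) {
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm) * 100
}

# Mean and CV of PH for each level of ENV
means_toy <- aggregate(PH ~ ENV, data = toy, FUN = mean)
cv_toy    <- aggregate(PH ~ ENV, data = toy, FUN = cv_pct)

print(means_toy)
print(cv_toy)
```

Each result has one row per level of ENV, which mirrors the one-row-per-group layout described for the *_by() functions.
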
The following functions compute descriptive statistics. All of them work naturally with %>% and handle grouped data and multiple variables (all numeric variables from .data by default).
- av_dev() computes the average absolute deviation.
- ci_mean() computes the confidence interval for the mean.
- cv() computes the coefficient of variation.
- freq_table() computes a frequency table. Handles grouped data.
- hmean(), gmean() compute the harmonic and geometric means, respectively. The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals. The geometric mean is the nth root of the product of n values.
- kurt() computes the kurtosis as used in SAS and SPSS.
- range_data() computes the range of the values.
- sd_amo(), sd_pop() compute the sample and population standard deviation, respectively.
- sem() computes the standard error of the mean.
- skew() computes the skewness as used in SAS and SPSS.
- sum_dev() computes the sum of the absolute deviations.
- sum_sq_dev() computes the sum of the squared deviations.
- var_amo(), var_pop() compute the sample and population variance, respectively.
- valid_n() returns the number of valid (non-NA) values.
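
The definitions in the list above can be sketched in base R as follows. This is only an illustration on a hypothetical vector `x`, not the package's own code. It assumes that the average absolute deviation is taken around the mean and that "as used in SAS and SPSS" refers to the usual bias-adjusted sample skewness and excess kurtosis formulas.

```r
x <- c(1, 2, 4, 8)  # hypothetical sample
n <- length(x)

# Harmonic mean: reciprocal of the arithmetic mean of the reciprocals
h_mean <- 1 / mean(1 / x)

# Geometric mean: nth root of the product, computed on the log scale
g_mean <- exp(mean(log(x)))

# Sample (n - 1 denominator) and population (n denominator) variance
v_sample <- sum((x - mean(x))^2) / (n - 1)  # same as var(x)
v_pop    <- sum((x - mean(x))^2) / n

# Standard error of the mean
se_mean <- sqrt(v_sample) / sqrt(n)

# Average absolute deviation around the mean (assumed reference point)
avg_dev <- mean(abs(x - mean(x)))

# Bias-adjusted skewness and excess kurtosis (assumed SAS/SPSS-style formulas)
z <- (x - mean(x)) / sqrt(v_sample)
skew_g1 <- n / ((n - 1) * (n - 2)) * sum(z^3)
kurt_g2 <- n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(z^4) -
  3 * (n - 1)^2 / ((n - 2) * (n - 3))
```

Computing the geometric mean on the log scale avoids overflow when the product of many values is large, but it requires strictly positive data.
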

desc_stat() is a wrapper around the functions above and can be used to compute all of these statistics at once.

Usage

av_dev(.data, ..., na.rm = FALSE)
ci_mean(.data, ..., na.rm = FALSE, level = 0.95)
cv(.data, ..., na.rm = FALSE)
freq_table(.data, ...)
hmean(.data, ..., na.rm = FALSE)
hm_mean(.data, ..., na.rm = FALSE)
gmean(.data, ..., na.rm = FALSE)
gm_mean(.data, ..., na.rm = FALSE)
kurt(.data, ..., na.rm = FALSE)
range_data(.data, ..., na.rm = FALSE)
sd_amo(.data, ..., na.rm = FALSE)
sd_pop(.data, ..., na.rm = FALSE)
sem(.data, ..., na.rm = FALSE)
skew(.data, ..., na.rm = FALSE)
sum_dev(.data, ..., na.rm = FALSE)
sum_sq_dev(.data, ..., na.rm = FALSE)
var_pop(.data, ..., na.rm = FALSE)
var_amo(.data, ..., na.rm = FALSE)
valid_n(.data, ..., na.rm = FALSE)
cv_by(.data, ..., na.rm = FALSE)
max_by(.data, ..., na.rm = FALSE)
means_by(.data, ..., na.rm = FALSE)
min_by(.data, ..., na.rm = FALSE)
n_by(.data, ..., na.rm = FALSE)
sd_by(.data, ..., na.rm = FALSE)
sem_by(.data, ..., na.rm = FALSE)

Arguments

.data

A data frame or a numeric vector.

...

The argument depends on the function used.

For *_by() functions, ... is one or more categorical variables used to group the data. The requested statistic is then computed for all numeric variables in the data. If no variables are supplied in ..., the statistic is computed ignoring all non-numeric variables in .data.
For the other statistics, ... is a comma-separated list of unquoted variable names for which to compute the statistics. If no variables are supplied in ..., the statistic is computed for all numeric variables in .data.

na.rm

A logical value indicating whether NA values should be stripped before the computation proceeds. Defaults to FALSE.
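
The effect of na.rm can be seen with base R's mean(), which follows the same convention. The vector `x_na` below is a hypothetical example, and counting the non-missing values with sum(!is.na(...)) is only a sketch of the idea behind a "valid n", not the package's code.

```r
x_na <- c(1, NA, 3)  # hypothetical vector with one missing value

# With the default na.rm = FALSE, a single NA propagates to the result
mean_default <- mean(x_na)

# With na.rm = TRUE, NA values are dropped before computing
mean_clean <- mean(x_na, na.rm = TRUE)

# Number of valid (non-NA) values
n_valid <- sum(!is.na(x_na))
```
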

level

The confidence level for the confidence interval of the mean. Defaults to 0.95.
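
To show how level enters the computation, here is a minimal base R sketch of a t-based confidence interval for the mean. It assumes the interval is built from the Student t distribution with n - 1 degrees of freedom. Whether ci_mean() returns the half-width or the two bounds is not stated on this page, so both are computed. The vector `x_ci` is hypothetical.

```r
set.seed(1)
x_ci <- rnorm(30, mean = 10, sd = 1)  # hypothetical sample
level <- 0.95

n_ci <- length(x_ci)
se_ci <- sd(x_ci) / sqrt(n_ci)                        # standard error of the mean
t_crit <- qt(1 - (1 - level) / 2, df = n_ci - 1)      # two-sided critical value

half_width <- t_crit * se_ci
ci <- c(lower = mean(x_ci) - half_width,
        upper = mean(x_ci) + half_width)
print(ci)
```

Raising level widens the interval because the critical value t_crit grows.
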

Value

The *_by() functions return a tbl_df with the computed statistics for each level of the factor(s) declared in ....
All other functions return a named integer if the input is a data frame or a numeric value if the input is a numeric vector.

Examples

# NOT RUN {
library(metan)
# Means of all numeric variables by GEN and ENV
means_by(data_ge2, GEN, ENV)

# Coefficient of variation for all numeric variables
# by GEN and ENV
cv_by(data_ge2, GEN, ENV)

# Skewness of a numeric vector
set.seed(1)
nvec <- rnorm(200, 10, 1)
skew(nvec)

# 95% confidence interval for the mean
# All numeric variables
# Grouped by levels of ENV
data_ge2 %>%
  group_by(ENV) %>%
  ci_mean()

# Standard error of the mean
# Variables PH and EH
sem(data_ge2, PH, EH)

# Frequency table for variable NR
data_ge2 %>%
  freq_table(NR)
# }