desc_stat: Descriptive statistics

Description

Computes the most commonly used measures of central tendency, position, and dispersion.

Usage

desc_stat(
  .data = NULL,
  ...,
  by = NULL,
  values = NULL,
  stats = "main",
  hist = FALSE,
  level = 0.95,
  digits = 4,
  na.rm = FALSE,
  verbose = TRUE,
  plot_theme = theme_metan()
)

Arguments

.data

The data to be analyzed. Must be a data frame or an object of class split_factors.

...

A single variable name or a comma-separated list of unquoted variable names. If no variable is given, all numeric variables in .data are used.

by

One variable (factor) used to split the data into subsets. The function is applied to each subset and returns a list in which each element holds the results for one level of the by variable. To split the data by more than one factor, use split_factors and pass the subsetted data to .data.
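Conceptually, by follows a split-apply pattern. The base R sketch below illustrates that pattern only; the data frame toy and the helper summarise_one() are hypothetical, the summary is cut down to three statistics, and this is not metan's implementation.

```r
# Hypothetical toy data: a grouping factor and one numeric trait
toy <- data.frame(
  ENV = factor(c("E1", "E1", "E1", "E2", "E2", "E2")),
  PH  = c(2.1, 2.4, 2.3, 1.8, 1.9, 2.0)
)

# Hypothetical per-subset summary (a stand-in for the full set of statistics)
summarise_one <- function(x) {
  c(n = length(x), mean = mean(x), sd = sd(x))
}

# Split PH by the levels of ENV, then apply the summary to each subset.
# The result is a list with one element per level of ENV.
subsets <- split(toy$PH, toy$ENV)
res <- lapply(subsets, summarise_one)
res
```

Each list element is a named numeric vector, so res$E2[["mean"]] gives the mean of PH in level E2.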

values

An alternative way to pass the data to the function. It must be a numeric vector.

stats

The descriptive statistics to show. Defaults to "main" (the main statistics). Set to "all" to compute all the statistics below, or choose one or more of the following:

- 'AV.dev': average deviation
- 'CI.mean': confidence interval for the mean
- 'CV': coefficient of variation
- 'IQR': interquartile range
- 'gm.mean': geometric mean
- 'hm.mean': harmonic mean
- 'Kurt': kurtosis
- 'mad': median absolute deviation
- 'max': maximum value
- 'mean': arithmetic mean
- 'median': median
- 'min': minimum value
- 'n': the length of the data
- 'Q2.5': the 2.5th percentile
- 'Q25': the first quartile (Q1)
- 'Q75': the third quartile (Q3)
- 'Q97.5': the 97.5th percentile
- 'range': the range of the data
- 'SD.amo': the sample standard deviation
- 'SD.pop': the population standard deviation
- 'SE.mean': the standard error of the mean
- 'skew': skewness
- 'sum': the sum of the values
- 'sum.dev': the sum of the absolute deviations
- 'sum.sq.dev': the sum of the squared deviations
- 'valid.n': the number of valid (non-NA) values
- 'var.amo': the sample variance
- 'var.pop': the population variance

To select statistics, pass their names as a comma-separated string, for example stats = c("median, mean, CV, n").
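Several of the statistics listed above have standard textbook definitions. The base R sketch below computes a subset of them on a small hypothetical vector to make those definitions concrete. It is not metan's source: the quantile algorithm (R's default, type 7), expressing CV as a percentage, and reading 'AV.dev' as the mean absolute deviation from the mean are assumptions, and the variable names are illustrative only.

```r
# Toy sample (hypothetical values)
x <- c(12, 13, 19, 21, 8, 23)
n <- length(x)

# Central tendency
arith_mean <- mean(x)
geo_mean   <- exp(mean(log(x)))   # geometric mean; requires x > 0
harm_mean  <- 1 / mean(1 / x)     # harmonic mean; requires x != 0

# Dispersion
dev        <- x - arith_mean
sum_dev    <- sum(abs(dev))            # sum of absolute deviations
av_dev     <- mean(abs(dev))           # average absolute deviation (assumed definition)
sum_sq_dev <- sum(dev^2)               # sum of squared deviations
var_amo    <- sum_sq_dev / (n - 1)     # sample variance, same as var(x)
var_pop    <- sum_sq_dev / n           # population variance
sd_amo     <- sqrt(var_amo)            # sample SD, same as sd(x)
sd_pop     <- sqrt(var_pop)            # population SD
se_mean    <- sd_amo / sqrt(n)         # standard error of the mean
cv         <- 100 * sd_amo / arith_mean  # CV as a percentage (assumed convention)

# Position (R's default quantile algorithm, type 7)
q   <- quantile(x, probs = c(0.025, 0.25, 0.75, 0.975))
iqr <- unname(q["75%"] - q["25%"])     # interquartile range
rng <- max(x) - min(x)                 # range
```

The only difference between the "amo" (sample) and "pop" (population) variants is the divisor, n - 1 versus n.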

hist

Logical argument; defaults to FALSE. If hist = TRUE, a histogram is created for each selected variable.

level

The confidence level used to compute the confidence interval for the mean. Defaults to 0.95.
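A common way to build an interval at a given level is the two-sided Student t interval, mean ± t(1 - α/2, n - 1) × SE, with α = 1 - level. The sketch below assumes that construction and checks it against stats::t.test(); desc_stat may instead report only the half-width or use another method, so treat this as an illustration of the level argument rather than the package's exact output.

```r
x <- c(12, 13, 19, 21, 8, 23)  # hypothetical sample
level <- 0.95

n  <- length(x)
se <- sd(x) / sqrt(n)  # standard error of the mean

# Half-width of the two-sided t interval at the requested confidence level
half_width <- qt((1 + level) / 2, df = n - 1) * se
ci <- mean(x) + c(-1, 1) * half_width
ci
```

Raising level (for example to 0.99) increases the t quantile and therefore widens the interval.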

digits

The number of significant digits.
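Significant digits are not the same as decimal places. The base R comparison below illustrates the difference; whether desc_stat uses signif() or round() internally is not stated on this page, so this only clarifies the term.

```r
x <- 1234.5678

sig <- signif(x, digits = 4)  # 4 significant digits
dec <- round(x, digits = 4)   # 4 decimal places
```

Here sig is 1235, while dec keeps four digits after the decimal point.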

na.rm

Logical. Should missing values be removed?
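The base R sketch below uses a small vector with two missing values to show the difference between 'n' and 'valid.n' as described under stats, and what removing missing values changes for a statistic such as the mean. It shows base R behavior; that desc_stat handles NA values in the same way when na.rm = TRUE is an assumption.

```r
x <- c(12, 13, 19, 21, 8, NA, 23, NA)

n_total <- length(x)       # counts NA entries too (cf. 'n')
n_valid <- sum(!is.na(x))  # non-missing entries only (cf. 'valid.n')

mean_all   <- mean(x)                # NA: missing values propagate by default
mean_valid <- mean(x, na.rm = TRUE)  # mean of the 6 valid values
```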

verbose

Logical argument. If verbose = FALSE, the code runs silently.

plot_theme

The graphical theme of the plot. Default is plot_theme = theme_metan(). For more details, see theme.

Value

A tibble with the statistics in the rows and the variables in the columns. If .data is an object of class split_factors, the statistics are shown for each level of the grouping variable(s) used in split_factors.

Details

When the statistics are computed for more than two variables with data coming from split_factors, the results are returned in a long format. Use the function desc_wider to convert them into a wide format (levels of the factors in the rows and statistics in the columns).
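To picture the difference between the long and wide layouts, the sketch below widens a hypothetical long table (one row per level of ENV and statistic) with base R's reshape(). The layout of the long table is an assumption for illustration, not the actual structure returned by desc_stat, and the column names such as value.mean come from reshape() itself; in practice, use desc_wider.

```r
# Hypothetical long-format result: one row per level x statistic
long <- data.frame(
  ENV   = rep(c("A1", "A2"), each = 2),
  stat  = rep(c("mean", "CV"), times = 2),
  value = c(2.3, 5.1, 2.6, 4.8)
)

# Widen: levels of ENV in the rows, statistics in the columns
wide <- reshape(long, idvar = "ENV", timevar = "stat", direction = "wide")
wide
```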

Examples

library(metan)

desc_stat(data_ge2, TKW)

# Compute the main statistics
# Use a numeric vector as input data
vect <- data_ge2$TKW
desc_stat(values = vect)

# Select specific statistics
desc_stat(values = c(12, 13, 19, 21, 8, NA, 23, NA),
          na.rm = TRUE,
          stats = c('mean, se.mean, cv, n, valid.n'))

# Compute the main statistics for each level of "ENV"
stats <-
  desc_stat(data_ge2,
            EP, EL, EH, ED, PH, CD,
            by = ENV,
            verbose = FALSE)

# To get a 'wide' format with the statistics of the variable PH above.
desc_wider(stats, PH)

# Compute all the statistics for each combination of "ENV" and "GEN"
# All the numeric variables in .data

stats_all <-
  data_ge2 %>%
  split_factors(ENV, GEN) %>%
  desc_stat(stats = "all", verbose = FALSE)
desc_wider(stats_all, PH)