statsExpressions v0.3.1

Expressions with Statistical Details

This package is the statistical processing backend for 'ggstatsplot'. It creates expressions with details from statistical tests. Currently, it supports only the most common types of statistical tests: parametric, nonparametric, robust, and Bayesian versions of t-test/ANOVA, correlation analyses, contingency table analysis, and meta-analysis.

statsExpressions: Expressions with statistical details

Introduction

statsExpressions provides the statistical processing backend for the ggstatsplot package, which combines ggplot2 visualizations with expressions containing results from statistical tests. statsExpressions contains all the functions needed to create these expressions.
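To make the idea of an "expression with statistical details" concrete, here is a minimal base R sketch. It is not the package's own implementation: it runs Welch's t-test on the built-in ToothGrowth data and assembles a plotmath call with bquote(). The object name stat_expr and the layout of the text are assumptions chosen for illustration.

```r
# Welch's two-sample t-test on a built-in dataset (base R only)
res <- t.test(len ~ supp, data = ToothGrowth)

# pull out and round the values to be reported
t_val <- round(unname(res$statistic), 2)
df_val <- round(unname(res$parameter), 2)
p_val <- signif(res$p.value, 2)

# assemble a plotmath call; .() splices in the computed values
stat_expr <- bquote(
  paste(
    italic("t"), "(", .(df_val), ") = ", .(t_val), ", ",
    italic("p"), " = ", .(p_val)
  )
)

# the result is an unevaluated call that plotting functions can render
plot(
  x = as.integer(ToothGrowth$supp), y = ToothGrowth$len,
  xaxt = "n", xlab = "supp", ylab = "len", main = stat_expr
)
axis(side = 1, at = 1:2, labels = levels(ToothGrowth$supp))
```

Because the result is an ordinary R language object, it can be stored, inspected, or passed to any plotting function that accepts plotmath.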

Installation

To get the latest, stable CRAN release:

install.packages(pkgs = "statsExpressions")

You can get the development version of the package from GitHub. To see what new changes (and bug fixes) have been made to the package since the last release on CRAN, you can check the detailed log of changes here: https://indrajeetpatil.github.io/statsExpressions/news/index.html

If you are in a hurry and want to reduce installation time, use:

# needed package to download from GitHub repo
utils::install.packages(pkgs = "remotes")

# downloading the package from GitHub
remotes::install_github(
  repo = "IndrajeetPatil/statsExpressions", # package path on GitHub
  dependencies = FALSE, # assumes you have already installed needed packages
  quick = TRUE # skips docs, demos, and vignettes
)

If time is not a constraint:

remotes::install_github(
  repo = "IndrajeetPatil/statsExpressions", # package path on GitHub
  dependencies = TRUE, # installs packages which statsExpressions depends on
  upgrade = "always" # updates any out-of-date dependencies
)

Citation

If you want to cite this package in a scientific journal or in any other context, run the following code in your R console:

citation("statsExpressions")

Documentation and Examples

To see the documentation relevant for the development version of the package, see the dedicated website for statsExpressions, which is updated after every new commit: https://indrajeetpatil.github.io/statsExpressions/.

Summary of types of statistical analyses

Currently, the package supports only the most common types of statistical tests. Specifically, parametric, non-parametric, robust, and Bayesian versions of:

  • t-test
  • anova
  • correlation tests
  • contingency table analysis
  • meta-analysis

The table below summarizes all the different types of analyses currently supported in this package:

| Description | Parametric | Non-parametric | Robust | Bayes Factor |
| --- | --- | --- | --- | --- |
| Between group/condition comparisons | Yes | Yes | Yes | Yes |
| Within group/condition comparisons | Yes | Yes | Yes | Yes |
| Distribution of a numeric variable | Yes | Yes | Yes | Yes |
| Correlation between two variables | Yes | Yes | Yes | Yes |
| Association between categorical variables | Yes | NA | NA | Yes |
| Equal proportions for categorical variable levels | Yes | NA | NA | Yes |
| Random-effects meta-analysis | Yes | No | Yes | Yes |

Statistical reporting

For all statistical test expressions, the default template abides by the APA gold standard for statistical reporting. For example, here are results from Yuen’s test for trimmed means (robust t-test):

Summary of statistical tests and effect sizes

Here is a summary table of all the statistical tests currently supported across various functions:

| Functions | Type | Test | Effect size | 95% CI available? |
| --- | --- | --- | --- | --- |
| expr_anova_parametric (2 groups) | Parametric | Student’s and Welch’s t-test | Cohen’s d, Hedges’ g | Yes |
| expr_anova_parametric (> 2 groups) | Parametric | Fisher’s and Welch’s one-way ANOVA | $\eta^2$, $\eta^2_p$, $\omega^2$, $\omega^2_p$ | Yes |
| expr_anova_nonparametric (2 groups) | Non-parametric | Mann-Whitney U-test | r | Yes |
| expr_anova_nonparametric (> 2 groups) | Non-parametric | Kruskal-Wallis Rank Sum Test | $\epsilon^2$ | Yes |
| expr_anova_robust (2 groups) | Robust | Yuen’s test for trimmed means | $\xi$ | Yes |
| expr_anova_robust (> 2 groups) | Robust | Heteroscedastic one-way ANOVA for trimmed means | $\xi$ | Yes |
| expr_anova_parametric (2 groups) | Parametric | Student’s t-test | Cohen’s d, Hedges’ g | Yes |
| expr_anova_parametric (> 2 groups) | Parametric | Fisher’s one-way repeated measures ANOVA | $\eta^2_p$, $\omega^2$ | Yes |
| expr_anova_nonparametric (2 groups) | Non-parametric | Wilcoxon signed-rank test | r | Yes |
| expr_anova_nonparametric (> 2 groups) | Non-parametric | Friedman rank sum test | $W_{Kendall}$ | Yes |
| expr_anova_robust (2 groups) | Robust | Yuen’s test on trimmed means for dependent samples | $\xi$ | Yes |
| expr_anova_robust (> 2 groups) | Robust | Heteroscedastic one-way repeated measures ANOVA for trimmed means | Not available | No |
| expr_contingency_tab (unpaired) | Parametric | Pearson’s $\chi^2$ test | Cramér’s V | Yes |
| expr_contingency_tab (paired) | Parametric | McNemar’s test | Cohen’s g | Yes |
| expr_contingency_tab | Parametric | One-sample proportion test | Cramér’s V | Yes |
| expr_corr_test | Parametric | Pearson’s r | r | Yes |
| expr_corr_test | Non-parametric | Spearman’s $\rho$ | $\rho$ | Yes |
| expr_corr_test | Robust | Percentage bend correlation | r | Yes |
| expr_t_onesample | Parametric | One-sample t-test | Cohen’s d, Hedges’ g | Yes |
| expr_t_onesample | Non-parametric | One-sample Wilcoxon signed rank test | r | Yes |
| expr_t_onesample | Robust | One-sample percentile bootstrap | robust estimator | Yes |
| expr_meta_parametric | Parametric | Meta-analysis via random-effects models | $\beta$ | Yes |
| expr_meta_robust | Robust | Meta-analysis via robust random-effects models | $\beta$ | Yes |
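As an illustration of two of the effect sizes listed in the table, the following base R sketch computes Cohen's d and Hedges' g for a between-groups comparison on the built-in ToothGrowth data. It uses the textbook pooled-standard-deviation formula and the common approximate small-sample correction. Whether the package uses exactly these variants is an assumption, not something the README states.

```r
# split tooth length by supplement type (built-in ToothGrowth data)
x <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
y <- ToothGrowth$len[ToothGrowth$supp == "VC"]
n1 <- length(x)
n2 <- length(y)

# pooled standard deviation across the two groups
s_pooled <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))

# Cohen's d: standardized mean difference
d <- (mean(x) - mean(y)) / s_pooled

# Hedges' g: d multiplied by an approximate small-sample correction factor
g <- d * (1 - 3 / (4 * (n1 + n2) - 9))

c(cohens_d = d, hedges_g = g)
```

The correction factor is always below 1, so g is slightly smaller in magnitude than d, and the two converge as the sample size grows.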

Primary functions

A list of primary functions in this package can be found at the package website: https://indrajeetpatil.github.io/statsExpressions/reference/index.html

The following are a few examples of how these functions can be used.

Example: Expressions for one-sample t-test

# setup
set.seed(123)
library(ggplot2)
library(statsExpressions)

# create fake data
df <- data.frame(x = rnorm(1000, 0.1, 0.5))

# creating a histogram plot
p <- ggplot(df, aes(x)) +
  geom_histogram(alpha = 0.5) +
  geom_vline(xintercept = mean(df$x), color = "red")

# adding a caption with a non-parametric one-sample test
p + labs(
  title = "One-Sample Wilcoxon Signed Rank Test",
  caption = expr_t_onesample(df, x, type = "nonparametric")
)
#> Note: 95% CI for effect size estimate was computed with 100 bootstrap samples.

Example: Expressions for two-sample t-test

# setup
set.seed(123)
library(ggplot2)
library(hrbrthemes)
library(statsExpressions)

# create a plot
p <-
  ggplot(ToothGrowth, aes(supp, len)) +
  geom_boxplot() +
  theme_ipsum_rc()

# adding a subtitle with results from a two-sample t-test
p + labs(
  title = "Two-Sample Welch's t-test",
  subtitle = expr_t_parametric(ToothGrowth, supp, len)
)

Example: Expressions for one-way ANOVA

Let’s say we want to check differences in sepal length across the three iris species and wish to carry out Welch’s ANOVA:

# setup
set.seed(123)
library(ggplot2)
library(statsExpressions)

# create a boxplot
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  labs(
    title = "Welch's ANOVA",
    subtitle = expr_anova_parametric(iris, Species, Sepal.Length, messages = FALSE)
  )

Suppose you change your mind and now want to carry out a robust ANOVA instead. Let’s also use a different kind of visualization:

# setup
set.seed(123)
library(ggplot2)
library(statsExpressions)
library(ggridges)

# create a ridgeplot
p <-
  ggplot(iris, aes(x = Sepal.Length, y = Species)) +
  geom_density_ridges(
    jittered_points = TRUE, quantile_lines = TRUE,
    scale = 0.9, vline_size = 1, vline_color = "red",
    position = position_raincloud(adjust_vlines = TRUE)
  )

# create an expression containing details from the relevant test
results <- expr_anova_robust(iris, Species, Sepal.Length, messages = FALSE)

# display results on the plot
p + labs(
  title = "A heteroscedastic one-way ANOVA for trimmed means",
  subtitle = results
)

Example: Expressions for correlation analysis

Let’s look at another example where we want to run correlation analysis:

# setup
set.seed(123)
library(ggplot2)
library(statsExpressions)

# create a scatterplot with a fitted regression line
p <-
  ggplot(mtcars, aes(x = mpg, y = wt)) +
  geom_point() +
  geom_smooth(method = "lm")

# create an expression containing details from the relevant test
results <- expr_corr_test(mtcars, mpg, wt, type = "nonparametric")

# display results on the plot
p + labs(
  title = "Spearman's rank correlation coefficient",
  subtitle = results
)
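For readers who want to see what the reported coefficient measures, here is a small base R sketch that is independent of statsExpressions. It computes Spearman's rho for the same two variables with cor.test() and shows that it equals Pearson's r computed on the ranks. The object names res and rho_by_ranks are illustrative.

```r
# Spearman's rho via the built-in test
# (exact = FALSE requests the asymptotic p-value, since the data contain ties)
res <- cor.test(mtcars$mpg, mtcars$wt, method = "spearman", exact = FALSE)

# rho equals Pearson's r computed on the ranks of both variables
rho_by_ranks <- cor(rank(mtcars$mpg), rank(mtcars$wt))

c(rho = unname(res$estimate), rho_by_ranks = rho_by_ranks, p = res$p.value)
```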

Example: Expressions for contingency table analysis

# setup
set.seed(123)
library(ggplot2)
library(statsExpressions)

# data
df <- as.data.frame(table(mpg$class))
colnames(df) <- c("class", "freq")

# basic pie chart
p <-
  ggplot(df, aes(x = "", y = freq, fill = factor(class))) +
  geom_bar(width = 1, stat = "identity") +
  theme(
    axis.line = element_blank(),
    plot.title = element_text(hjust = 0.5)
  )

# cleaning up the chart and adding results from one-sample proportion test
p +
  coord_polar(theta = "y", start = 0) +
  labs(
    fill = "class",
    x = NULL,
    y = NULL,
    title = "Pie Chart of class",
    subtitle = expr_contingency_tab(df, class, counts = freq),
    caption = "One-sample goodness of fit proportion test"
  )
#> Note: 95% CI for effect size estimate was computed with 100 bootstrap samples.

You can also use these functions to get the expression itself, without displaying it in a plot:

# setup
set.seed(123)
library(ggplot2)
library(statsExpressions)

# Pearson's chi-squared test of independence
expr_contingency_tab(mtcars, am, cyl, messages = FALSE)
#> paste(NULL, chi["Pearson"]^2, "(", "2", ") = ", "8.74", ", ", 
#>     italic("p"), " = ", "0.013", ", ", widehat(italic("V"))["Cramer"], 
#>     " = ", "0.46", ", CI"["95%"], " [", "0.08", ", ", "0.75", 
#>     "]", ", ", italic("n")["obs"], " = ", 32L)
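The test statistic, degrees of freedom, and p-value in the expression above can be cross-checked with base R alone. The sketch below also computes a bias-corrected Cramér's V, which agrees with the printed effect size after rounding. That the package uses this particular variant is an inference from the matching value, not something stated in the README.

```r
# cross-tabulate transmission type (am) against number of cylinders (cyl)
tab <- table(mtcars$am, mtcars$cyl)

# Pearson's chi-squared test; small expected counts would trigger a warning
res <- suppressWarnings(chisq.test(tab))
chi2 <- unname(res$statistic)

n <- sum(tab)
r <- nrow(tab)
k <- ncol(tab)

# bias-corrected Cramer's V (assumed variant, see the note above)
phi2_corr <- max(0, chi2 / n - (k - 1) * (r - 1) / (n - 1))
k_corr <- k - (k - 1)^2 / (n - 1)
r_corr <- r - (r - 1)^2 / (n - 1)
v_corr <- sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

round(c(chi2 = chi2, df = unname(res$parameter), p = res$p.value, V = v_corr), 3)
```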

Example: Expressions for meta-analysis

# setup
set.seed(123)
library(metaviz)
library(ggplot2)
library(statsExpressions)

# rename columns to `statsExpressions` conventions
df <- dplyr::rename(mozart, estimate = d, std.error = se)

# forest plot annotated with results from a random-effects meta-analysis
viz_forest(
  x = mozart[, c("d", "se")],
  study_labels = mozart[, "study_name"],
  xlab = "Cohen's d",
  variant = "thick",
  type = "cumulative"
) + # use `statsExpressions` to create expression containing results
  labs(
    title = "Meta-analysis of Pietschnig, Voracek, and Formann (2010) on the Mozart effect",
    subtitle = expr_meta_parametric(df, k = 3)
  ) +
  theme(text = element_text(size = 12))

Usage in ggstatsplot

Note that these functions were initially written to display results from statistical tests on ready-made ggplot2 plots implemented in ggstatsplot.

For detailed documentation, see the package website: https://indrajeetpatil.github.io/ggstatsplot/

In ggstatsplot, these expressions are displayed in the subtitle of the plot.

Code coverage

As the code stands right now, here is the code coverage for all primary functions involved: https://codecov.io/gh/IndrajeetPatil/statsExpressions/tree/master/R

Contributing

I’m happy to receive bug reports, suggestions, questions, and (most of all) contributions to fix problems and add features. I personally prefer using the GitHub issues system over trying to reach out to me in other ways (personal e-mail, Twitter, etc.). Pull Requests for contributions are encouraged.

Here are some simple ways in which you can contribute (in increasing order of commitment):

  • Read and correct any inconsistencies in the documentation

  • Raise issues about bugs or wanted features

  • Review code

  • Add new functionality (in the form of new plotting functions or helpers for preparing subtitles)

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Functions in statsExpressions

| Name | Description |
| --- | --- |
| bugs_long | Tidy version of the "Bugs" dataset. |
| bf_corr_test | Bayesian correlation test. |
| bf_contingency_tab | Bayesian contingency table analysis |
| corr_objects | Create all needed objects for correlation matrix. |
| bf_expr | Prepare caption with expression for Bayes Factor results |
| effsize_ci_message | Message about bootstrapped confidence intervals for effect sizes. |
| expr_anova_nonparametric | Making text subtitle for nonparametric ANOVA. |
| bf_meta | Bayes factor message for random-effects meta-analysis |
| bf_oneway_anova | Bayesian one-way analysis of variance. |
| Titanic_full | Titanic dataset. |
| expr_anova_bayes | Making expression containing Bayesian one-way ANOVA results. |
| expr_t_bayes | Making expression containing Bayesian t-test results |
| effsize_t_parametric | Calculating Cohen's d or Hedges' g (for between-/within- or one sample designs). |
| expr_anova_parametric | Making expression containing parametric ANOVA results |
| expr_t_nonparametric | Making expression for Mann-Whitney U-test/Wilcoxon test results |
| VR_dilemma | Virtual reality moral dilemmas. |
| expr_anova_robust | Expression containing results from heteroscedastic one-way ANOVA for trimmed means |
| expr_contingency_tab | Making expression for contingency table and goodness of fit tests |
| expr_t_onesample | Expression for one sample t-test and its non-parametric and robust equivalents |
| expr_t_parametric | Making expression containing t-test results |
| expr_meta_robust | Making expression with frequentist random-effects robust meta-analysis results |
| expr_meta_parametric | Making expression with frequentist random-effects meta-analysis results |
| bf_extractor | Extract Bayes Factors from BayesFactor model object. |
| reexports | Objects exported from other packages |
| iris_long | Edgar Anderson's Iris Data in long format. |
| intent_morality | Moral judgments about third-party moral behavior. |
| robcor_ci | Robust correlation coefficient and its confidence interval |
| t1way_ci | A heteroscedastic one-way ANOVA for trimmed means with confidence interval for effect size. |
| yuend_ci | Paired samples robust t-tests with confidence interval for effect size. |
| expr_template | Template for subtitles with statistical details for tests |
| expr_meta_bayes | Making expression containing Bayesian random-effects meta-analysis. |
| expr_corr_test | Making expression for correlation analysis |
| expr_t_robust | Expression containing results from a robust t-test |
| movies_long | Movie information and user ratings from IMDB.com (long format). |
| movies_wide | Movie information and user ratings from IMDB.com (wide format). |
| bf_ttest | Bayes Factor for t-test |

Vignettes of statsExpressions

Name
stats_details.Rmd
tests_and_coverage.Rmd

Details

Type Package
License GPL-3 | file LICENSE
URL https://indrajeetpatil.github.io/statsExpressions, https://github.com/IndrajeetPatil/statsExpressions
BugReports https://github.com/IndrajeetPatil/statsExpressions/issues
VignetteBuilder knitr
Encoding UTF-8
Language en-US
LazyData true
RoxygenNote 7.0.2.9000
NeedsCompilation no
Packaged 2020-02-14 09:56:32 UTC; inp099
Repository CRAN
Date/Publication 2020-02-14 11:20:02 UTC
