visStatistics

Visualization of a statistical hypothesis test selected to be valid under the data's type, distribution, sample size, and confidence level

visStatistics is an R package for rapid visualization and statistical analysis of raw data. It automatically selects and applies a hypothesis test that is valid for evaluating the relationship between a response (varsample) and a feature (varfactor) within a data.frame.

A minimal call of its main function visstat() looks like:

visstat(dataframe, varsample = "response", varfactor = "feature")

The input must be a column-based data.frame, and varsample and varfactor are character strings naming columns of that data frame.

The function selects a statistical test based on the class of the response and feature variables, the number of levels of the categorical variables, the outcome of assumption checks for normality and homoscedasticity, and the chosen conf.level.
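The first step of that selection, choosing a family of tests from the column classes, can be sketched in base R as below. The helper test_family() is hypothetical and not part of visStatistics; it only mirrors the three cases listed under "Implemented tests" and does not reproduce visstat()'s assumption checks. Note that a 0/1 column such as mtcars$am is numeric until it is converted with factor(), which is why the mtcars example converts it first.

```r
# Hypothetical helper (not part of visStatistics): maps the classes of the
# response and feature columns to the family of tests described in this README.
test_family <- function(df, varsample, varfactor) {
  y <- df[[varsample]]
  x <- df[[varfactor]]
  # is.numeric() is TRUE for both "numeric" and "integer" columns
  if (is.numeric(y) && is.factor(x)) {
    if (nlevels(x) == 2) "two-sample test of central tendency" else "k-sample test of central tendency"
  } else if (is.numeric(y) && is.numeric(x)) {
    "simple linear regression"
  } else if (is.factor(y) && is.factor(x)) {
    "test of independence"
  } else {
    stop("unsupported combination of column classes")
  }
}

test_family(InsectSprays, "count", "spray")  # spray has six levels
test_family(cars, "dist", "speed")           # both columns numerical
```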

The automatically generated output figures illustrate the selected statistical test, display the main test statistics, and include assumption checks and post hoc comparisons when applicable. The primary test results are returned as a list object.

This automated workflow is particularly suited for integration into browser-based interfaces or server-side R applications that interact with databases.

For a detailed description of the decision logic, see

vignette("visStatistics")

Installation of the latest stable version from CRAN

Install the package

install.packages("visStatistics")

Load the package

library(visStatistics)

Installation of the development version from GitHub

Install devtools from CRAN if not already installed

install.packages("devtools")

Load devtools

library(devtools)

Install the visStatistics package from GitHub

install_github("shhschilling/visStatistics")

Load the package

library(visStatistics)

View help

?visstat

Examples

library(visStatistics)

Welch's t-test

InsectSprays data set

insect_sprays_a_b <- 
  InsectSprays[which(InsectSprays$spray == "A" | InsectSprays$spray == "B"), ]
insect_sprays_a_b$spray <- factor(insect_sprays_a_b$spray)
visstat(insect_sprays_a_b, "count", "spray")

mtcars data set

mtcars$am <- as.factor(mtcars$am)
t_test_statistics <- visstat(mtcars, "mpg", "am")
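To cross-check the numbers shown in the plot, the same Welch test can be run directly with base R's t.test(), whose default var.equal = FALSE gives the unequal-variances (Welch) version. This sketch assumes, following the section heading, that visstat() selects Welch's test for this data; it works on a copy so the global mtcars stays unchanged.

```r
# Work on a copy so the global mtcars data set is left untouched
mtcars_am <- mtcars
mtcars_am$am <- factor(mtcars_am$am)

# t.test() defaults to var.equal = FALSE, i.e. Welch's unequal-variances test
welch <- t.test(mpg ~ am, data = mtcars_am)
welch$method      # name of the test that was run
welch$parameter   # Welch-Satterthwaite degrees of freedom (non-integer)
welch$p.value
```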

Wilcoxon rank sum test

grades_gender <- data.frame(
  sex = as.factor(c(rep("girl", 21), rep("boy", 23))),
  grade = c(
    19.3, 18.1, 15.2, 18.3, 7.9, 6.2, 19.4,
    20.3, 9.3, 11.3, 18.2, 17.5, 10.2, 20.1, 13.3, 17.2, 15.1, 16.2, 17.0,
    16.5, 5.1, 15.3, 17.1, 14.8, 15.4, 14.4, 7.5, 15.5, 6.0, 17.4, 7.3, 14.3,
    13.5, 8.0, 19.5, 13.4, 17.9, 17.7, 16.4, 15.6, 17.3, 19.9, 4.4, 2.1
  )
)

wilcoxon_statistics <- visstat(grades_gender, "grade", "sex")

ANOVA

insect_sprays_tr <- InsectSprays
insect_sprays_tr$count_sqrt <- sqrt(InsectSprays$count)
visstat(insect_sprays_tr, "count_sqrt", "spray")

One-way test

one_way_npk <- visstat(npk, "yield", "block")

Kruskal-Wallis test

The generated graphs can be saved in all available formats of the Cairo package. Here, the graphical output is saved as "pdf" in the directory plotDirectory = tempdir():

visstat(iris, "Petal.Width", "Species", 
        graphicsoutput = "pdf", plotDirectory = tempdir())

Linear Regression

linreg_cars <- visstat(cars, "dist", "speed")

Increasing the confidence level conf.level from the default 0.95 to 0.99 leads to wider confidence and prediction bands:

linreg_cars <- visstat(cars, "dist", "speed", conf.level = 0.99)
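The widening of both bands can be verified independently of visstat() with base R's predict() on an lm() fit, using its interval and level arguments. The sketch below compares band widths on a grid of speeds; the names fit, grid, and band_width() are illustrative only.

```r
fit <- lm(dist ~ speed, data = cars)
grid <- data.frame(speed = seq(min(cars$speed), max(cars$speed), length.out = 50))

# Width (upper minus lower limit) of a confidence or prediction band at each grid point
band_width <- function(level, interval) {
  p <- predict(fit, newdata = grid, interval = interval, level = level)
  p[, "upr"] - p[, "lwr"]
}

conf_95 <- band_width(0.95, "confidence")
conf_99 <- band_width(0.99, "confidence")
pred_95 <- band_width(0.95, "prediction")
pred_99 <- band_width(0.99, "prediction")

all(conf_99 > conf_95)  # TRUE: the 99 % confidence band is wider everywhere
all(pred_99 > pred_95)  # TRUE: the same holds for the prediction band
```

The prediction band is always wider than the confidence band at the same level, because it also accounts for the scatter of individual observations around the fitted line.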

Pearson's Chi-squared test

Count data sets are often presented as multidimensional arrays, so-called contingency tables, whereas visstat() requires a data.frame with a column structure. Arrays can be transformed to this column-wise structure with the helper function counts_to_cases():

hair_eye_color_df <- counts_to_cases(as.data.frame(HairEyeColor))
visstat(hair_eye_color_df, "Hair", "Eye")
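Conceptually, the transformation repeats each row of the frequency data frame produced by as.data.frame() as many times as its count, so that every row represents one observation. The base R sketch below illustrates that idea; expand_counts() is a hypothetical stand-in, not the implementation of counts_to_cases(), and it assumes the count column is named "Freq" (the default name chosen by as.data.frame() for tables).

```r
# Hypothetical base R stand-in for counts_to_cases(): repeat each row of a
# frequency data frame 'Freq' times and drop the count column.
expand_counts <- function(freq_df, count_col = "Freq") {
  idx <- rep(seq_len(nrow(freq_df)), times = freq_df[[count_col]])
  cases <- freq_df[idx, setdiff(names(freq_df), count_col), drop = FALSE]
  rownames(cases) <- NULL
  cases
}

hair_eye_cases <- expand_counts(as.data.frame(HairEyeColor))
nrow(hair_eye_cases)     # one row per observed person
head(hair_eye_cases, 3)  # columns Hair, Eye, Sex
```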

Fisher's exact test

hair_eye_color_male <- HairEyeColor[, , 1]
# Slice out a 2 by 2 contingency table
black_brown_hazel_green_male <- hair_eye_color_male[1:2, 3:4]
# Transform to a data frame
black_brown_hazel_green_male <- counts_to_cases(as.data.frame(black_brown_hazel_green_male))
# Fisher test
fisher_stats <- visstat(black_brown_hazel_green_male, "Hair", "Eye")

Implemented tests

Data of class "numeric" or "integer" are referred to as numerical, while data of class "factor" are referred to as categorical.

Numerical response ~ categorical feature

When the response is numerical and the feature is categorical, tests of central tendency are selected:

t.test(), wilcox.test(), aov(), oneway.test(), kruskal.test()

Normality assumption check

shapiro.test() and ad.test()
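One common way to check normality in a one-way layout is to test the residuals of the fitted model. The sketch below applies base R's shapiro.test() to the residuals of the ANOVA from the square-root example above; whether visstat() tests residuals or individual groups is described in the vignette and is not assumed here. ad.test() comes from the nortest package and is omitted to keep the sketch base R only.

```r
# One-way ANOVA on square-root transformed counts, as in the ANOVA example
sprays <- InsectSprays
sprays$count_sqrt <- sqrt(sprays$count)
fit <- aov(count_sqrt ~ spray, data = sprays)

# Shapiro-Wilk test on the residuals; a small p-value indicates departure from normality
sw <- shapiro.test(residuals(fit))
sw$statistic
sw$p.value
```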

Homoscedasticity assumption check

bartlett.test()
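bartlett.test() tests the null hypothesis that all groups share the same variance. The sketch below runs it on the raw and on the square-root transformed InsectSprays counts, so the two p-values can be compared; the variable names are illustrative.

```r
# Bartlett's test of equal variances across the six spray groups
bt_raw  <- bartlett.test(count ~ spray, data = InsectSprays)
bt_sqrt <- bartlett.test(sqrt(count) ~ spray, data = InsectSprays)

# A small p-value is evidence against equal group variances
c(raw = bt_raw$p.value, sqrt = bt_sqrt$p.value)
```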

Post-hoc tests

TukeyHSD() (for aov() and oneway.test())
pairwise.wilcox.test() (for kruskal.test())
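Both post-hoc procedures are available in base R's stats package, and the sketch below calls them directly on InsectSprays. The Holm adjustment passed to pairwise.wilcox.test() is its default and is an assumption here; the adjustment that visstat() uses is not stated in this README.

```r
sprays <- InsectSprays

# Tukey's honest significant differences for all pairs after a one-way ANOVA
tukey <- TukeyHSD(aov(sqrt(count) ~ spray, data = sprays))
head(tukey$spray)  # columns: diff, lwr, upr, p adj

# Pairwise Wilcoxon rank sum tests with Holm-adjusted p-values;
# ties in the counts trigger warnings about exact p-values, silenced here
pw <- suppressWarnings(
  pairwise.wilcox.test(sprays$count, sprays$spray, p.adjust.method = "holm")
)
pw$p.value  # lower-triangular matrix of adjusted p-values
```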

The decision tree below summarizes the underlying decision logic for tests of central tendency.

[Figure: decision tree for tests of central tendency (man/figures/decision_tree.png)]
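A heavily simplified version of such a decision logic can be written with the base R tests listed above. The function choose_location_test() is hypothetical: it uses only a Shapiro-Wilk test on ANOVA residuals and a Bartlett test at alpha = 1 - conf.level, and it ignores the sample-size rules and the Anderson-Darling test that visstat() may also apply. Refer to the vignette for the actual rules.

```r
# Hypothetical, simplified rule (not visstat()'s implementation) for a
# numerical response y and a categorical feature g.
choose_location_test <- function(y, g, conf.level = 0.95) {
  alpha <- 1 - conf.level
  g <- droplevels(factor(g))
  # Normality of residuals from a one-way ANOVA fit (shapiro.test needs 3 to 5000 values)
  normal <- shapiro.test(residuals(aov(y ~ g)))$p.value > alpha
  if (nlevels(g) == 2) {
    # t.test() defaults to Welch's test, so equal variances are not assumed
    if (normal) "t.test" else "wilcox.test"
  } else if (!normal) {
    "kruskal.test"
  } else if (bartlett.test(y, g)$p.value > alpha) {
    "aov"          # normal residuals, equal variances
  } else {
    "oneway.test"  # normal residuals, unequal variances (Welch ANOVA)
  }
}

choose_location_test(InsectSprays$count, InsectSprays$spray)
choose_location_test(mtcars$mpg, mtcars$am)
```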

Numerical response ~ numerical feature

When both the response and feature are numerical, a simple linear regression model is fitted:

lm()
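The fitted model behind the regression example can be inspected directly with lm() and its standard accessors, for instance to read off the coefficients, their confidence intervals, and R². The sketch below uses the cars data from the example above.

```r
fit <- lm(dist ~ speed, data = cars)
coef(fit)                   # intercept about -17.58, slope about 3.93
confint(fit, level = 0.95)  # confidence intervals for both coefficients
summary(fit)$r.squared      # about 0.65
```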

Categorical response ~ categorical feature

When both variables are categorical, visstat() tests the null hypothesis of independence using one of the following:

chisq.test() (default for larger samples)
fisher.test() (used when expected cell counts are too small according to Cochran's rule)
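Cochran's rule is commonly stated as: the chi-squared approximation is adequate if no expected count is below 1 and at most 20 % of the expected counts are below 5. The hypothetical helper cochran_ok() below implements that common formulation; the exact thresholds used inside visstat() are an assumption here. It explains why the two examples above end up with different tests.

```r
# Hypothetical helper: common statement of Cochran's rule for a two-way table
cochran_ok <- function(tab) {
  # Expected counts under independence: row total * column total / grand total
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  all(expected >= 1) && mean(expected < 5) <= 0.2
}

full_tab  <- margin.table(HairEyeColor, c(1, 2))  # Hair x Eye, both sexes
small_tab <- HairEyeColor[1:2, 3:4, 1]            # Black/Brown x Hazel/Green, males

cochran_ok(full_tab)   # TRUE: smallest expected count is about 7.7
cochran_ok(small_tab)  # FALSE: one of four expected counts (about 4.4) is below 5
```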

Functions in visStatistics (0.1.3)