ggbetweenstats
Box/Violin plots for group or condition comparisons in between-subjects designs.
A combination of box and violin plots along with jittered data points for between-subjects designs with statistical details included in the plot as a subtitle.
Usage
ggbetweenstats(
data,
x,
y,
plot.type = "boxviolin",
type = "parametric",
pairwise.comparisons = TRUE,
pairwise.display = "significant",
p.adjust.method = "holm",
effsize.type = "unbiased",
bf.prior = 0.707,
bf.message = TRUE,
results.subtitle = TRUE,
xlab = NULL,
ylab = NULL,
caption = NULL,
title = NULL,
subtitle = NULL,
sample.size.label = TRUE,
k = 2L,
var.equal = FALSE,
conf.level = 0.95,
nboot = 100L,
tr = 0.1,
mean.plotting = TRUE,
mean.ci = FALSE,
mean.point.args = list(size = 5, color = "darkred"),
mean.label.args = list(size = 3),
notch = FALSE,
notchwidth = 0.5,
outlier.tagging = FALSE,
outlier.label = NULL,
outlier.coef = 1.5,
outlier.shape = 19,
outlier.color = "black",
outlier.label.args = list(size = 3),
point.args = list(position = ggplot2::position_jitterdodge(dodge.width = 0.6), alpha = 0.4, size = 3, stroke = 0),
violin.args = list(width = 0.5, alpha = 0.2),
ggsignif.args = list(textsize = 3, tip_length = 0.01),
ggtheme = ggplot2::theme_bw(),
ggstatsplot.layer = TRUE,
package = "RColorBrewer",
palette = "Dark2",
ggplot.component = NULL,
output = "plot",
...
)
Arguments
- data
A data frame (or a tibble) from which the specified variables are to be taken. Matrices and tables are not accepted.
- x
The grouping variable from the data frame data.
- y
The response (a.k.a. outcome or dependent) variable from the data frame data.
- plot.type
Character describing the type of plot. Currently supported plots are "box" (for pure box plots), "violin" (for pure violin plots), and "boxviolin" (for a combination of box and violin plots; default).
- type
Type of statistic expected ("parametric", "nonparametric", "robust", or "bayes"). Corresponding abbreviations are also accepted: "p" (parametric), "np" (nonparametric), "r" (robust), and "bf" (bayes), respectively.
- pairwise.comparisons
Logical that decides whether pairwise comparisons are to be displayed (Default: TRUE). Please note that only significant comparisons are shown by default. To change this behavior, select the appropriate option with the pairwise.display argument. The pairwise comparison data frames are prepared using the pairwiseComparisons::pairwise_comparisons function. For more details about pairwise comparisons, see the documentation for that function.
- pairwise.display
Decides which pairwise comparisons to display. Available options are "significant" (abbreviation accepted: "s"), "non-significant" (abbreviation accepted: "ns"), or "everything"/"all". The default is "significant". You can use this argument to keep your plot from becoming cluttered when many groups, and therefore many pairwise comparisons, are being displayed.
- p.adjust.method
Adjustment method for p-values for multiple comparisons. Possible methods are: "holm" (default), "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none".
- effsize.type
Type of effect size needed for parametric tests. The argument can be "eta" (partial eta-squared) or "omega" (partial omega-squared). The default, "unbiased", corresponds to "omega".
- bf.prior
A number between 0.5 and 2 (Default: 0.707), the prior width to use in calculating Bayes factors.
- bf.message
Logical that decides whether to display the Bayes factor in favor of the null hypothesis. This argument is relevant only for parametric tests (Default: TRUE).
- results.subtitle
Decides whether the results of statistical tests are to be displayed as a subtitle (Default: TRUE). If set to FALSE, only the plot will be returned.
- xlab, ylab
Labels for the x and y axis variables. If NULL (default), the variable names for x and y will be used.
- caption
The text for the plot caption.
- title
The text for the plot title.
- subtitle
The text for the plot subtitle. Works only if results.subtitle = FALSE.
- sample.size.label
Logical that decides whether sample size information should be displayed for each level of the grouping variable x (Default: TRUE).
- k
Number of digits after the decimal point (should be an integer) (Default: k = 2L).
- var.equal
A logical variable indicating whether to treat the variances in the samples as equal. If TRUE, a simple F test for the equality of means in a one-way analysis of variance is performed. If FALSE, an approximate method of Welch (1951) is used, which generalizes the commonly known two-sample Welch test to the case of arbitrarily many samples.
- conf.level
Scalar between 0 and 1. If unspecified, the defaults return 95% confidence/credible intervals (0.95).
- nboot
Number of bootstrap samples for computing the confidence interval for the effect size (Default: 100).
- tr
Trim level for the mean when carrying out robust tests. If you get the error "Standard error cannot be computed because of Winsorized variance of 0 (e.g., due to ties). Try to decrease the trimming level.", try adjusting the value of tr, which is set to 0.1 by default. Lowering the value might help.
- mean.plotting
Logical that decides whether the mean is to be highlighted and its value displayed (Default: TRUE).
- mean.ci
Logical that decides whether the 95% confidence interval for the mean is to be displayed (Default: FALSE).
- mean.point.args, mean.label.args
A list of additional aesthetic arguments to be passed to the ggplot2::geom_point and ggrepel::geom_label_repel geoms involved in plotting the mean.
- notch
A logical. If FALSE (default), a standard box plot will be displayed. If TRUE, a notched box plot will be used. Notches are used to compare groups; if the notches of two boxes do not overlap, this suggests that the medians are significantly different. In a notched box plot, the notches extend 1.58 * IQR / sqrt(n) above and below the median, where IQR is the inter-quartile range and n is the number of observations in the group. This gives a roughly 95% confidence interval for comparing medians.
- notchwidth
For a notched box plot, width of the notch relative to the body (Default: 0.5).
- outlier.tagging
Decides whether outliers should be tagged (Default: FALSE).
- outlier.label
Label to put on the outliers that have been tagged. This can't be the same as the x argument.
- outlier.coef
Coefficient for outlier detection using Tukey's method. With Tukey's method, outliers are values lying more than outlier.coef times the inter-quartile range (IQR) below the first quartile or above the third quartile (Default: 1.5).
- outlier.shape
Shape of the outlier points (Default: 19). Hiding the outliers can be achieved by setting outlier.shape = NA. Importantly, this does not remove the outliers; it only hides them, so the range calculated for the y-axis will be the same with outliers shown and outliers hidden.
- outlier.color
Default color for outliers (Default: "black").
- outlier.label.args
A list of additional aesthetic arguments to be passed to ggrepel::geom_label_repel for outlier label plotting.
- point.args
A list of additional aesthetic arguments to be passed to the geom_point displaying the raw data.
- violin.args
A list of additional aesthetic arguments to be passed to geom_violin.
- ggsignif.args
A list of additional aesthetic arguments to be passed to ggsignif::geom_signif.
- ggtheme
A ggplot2 theme function. Default value is ggplot2::theme_bw(). Any of the ggplot2 themes, or themes from extension packages, are allowed (e.g., ggthemes::theme_fivethirtyeight(), hrbrthemes::theme_ipsum_ps(), etc.).
- ggstatsplot.layer
Logical that decides whether theme_ggstatsplot theme elements are to be displayed along with the selected ggtheme (Default: TRUE). theme_ggstatsplot is an opinionated theme layer that overrides some aspects of the selected ggtheme.
- package, palette
Name of the package from which the given palette is to be extracted, and the name of that palette. The available palettes and packages can be checked by running View(paletteer::palettes_d_names).
- ggplot.component
A ggplot component to be added to the plot prepared by ggstatsplot. This argument is primarily helpful for the grouped_ variant of the current function. Default is NULL. The argument should be entered as a function.
- output
Character that describes what is to be returned: can be "plot" (default), "subtitle", or "caption". Setting this to "subtitle" will return the expression containing statistical results. If you have set results.subtitle = FALSE, this will return NULL. Setting this to "caption" will return the expression containing details about the Bayes factor analysis, but this is valid only when type = "parametric" and bf.message = TRUE; otherwise it will return NULL.
- ...
Currently ignored.
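The method names accepted by p.adjust.method are exactly the set understood by base R's stats::p.adjust(), which suggests (an assumption, not something this page states) that the adjustment is delegated to that function. A minimal base R sketch of what two of the methods do to the same three illustrative p-values:

```r
# Raw p-values from three hypothetical pairwise comparisons (illustrative numbers only)
p_raw <- c(0.01, 0.02, 0.03)

# Holm (the ggbetweenstats default): step-down procedure that multiplies the
# i-th smallest p-value by (m - i + 1) and then enforces monotonicity
p_holm <- stats::p.adjust(p_raw, method = "holm")

# Bonferroni: multiplies every p-value by the number of comparisons (capped at 1)
p_bonf <- stats::p.adjust(p_raw, method = "bonferroni")

print(rbind(raw = p_raw, holm = p_holm, bonferroni = p_bonf))

# The full set of method names, matching the list under p.adjust.method
print(stats::p.adjust.methods)
```

Holm controls the same family-wise error rate as Bonferroni but never produces larger adjusted p-values, which is one reason it is a common default.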
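To make the "eta" versus "omega" choice under effsize.type concrete: in a one-way between-subjects design there is only one effect, so the partial and non-partial versions coincide. A minimal sketch using the textbook formulas on a base R ANOVA table; this is not ggstatsplot's own code, which may differ in details such as how confidence intervals are obtained:

```r
# One-way ANOVA of fuel efficiency across cylinder counts (mtcars ships with base R)
fit <- stats::aov(mpg ~ factor(cyl), data = datasets::mtcars)
tab <- summary(fit)[[1]]  # ANOVA table: row 1 = effect, row 2 = residuals

ss_between <- tab[["Sum Sq"]][1]
ss_within  <- tab[["Sum Sq"]][2]
df_between <- tab[["Df"]][1]
ms_within  <- tab[["Mean Sq"]][2]

# Eta-squared: share of the total sum of squares attributed to the groups
eta_sq <- ss_between / (ss_between + ss_within)

# Omega-squared: subtracts the variation expected by chance, so it is
# smaller and less biased upward than eta-squared
omega_sq <- (ss_between - df_between * ms_within) /
  (ss_between + ss_within + ms_within)

print(c(eta = eta_sq, omega = omega_sq))
```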
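The wording of var.equal mirrors base R's stats::oneway.test(). A minimal sketch contrasting the two variants directly in base R; whether ggbetweenstats calls this exact function internally is an assumption here, not something this page states:

```r
# Classic one-way ANOVA F test: assumes equal variances across groups
classic <- stats::oneway.test(mpg ~ factor(cyl), data = datasets::mtcars,
                              var.equal = TRUE)

# Welch (1951): drops the equal-variance assumption and estimates the
# denominator degrees of freedom from the group variances
welch <- stats::oneway.test(mpg ~ factor(cyl), data = datasets::mtcars,
                            var.equal = FALSE)

print(classic)
print(welch)
```

With 32 cars in 3 groups, the classic test uses 32 - 3 = 29 denominator degrees of freedom, while the Welch test reports a smaller, non-integer value.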
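The error quoted under tr can be reproduced with a sketch of the usual Winsorized variance used for trimmed-mean standard errors: with trimming proportion tr, the floor(tr * n) smallest and largest values are pulled in to their nearest remaining neighbours before the variance is taken. With heavily tied data every value can collapse to the same number. winsorized_var() is a hypothetical helper written for this illustration, not a function exported by ggstatsplot:

```r
# Hypothetical helper: Winsorized variance with trimming proportion tr (0 <= tr < 0.5)
winsorized_var <- function(x, tr = 0.1) {
  y <- sort(x)
  n <- length(y)
  g <- floor(tr * n)              # number of values pulled in at each tail
  if (g > 0) {
    y[seq_len(g)] <- y[g + 1]     # lowest g values -> (g + 1)-th smallest
    y[(n - g + 1):n] <- y[n - g]  # highest g values -> (g + 1)-th largest
  }
  stats::var(y)
}

# Ten observations, eight of them tied at 5
x <- c(1, rep(5, 8), 9)

winsorized_var(x, tr = 0.1)   # g = 1: both extremes become 5, so the variance is 0
winsorized_var(x, tr = 0.05)  # g = 0: nothing is Winsorized, so the variance is positive
mean(x, trim = 0.1)           # the corresponding 10% trimmed mean
```

This is why lowering tr can help: a smaller trimming proportion replaces fewer values, so ties are less likely to wipe out all the spread.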
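The notch formula given under notch can be computed directly. A minimal base R sketch, assuming the IQR is taken from R's default (type 7) quantiles, as ggplot2's box plot statistics do; notch_limits() is a hypothetical helper, not part of ggstatsplot or ggplot2:

```r
# Hypothetical helper: notch limits median +/- 1.58 * IQR / sqrt(n)
notch_limits <- function(x) {
  x <- x[!is.na(x)]
  med <- stats::median(x)
  half_width <- 1.58 * stats::IQR(x) / sqrt(length(x))  # IQR from type 7 quantiles
  c(lower = med - half_width, median = med, upper = med + half_width)
}

# Fuel efficiency for automatic (am = 0) vs manual (am = 1) cars in mtcars
mt <- datasets::mtcars
auto   <- notch_limits(mt$mpg[mt$am == 0])
manual <- notch_limits(mt$mpg[mt$am == 1])
print(rbind(auto, manual))

# Two intervals overlap when each one starts before the other ends
notches_overlap <- auto[["upper"]] >= manual[["lower"]] &&
  manual[["upper"]] >= auto[["lower"]]
print(notches_overlap)
```

Overlap is only a rough visual heuristic; the pairwise tests reported by ggbetweenstats are the formal comparison.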
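Tukey's rule described under outlier.coef flags values outside the fences [Q1 - outlier.coef * IQR, Q3 + outlier.coef * IQR]. A minimal sketch, assuming the quartiles come from R's default quantile definition; ggstatsplot's own outlier check may compute them differently, and is_tukey_outlier() is a hypothetical helper:

```r
# Hypothetical helper: TRUE for values outside Tukey's fences
# [Q1 - coef * IQR, Q3 + coef * IQR]; quartiles use R's default (type 7) quantiles
is_tukey_outlier <- function(x, coef = 1.5) {
  q <- stats::quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - coef * iqr | x > q[2] + coef * iqr
}

x <- c(1:10, 30)
which(is_tukey_outlier(x))            # 11: only the value 30 lies beyond the fences
which(is_tukey_outlier(x, coef = 5))  # integer(0): a wider fence tags nothing
```

Raising outlier.coef widens the fences, so fewer points are tagged; lowering it has the opposite effect.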
References
https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggbetweenstats.html
See Also
grouped_ggbetweenstats, ggwithinstats, grouped_ggwithinstats
Examples
# to get reproducible results from bootstrapping
set.seed(123)
library(ggstatsplot)
# simple function call with the defaults
ggstatsplot::ggbetweenstats(
data = mtcars,
x = am,
y = mpg,
title = "Fuel efficiency by type of car transmission",
caption = "Transmission (0 = automatic, 1 = manual)"
)
# more detailed function call
ggstatsplot::ggbetweenstats(
data = datasets::morley,
x = Expt,
y = Speed,
type = "nonparametric",
plot.type = "box",
xlab = "The experiment number",
ylab = "Speed-of-light measurement",
pairwise.comparisons = TRUE,
p.adjust.method = "fdr",
outlier.tagging = TRUE,
outlier.label = Run,
ggtheme = ggplot2::theme_grey(),
ggstatsplot.layer = FALSE
)