ggscatterstats
Scatterplot with marginal distributions
Scatterplots from ggplot2 combined with marginal histograms/boxplots/density plots, with statistical details added as a subtitle.
Usage
ggscatterstats(data, x, y, type = "pearson", conf.level = 0.95,
bf.prior = 0.707, bf.message = TRUE, label.var = NULL,
label.expression = NULL, xlab = NULL, ylab = NULL, method = "lm",
method.args = list(), formula = y ~ x, point.color = "black",
point.size = 3, point.alpha = 0.4, point.width.jitter = 0,
point.height.jitter = 0, line.size = 1.5, line.color = "blue",
marginal = TRUE, marginal.type = "histogram", marginal.size = 5,
margins = c("both", "x", "y"), package = "wesanderson",
palette = "Royal1", direction = 1, xfill = "#009E73",
yfill = "#D55E00", xalpha = 1, yalpha = 1, xsize = 0.7,
ysize = 0.7, centrality.para = NULL, results.subtitle = TRUE,
stat.title = NULL, title = NULL, subtitle = NULL, caption = NULL,
nboot = 100, beta = 0.1, k = 2, axes.range.restrict = FALSE,
ggtheme = ggplot2::theme_bw(), ggstatsplot.layer = TRUE,
ggplot.component = NULL, return = "plot", messages = TRUE)
Arguments
- data
A dataframe (or a tibble) from which the specified variables are to be taken. A matrix or a table will not be accepted.
- x
The column in data containing the explanatory variable to be plotted on the x axis. Can be entered either as a character string (e.g., "x") or as a bare expression (e.g., x).
- y
The column in data containing the response (outcome) variable to be plotted on the y axis. Can be entered either as a character string (e.g., "y") or as a bare expression (e.g., y).
- type
Type of association between paired samples required ("parametric": Pearson's product moment correlation coefficient; "nonparametric": Spearman's rho; "robust": percentage bend correlation coefficient; or "bayes": Bayes Factor for Pearson's r). Corresponding abbreviations are also accepted: "p" (for parametric/Pearson's), "np" (nonparametric/Spearman), "r" (robust), "bf" (for Bayes factor), respectively.
- conf.level
Scalar between 0 and 1. If unspecified, the default (0.95) returns 95% lower and upper confidence intervals.
- bf.prior
A number between 0.5 and 2 (default 0.707), the prior width to use in calculating Bayes factors.
- bf.message
Logical that decides whether to display the Bayes Factor in favor of the null hypothesis. This argument is relevant only for the parametric test (Default: TRUE).
- label.var
Variable to use for point labels. Can be entered either as a character string (e.g., "var1") or as a bare expression (e.g., var1).
- label.expression
An expression evaluating to a logical vector that determines the subset of data points to label. This argument can be entered either as a character string (e.g., "y < 4 & z < 20") or as a bare expression (e.g., y < 4 & z < 20).
- xlab
Labels for x and y axis variables. If NULL (default), variable names for x and y will be used.
- ylab
Labels for x and y axis variables. If NULL (default), variable names for x and y will be used.
- method
Smoothing method (function) to use. Accepts either a character vector, e.g. "auto", "lm", "glm", "gam", "loess", or a function, e.g. MASS::rlm, mgcv::gam, stats::lm, or stats::loess.
For method = "auto" the smoothing method is chosen based on the size of the largest group (across all panels). loess() is used for fewer than 1,000 observations; otherwise mgcv::gam() is used with formula = y ~ s(x, bs = "cs"). Somewhat anecdotally, loess gives a better appearance, but is \(O(N^{2})\) in memory, so it does not work for larger datasets.
If you have fewer than 1,000 observations but want to use the same gam() model that method = "auto" would use, then set method = "gam", formula = y ~ s(x, bs = "cs").
- method.args
List of additional arguments passed on to the modelling function defined by method.
- formula
Formula to use in smoothing function, e.g. y ~ x, y ~ poly(x, 2), y ~ log(x).
- point.color, point.size, point.alpha
Aesthetics specifying geom point (defaults: point.color = "black", point.size = 3, point.alpha = 0.4).
- point.width.jitter, point.height.jitter
Degree of jitter in x and y direction, respectively. Defaults to 0 (0% of the resolution of the data).
- line.size
Size for the regression line.
- line.color
Color for the regression line.
- marginal
Decides whether ggExtra::ggMarginal() plots will be displayed; the default is TRUE.
- marginal.type
Type of marginal distribution to be plotted on the axes ("histogram", "boxplot", "density", "violin", "densigram").
- marginal.size
Integer describing the relative size of the marginal plots compared to the main plot. A size of 5 means that the main plot is 5x wider and 5x taller than the marginal plots.
- margins
Character describing along which margins to show the plots. Any of the following values is accepted: "both", "x", "y".
- package
Name of package from which the palette is desired as string or symbol.
- palette
Name of palette as string or symbol.
- direction
Either 1 or -1. If -1, the palette will be reversed.
- xfill, yfill
Character describing color fill for x and y axes marginal distributions (default: "#009E73" (for x) and "#D55E00" (for y)). If set to NULL, manual specification of colors will be turned off and 2 colors from the specified palette from package will be selected.
- xalpha, yalpha
Numeric deciding transparency levels for the marginal distributions. Any number from 0 (transparent) to 1 (opaque) is accepted. The default is 1 for both axes.
- xsize, ysize
Size for the marginal distribution boundaries (Default: 0.7).
- centrality.para
Decides which measure of central tendency ("mean" or "median") is to be displayed as vertical (for x) and horizontal (for y) lines.
- results.subtitle
Decides whether the results of statistical tests are to be displayed as a subtitle (Default: TRUE). If set to FALSE, only the plot will be returned.
- stat.title
A character describing the test being run, which will be added as a prefix in the subtitle. The default is NULL. An example of a stat.title argument would be something like "Student's t-test: ".
- title
The text for the plot title.
- subtitle
The text for the plot subtitle. Will work only if results.subtitle = FALSE.
- caption
The text for the plot caption.
- nboot
Number of bootstrap samples for computing the confidence interval for the effect size (Default: 100).
- beta
Bending constant (Default: 0.1). For more, see ?WRS2::pbcor.
- k
Number of digits after the decimal point (should be an integer) (Default: k = 2).
- axes.range.restrict
Logical that decides whether to restrict the ranges of the axes to the min and max values of the axes variables (Default: FALSE). Only relevant for functions where the axes variables are of numeric type.
- ggtheme
A function, ggplot2 theme name. Default value is ggplot2::theme_bw(). Any of the ggplot2 themes, or themes from extension packages, are allowed (e.g., ggthemes::theme_fivethirtyeight(), hrbrthemes::theme_ipsum_ps(), etc.).
- ggstatsplot.layer
Logical that decides whether theme_ggstatsplot theme elements are to be displayed along with the selected ggtheme (Default: TRUE).
- ggplot.component
A ggplot component to be added to the plot prepared by ggstatsplot. This argument is primarily helpful for the grouped_ variant of the current function. Default is NULL. The argument should be entered as a function. If the given function has an argument axes.range.restrict and it has been set to TRUE, the added ggplot component might not work as expected.
- return
Character that describes what is to be returned: can be "plot" (default), "subtitle", or "caption". Setting this to "subtitle" will return the expression containing statistical results, which will be NULL if you set results.subtitle = FALSE. Setting this to "caption" will return the expression containing details about the Bayes Factor analysis, but this is valid only when type = "p" and bf.message = TRUE; otherwise this will return NULL.
- messages
Decides whether messages (references, notes, and warnings) are to be displayed (Default: TRUE).
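As a rough illustration of what the type, conf.level, and k arguments select between, the parametric and nonparametric options correspond to correlations that base R can compute with stats::cor.test(). This is only a sketch in base R, not the function's internal code; the robust and Bayes factor options rely on other packages and are left out here.

```r
# Sketch only: base-R analogues of type = "p" and type = "np".
# This is NOT the internal code of ggscatterstats().
x <- mtcars$wt
y <- mtcars$mpg

# Parametric: Pearson's product moment correlation with a 95% CI
# (conf.level plays the same role as the argument of that name)
pearson <- stats::cor.test(x, y, method = "pearson", conf.level = 0.95)

# Nonparametric: Spearman's rho
# (exact = FALSE avoids the warning about exact p-values with ties)
spearman <- stats::cor.test(x, y, method = "spearman", exact = FALSE)

# Round to k digits after the decimal point, as the k argument describes
k <- 2
round(unname(pearson$estimate), k)
round(as.numeric(pearson$conf.int), k)
round(unname(spearman$estimate), k)
```

Comparing the two estimates on the same data is a quick way to judge whether the choice of type is likely to change the reported association.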
Note
marginal.type = "densigram" will work only with the development version of ggExtra that you can download from GitHub: remotes::install_github("daattali/ggExtra").
The plot uses ggrepel::geom_label_repel to attempt to keep labels from overlapping to the largest degree possible. As a consequence, plot times will slow down massively (and the plot file will grow in size) if you have a lot of labels that overlap.
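Since many overlapping labels slow plotting down, it can help to check how many points a given label.expression would select before plotting. The following base R sketch shows how a condition, given either as a character string or as a bare expression, reduces to a logical vector over the rows. The data frame df and the variable names are illustrative assumptions, and this is not necessarily how the function evaluates the argument internally.

```r
# Sketch only: how a labelling condition picks out rows to label.
df <- data.frame(
  car = rownames(mtcars),
  wt  = mtcars$wt,
  mpg = mtcars$mpg
)

# Condition supplied as a character string
expr_chr <- "wt < 4 & mpg < 20"
keep_chr <- eval(parse(text = expr_chr), envir = df)

# The same condition supplied as a bare expression
keep_bare <- with(df, wt < 4 & mpg < 20)

# Both forms give the same logical vector
identical(keep_chr, keep_bare)

# Rows that would receive a label, and how many there are
labelled <- df[keep_chr, ]
nrow(labelled)
```

If the count is large, tightening the condition before plotting keeps the number of repelled labels manageable.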
References
https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggscatterstats.html
Examples
# to get reproducible results from bootstrapping
set.seed(123)

# provides the %>% pipe used below
library(magrittr)

# creating dataframe with rownames converted to a new column
mtcars_new <- mtcars %>%
  tibble::rownames_to_column(., var = "car") %>%
  tibble::as_tibble(x = .)

# function call with Spearman's rho, labels for a subset of points,
# restricted axes, and median lines
ggstatsplot::ggscatterstats(
  data = mtcars_new,
  x = wt,
  y = mpg,
  type = "np",
  label.var = car,
  label.expression = wt < 4 & mpg < 20,
  axes.range.restrict = TRUE,
  centrality.para = "median",
  xfill = NULL
)
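The method and formula arguments described under Arguments choose which model produces the smoothing line. As a further sketch, the base R fits below correspond to a few of those choices. The helper auto_method() is hypothetical and only mirrors the documented rule for method = "auto"; none of this is the function's own code.

```r
# Sketch only: base-R fits corresponding to some method/formula choices.
fit_lm    <- stats::lm(mpg ~ wt, data = mtcars)           # method = "lm", formula = y ~ x
fit_poly  <- stats::lm(mpg ~ poly(wt, 2), data = mtcars)  # formula = y ~ poly(x, 2)
fit_loess <- stats::loess(mpg ~ wt, data = mtcars)        # method = "loess"

# Hypothetical helper mirroring the documented rule for method = "auto":
# loess for fewer than 1,000 observations, gam otherwise
auto_method <- function(n) if (n < 1000) "loess" else "gam"
auto_method(nrow(mtcars))

# Fitted values on a grid of x values, i.e. the lines a smoother would draw
grid <- data.frame(
  wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50)
)
grid$lm    <- predict(fit_lm, newdata = grid)
grid$poly  <- predict(fit_poly, newdata = grid)
grid$loess <- predict(fit_loess, newdata = grid)

head(grid)
```

Comparing the columns of grid shows how much the fitted line depends on the chosen smoother for the same data.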