Custom function to compute bootstrap confidence intervals for an effect size measure: a parametric or non-parametric correlation coefficient.
cor_test_ci(data, x, y, method = "spearman", exact = FALSE,
continuity = TRUE, alternative = "two.sided", nboot = 100,
conf.level = 0.95, conf.type = "norm", ...)
an optional matrix or data frame (or similar: see model.frame) containing the variables in formula. By default the variables are taken from environment(formula).
A vector containing the explanatory variable.
The response: a vector whose length equals the number of rows of x.
a character string indicating which correlation coefficient is to be used for the test. One of "pearson", "kendall", or "spearman"; can be abbreviated.
a logical indicating whether an exact p-value should be computed. Used for Kendall's \(\tau\) and Spearman's \(\rho\). See ‘Details’ for the meaning of NULL (the default in stats::cor.test; the usage above sets exact = FALSE).
logical: if true, a continuity correction is used for Kendall's \(\tau\) and Spearman's \(\rho\) when not computed exactly.
indicates the alternative hypothesis and must be one of "two.sided", "greater" or "less". You can specify just the initial letter. "greater" corresponds to positive association, "less" to negative association.
Number of bootstrap samples for computing the confidence interval for the effect size (default: 100).
confidence level for the returned confidence interval. Currently only used for the Pearson product moment correlation coefficient if there are at least 4 complete pairs of observations.
A vector of character strings representing the type of intervals required. The value should be any subset of the values "norm", "basic", "perc", "bca". For more, see ?boot::boot.ci.
further arguments to be passed to or from methods.
Arguments passed on to boot::boot
The data as a vector, matrix or data frame. If it is a matrix or data frame then each row is considered as one multivariate observation.
A function which, when applied to data, returns a vector containing the statistic(s) of interest. When sim = "parametric", the first argument to statistic must be the data. For each replicate a simulated dataset returned by ran.gen will be passed. In all other cases statistic must take at least two arguments. The first argument passed will always be the original data. The second will be a vector of indices, frequencies or weights which define the bootstrap sample. Further, if predictions are required, then a third argument is required which would be a vector of the random indices used to generate the bootstrap predictions. Any further arguments can be passed to statistic through the … argument.
The number of bootstrap replicates. Usually this will be a single positive integer. For importance resampling, some resamples may use one set of weights and others use a different set of weights. In this case R would be a vector of integers where each component gives the number of resamples from each of the rows of weights.
A character string indicating the type of simulation required. Possible values are "ordinary" (the default), "parametric", "balanced", "permutation", or "antithetic". Importance resampling is specified by including importance weights; the type of importance resampling must still be specified but may only be "ordinary" or "balanced" in this case.
A character string indicating what the second argument of statistic represents. Possible values of stype are "i" (indices, the default), "f" (frequencies), or "w" (weights). Not used for sim = "parametric".
An integer vector or factor specifying the strata for multi-sample problems. This may be specified for any simulation, but is ignored when sim = "parametric". When strata is supplied for a nonparametric bootstrap, the simulations are done within the specified strata.
Vector of influence values evaluated at the observations. This is used only when sim is "antithetic". If not supplied, they are calculated through a call to empinf. This will use the infinitesimal jackknife provided that stype is "w", otherwise the usual jackknife is used.
The number of predictions which are to be made at each bootstrap replicate. This is most useful for (generalized) linear models. This can only be used when sim is "ordinary". m will usually be a single integer but, if there are strata, it may be a vector with length equal to the number of strata, specifying how many of the errors for prediction should come from each stratum. The actual predictions should be returned as the final part of the output of statistic, which should also take an argument giving the vector of indices of the errors to be used for the predictions.
Vector or matrix of importance weights. If a vector then it should have as many elements as there are observations in data. When simulation from more than one set of weights is required, weights should be a matrix where each row of the matrix is one set of importance weights. If weights is a matrix then R must be a vector of length nrow(weights). This parameter is ignored if sim is not "ordinary" or "balanced".
This function is used only when sim = "parametric", when it describes how random values are to be generated. It should be a function of two arguments. The first argument should be the observed data and the second argument consists of any other information needed (e.g. parameter estimates). The second argument may be a list, allowing any number of items to be passed to ran.gen. The returned value should be a simulated data set of the same form as the observed data, which will be passed to statistic to get a bootstrap replicate. It is important that the returned value be of the same shape and type as the original dataset. If ran.gen is not specified, the default is a function which returns the original data, in which case all simulation should be included as part of statistic. Use of sim = "parametric" with a suitable ran.gen allows the user to implement any type of nonparametric resampling which is not supported directly.
The second argument to be passed to ran.gen. Typically these will be maximum likelihood estimates of the parameters. For efficiency mle is often a list containing all of the objects needed by ran.gen which can be calculated using the original data set only.
logical, only allowed to be TRUE for sim = "ordinary", stype = "i", n = 0 (otherwise ignored with a warning). By default an n by R index array is created: this can be large and if simple = TRUE this is avoided by sampling separately for each replication, which is slower but uses less memory.
The type of parallel operation to be used (if any). If missing, the default is taken from the option "boot.parallel" (and if that is not set, "no").
integer: number of processes to be used in parallel operation: typically one would choose this to be the number of available CPUs.
An optional parallel or snow cluster for use if parallel = "snow". If not supplied, a cluster on the local machine is created for the duration of the boot call.
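The arguments nboot, conf.level and conf.type correspond to choices made when bootstrapping a correlation with the boot package. The following is a minimal sketch of that idea, not the actual source of cor_test_ci: the helper rho_stat is hypothetical, and the built-in mtcars data set is used only to keep the example self-contained.

```r
# Sketch only: bootstrap confidence intervals for Spearman's rho with boot.
# Assumption: this mirrors the general approach, not cor_test_ci's internals.
library(boot)

set.seed(123)  # make the resampling reproducible

# Hypothetical helper in the form boot::boot expects for stype = "i":
# the first argument is the data, the second a vector of row indices.
rho_stat <- function(data, indices) {
  d <- data[indices, , drop = FALSE]
  stats::cor(d$wt, d$mpg, method = "spearman")
}

# R plays the role of nboot in the usage above
boot_out <- boot::boot(data = mtcars, statistic = rho_stat, R = 200)

# conf and type play the roles of conf.level and conf.type;
# type may be any subset of "norm", "basic", "perc", "bca"
ci <- boot::boot.ci(boot_out, conf = 0.95, type = c("norm", "perc"))

boot_out$t0      # observed Spearman correlation
ci$normal[2:3]   # lower and upper limits, normal approximation
ci$percent[4:5]  # lower and upper limits, percentile method
```

A small R, such as the default nboot = 100, keeps the call fast but gives noisy interval limits; larger values are usually preferable, particularly for "bca" intervals.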
# NOT RUN {
ggstatsplot:::cor_test_ci(
  data = ggplot2::msleep,
  x = brainwt,
  y = sleep_total,
  nboot = 25,
  conf.level = 0.99,
  conf.type = "perc",
  method = "spearman",
  continuity = TRUE,
  alternative = "greater"
)
# }
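For comparison, the arguments method, exact, continuity and alternative have the same meaning as in stats::cor.test. The sketch below calls that function directly on the built-in mtcars data (an assumption made for self-containment; it is not the data used above). It shows that stats::cor.test reports no confidence interval for Spearman's \(\rho\), which is the gap a bootstrap interval can fill.

```r
# Sketch only: the underlying test via stats::cor.test, using built-in data.
res <- stats::cor.test(
  x = mtcars$wt,
  y = mtcars$mpg,
  method = "spearman",
  exact = FALSE,        # asymptotic p-value; mtcars contains ties
  continuity = TRUE,    # continuity correction, used when not exact
  alternative = "less"  # "less" corresponds to negative association
)

res$estimate           # Spearman's rho
res$p.value            # one-sided p-value
is.null(res$conf.int)  # no interval is returned for Spearman's rho
```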