interact calculates test statistics for assessing the strength of
interactions between a set of user-specified input variable(s) and all
other input variables.
interact(
object,
varnames = NULL,
penalty.par.val = "lambda.1se",
gamma = NULL,
nullmods = NULL,
quantprobs = c(0.05, 0.95),
plot = TRUE,
col = c("darkgrey", "lightgrey"),
ylab = "Interaction strength",
main = "Interaction test statistics",
se.linewidth = 0.05,
legend.text = c("observed", "null model median"),
parallel = FALSE,
k = 10,
verbose = FALSE,
...
)
Function interact() returns and plots interaction statistics
for the specified predictor variables. If nullmods is not specified, it
returns and plots only the interaction test statistics for the specified
fitted prediction rule ensemble. If nullmods is specified, the function
returns a list, with elements $fittedH2, containing the interaction
statistics of the fitted ensemble, and $nullH2, which contains the
interaction test statistics for each of the bootstrapped null interaction
models.
If plot = TRUE (the default), a barplot is created with the
interaction test statistic from the fitted prediction rule ensemble. If
nullmods is specified, bars representing the median of the
distribution of interaction test statistics of the bootstrapped null
interaction models are plotted. In addition, error bars representing the
quantiles of the distribution (their values specified by the quantprobs
argument) are plotted. These allow for testing the null hypothesis of no
interaction effect for each of the input variables.
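The comparison that the error bars support can be sketched in base R. The numbers below are made up for illustration, and the layout of the null statistics (one row per bootstrapped null model, one column per variable) is an assumption rather than something this page guarantees.

```r
# Made-up interaction statistics, for illustration only.
set.seed(1)
vars <- c("Temp", "Wind", "Solar.R")

# Hypothetical null distribution: 100 bootstrapped null models (rows)
# by 3 variables (columns). This layout is an assumption.
null_h2 <- matrix(runif(300, min = 0, max = 0.10), nrow = 100,
                  dimnames = list(NULL, vars))

# Hypothetical statistics from the fitted ensemble.
fitted_h2 <- c(Temp = 0.25, Wind = 0.05, Solar.R = 0.02)

# Quantiles matching the default quantprobs = c(0.05, 0.95),
# plus the median that the null-model bars would show.
quantprobs <- c(0.05, 0.95)
null_bounds <- apply(null_h2, 2, quantile, probs = quantprobs)
null_median <- apply(null_h2, 2, median)

# A fitted statistic above the upper null quantile suggests an
# interaction effect for that variable.
exceeds_null <- fitted_h2 > null_bounds[2, ]

print(rbind(fitted = fitted_h2, null_median = null_median, null_bounds))
print(exceeds_null)
```

Reading the plot amounts to the same check: a bar for the fitted ensemble that rises above the upper whisker of the null-model bar corresponds to TRUE in exceeds_null.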
Note that the error rates of null hypothesis tests of interaction effects
have not yet been studied in detail, but results are likely to get more
reliable when the number of bootstrapped null interaction models is larger.
The default of the bsnullinteract function is to generate 10
bootstrapped null interaction datasets, to yield shorter computation times.
To obtain a more reliable result, however, users are advised to
set the nsamp argument to a value \(\ge 100\).
See also section 8 of Friedman & Popescu (2008).
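As a minimal sketch of the advice above, the following generates 100 null interaction models and passes them to interact(). It assumes the pre package is installed and reuses the airquality data from the example on this page. Expect a long run time.

```r
library(pre)

set.seed(42)
airq <- airquality[complete.cases(airquality), ]
airq.ens <- pre(Ozone ~ ., data = airq)

# Generate 100 bootstrapped null interaction models instead of the
# default 10. This can take a long time.
nullmods <- bsnullinteract(airq.ens, nsamp = 100)

# With nullmods supplied, interact() returns a list with elements
# fittedH2 and nullH2 and adds the null-model bars to the plot.
int <- interact(airq.ens, varnames = c("Temp", "Wind", "Solar.R"),
                nullmods = nullmods)
int$fittedH2
```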
an object of class pre.
character vector. Names of variables for which interaction
statistics should be calculated. If NULL, interaction statistics for
all predictor variables with non-zero coefficients will be calculated (which
may take a long time).
character or numeric. Value of the penalty parameter
\(\lambda\) to be employed for selecting the final ensemble. The default
"lambda.1se" employs the \(\lambda\) value within 1 standard
error of the minimum cross-validated error. Alternatively,
"lambda.min" may be specified, to employ the \(\lambda\) value
with minimum cross-validated error, or a numeric value \(>0\) may be
specified, with higher values yielding a sparser ensemble. To evaluate the
trade-off between accuracy and sparsity of the final ensemble, inspect
pre_object$glmnet.fit and plot(pre_object$glmnet.fit).
Mixing parameter for relaxed fits. See coef.cv.glmnet.
object with bootstrapped null interaction models, resulting
from application of bsnullinteract.
numeric vector of length two. Probabilities that should be
used for plotting the range of bootstrapped null interaction model statistics.
Only used when the nullmods argument is specified and plot = TRUE.
The default yields sample quantiles corresponding to .05 and .95 probabilities.
logical. Should interaction statistics be plotted?
character vector of length one or two. The first value specifies
the color to be used for plotting the interaction statistic from the training
data, the second color is used for plotting the interaction statistic from
the bootstrapped null interaction models. Only used when plot = TRUE.
Only the first element will be used if nullmods = NULL.
character string. Label to be used for plotting y-axis.
character. Main title for the bar plot.
numeric. Width of the whiskers of the plotted standard error bars (in inches).
character vector of length two to be used for plotting
the legend. Only used when nullmods is specified. If FALSE,
no legend is plotted.
logical. Should parallel foreach be used? A parallel backend must be registered beforehand, for example with doMC.
integer. Calculating interaction test statistics is computationally intensive, so calculations are split up in several parts to prevent memory allocation errors. If a memory allocation error still occurs, increase k.
logical. Should progress information be printed to the command line?
Further arguments to be passed to barplot.
Can be computationally intensive, especially when nullmods is
specified, in which case setting parallel = TRUE may improve speed.
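A parallel run might look like the sketch below. It assumes the doParallel package as the foreach backend, which is a choice made here for illustration. This page only names doMC as one option, and any registered foreach backend should serve the same purpose. The worker count of 2 is likewise arbitrary.

```r
library(pre)
library(doParallel)  # assumed backend, not prescribed by this page

# Register a foreach backend with two workers before calling
# functions with parallel = TRUE.
registerDoParallel(cores = 2)

set.seed(42)
airq.ens <- pre(Ozone ~ ., data = airquality[complete.cases(airquality), ])

# Both the null-model generation and the interaction statistics
# can use the registered backend.
nullmods <- bsnullinteract(airq.ens, parallel = TRUE)
int <- interact(airq.ens, varnames = c("Temp", "Wind"),
                nullmods = nullmods, parallel = TRUE)

# Release any implicitly created worker processes.
stopImplicitCluster()
```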
Fokkema, M. (2020). Fitting prediction rule ensembles with R package pre. Journal of Statistical Software, 92(12), 1-30. doi:10.18637/jss.v092.i12
Friedman, J. H., & Popescu, B. E. (2008). Predictive learning via rule ensembles. The Annals of Applied Statistics, 2(3), 916-954. doi:10.1214/07-AOAS148
pre, bsnullinteract
set.seed(42)
airq.ens <- pre(Ozone ~ ., data=airquality[complete.cases(airquality),])
interact(airq.ens, c("Temp", "Wind", "Solar.R"))
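To work with the statistics as numbers rather than as a plot, the call can be made with plot = FALSE and the result stored. The sketch below also switches to "lambda.min" to show the penalty.par.val argument in use. It assumes the pre package is installed, and the choice of variables is arbitrary.

```r
library(pre)

set.seed(42)
airq.ens <- pre(Ozone ~ ., data = airquality[complete.cases(airquality), ])

# Suppress the barplot and keep the interaction statistics.
# Without nullmods, only statistics for the fitted ensemble are returned.
h2 <- interact(airq.ens, varnames = c("Temp", "Wind"),
               penalty.par.val = "lambda.min", plot = FALSE)
h2
```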