interact: Calculate interaction statistics for variables in a prediction rule ensemble (pre)

Description

interact calculates test statistics for assessing the strength of interactions between a set of user-specified input variable(s), and all other input variables.

Usage

interact(object, varnames = NULL, nullmods = NULL,
  penalty.par.val = "lambda.1se", quantprobs = c(0.05, 0.95),
  plot = TRUE, col = c("darkgrey", "lightgrey"),
  ylab = "Interaction strength", main = "Interaction test statistics",
  se.linewidth = 0.05, legend.text = c("observed",
  "null model median"), parallel = FALSE, k = 10, verbose = FALSE,
  ...)

Arguments

object

an object of class pre.

varnames

character vector. Names of variables for which interaction statistics should be calculated. If NULL, interaction statistics for all predictor variables with non-zeor coefficients will be calculated (which may take a long time).

nullmods

object with bootstrapped null interaction models, resulting from application of bsnullinteract.

penalty.par.val

character or numeric. Value of the penalty parameter $\lambda$ to be employed for selecting the final ensemble. The default "lambda.min" employs the $\lambda$ value within 1 standard error of the minimum cross-validated error. Alternatively, "lambda.min" may be specified, to employ the $\lambda$ value with minimum cross-validated error, or a numeric value $>0$ may be specified, with higher values yielding a sparser ensemble. To evaluate the trade-off between accuracy and sparsity of the final ensemble, inspect pre_object$glmnet.fit and plot(pre_object$glmnet.fit).

quantprobs

numeric vector of length two. Probabilities that should be used for plotting the range of bootstrapped null interaction model statistics. Only used when nullmods argument is specified and plot = TRUE. The default yields sample quantiles corresponding to .05 and .95 probabilities.

plot

logical. Should interaction statistics be plotted?

col

character vector of length one or two. The first value specifies the color to be used for plotting the interaction statistic from the training data, the second color is used for plotting the interaction statistic from the bootstrapped null interaction models. Only used when plot = TRUE. Only the first element will be used if nullmods = NULL.

ylab

character string. Label to be used for plotting y-axis.

main

character. Main title for the bar plot.

se.linewidth

numeric. Width of the whiskers of the plotted standard error bars (in inches).

legend.text

character vector of length two to be used for plotting the legend. Only used when nullmods is specified. If FALSE, no legend is plotted.

parallel

logical. Should parallel foreach be used? Must register parallel beforehand, such as doMC or others.

integer. Calculating interaction test statistics is computationally intensive, so calculations are split up in several parts to prevent memory allocation errors. If a memory allocation error still occurs, increase k.

verbose

logical. Should progress information be printed to the command line?

...

Additional arguments to be passed to barplot.

Value

Function interact() returns and plots interaction statistics for the specified predictor variables. If nullmods is not specified, it returns and plots only the interaction test statistics for the specified fitted prediction rule ensemble. If nullmods is specified, the function returns a list, with elements $fittedH2, containing the interaction statistics of the fitted ensemble, and $nullH2, which contains the interaction test statistics for each of the bootstrapped null interaction models.

If plot = TRUE (the default), a barplot is created with the interaction test statistic from the fitted prediction rule ensemble. If nullmods is specified, bars representing the median of the distribution of interaction test statistics of the bootstrapped null interaction models are plotted. In addition, error bars representing the quantiles of the distribution (their value specified by the quantprobs argument) are plotted. These allow for testing the null hypothesis of no interaction effect for each of the input variables.

Note that the error rates of null hypothesis tests of interaction effects have not yet been studied in detail, but results are likely to get more reliable when the number of bootstrapped null interaction models is larger. The default of the bsnullinteract function is to generate 10 bootstrapped null interaction datasets, to yield shorter computation times. To obtain a more reliable result, however, users are advised to set the nsamp argument $\ge 100$.

See also section 8 of Friedman & Popescu (2008).

Details

Can be computationally intensive, especially when nullmods is specified, in which case setting parallel = TRUE may improve speed.

References

Fokkema, M. (2018). Fitting prediction rule ensembles with R package pre. https://arxiv.org/abs/1707.07149.

Friedman, J. H., & Popescu, B. E. (2008). Predictive learning via rule ensembles. The Annals of Applied Statistics, 2(3), 916-954.

Examples

Run this code

# NOT RUN {
set.seed(42)
 airq.ens <- pre(Ozone ~ ., data=airquality[complete.cases(airquality),])
 interact(airq.ens, c("Temp", "Wind", "Solar.R"))
# }

Run the code above in your browser using DataLab