interact: Calculate interaction statistics for user-specified variables

Description

interact calculates test statistics for assessing the strength of interactions between the input variable(s) specified, and all other input variables.

Usage

interact(object, varnames = NULL, nullmods = NULL,
  penalty.par.val = "lambda.1se", quantprobs = c(0.05, 0.95), plot = TRUE,
  col = c("yellow", "blue"), ylab = "Interaction strength",
  main = "Interaction test statistics", se.linewidth = 0.05,
  parallel = FALSE, k = 10, verbose = FALSE, ...)

Arguments

object

an object of class pre.

varnames

character vector. Names of variables for which interaction statistics should be calculated. If NULL, interaction statistics are calculated for all predictor variables with nonzero coefficients (which may take a long time).

nullmods

object with bootstrapped null interaction models, resulting from application of bsnullinteract.

penalty.par.val

character. Which value of the penalty parameter criterion should be used? Either the value yielding minimum cv error ("lambda.min") or the value yielding cv error within 1 standard error of the minimum ("lambda.1se"). Alternatively, a numeric value may be specified, corresponding to one of the values of lambda in the sequence used by glmnet, for which estimated cv error can be inspected by running object$glmnet.fit and plot(object$glmnet.fit).

quantprobs

numeric vector of length two. Probabilities that should be used for plotting the range of bootstrapped null interaction model statistics. Only used when nullmods argument is specified and plot = TRUE. The default yields sample quantiles corresponding to .05 and .95 probabilities.

plot

logical. Should interaction statistics be plotted?

col

character vector of length one or two. Colors for plotting interaction statistics. The first color is used to plot the interaction statistic from the training data; the second color is used to plot the interaction statistic distribution from the bootstrapped null interaction models. Only used when plot = TRUE. Only the first element is used if nullmods = NULL.

ylab

character string. Label to be used for plotting y-axis.

main

character. Main title for the bar plot.

se.linewidth

numeric. Width of the whiskers of the plotted standard error bars (in inches).

parallel

logical. Should parallel foreach be used? A parallel backend must be registered beforehand, for example with doMC.

k

integer. Calculating interaction test statistics is computationally intensive, so calculations are split into several parts to prevent memory allocation errors. If a memory allocation error still occurs, increase k.

verbose

logical. Should progress information be printed to the command line?

...

Additional arguments to be passed to barplot.

Value

Function interact() returns and plots interaction statistics for the specified predictor variables. If nullmods is not specified, it returns and plots only the interaction test statistics for the specified fitted prediction rule ensemble. If nullmods is specified, the function returns a list, with elements $fittedH2, containing the interaction statistics of the fitted ensemble, and $nullH2, which contains the interaction test statistics for each of the bootstrapped null interaction models.

If plot = TRUE (the default), a barplot is created with the interaction test statistic from the fitted prediction rule ensemble. If nullmods is specified, bars representing the median of the distribution of interaction test statistics of the bootstrapped null interaction models are plotted. In addition, error bars representing the quantiles of the distribution (their value specified by the quantprobs argument) are plotted. These allow for testing the null hypothesis of no interaction effect for each of the input variables.
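As a rough illustration of how the null bars and error bars relate to the bootstrapped statistics, the sketch below summarizes a table of null statistics with base R. The values are simulated placeholders, not output from pre, and the layout (one column per variable, one row per bootstrapped null model) is an assumption about $nullH2 rather than documented behavior.

```r
# Simulated null interaction statistics, for illustration only.
# Assumed layout: one column per variable, one row per null model.
set.seed(1)
null_h2 <- data.frame(
  Temp = runif(100, min = 0, max = 0.10),
  Wind = runif(100, min = 0, max = 0.10)
)

# Same default as the quantprobs argument.
quantprobs <- c(0.05, 0.95)

# Median of each null distribution: the height of the null bar.
null_median <- vapply(null_h2, median, numeric(1))

# Lower and upper sample quantiles: the two ends of the error bar.
null_range <- vapply(null_h2, quantile, numeric(2), probs = quantprobs)

null_median
null_range
```

A fitted statistic that lies above the upper quantile for a variable would then suggest an interaction effect that the null models do not reproduce.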

Note that the error rates of null hypothesis tests of interaction effects have not yet been studied in detail, but likely depend on the number of generated bootstrapped null interaction models as well as the complexity of the fitted ensembles. Users are therefore advised to test for the presence of interaction effects by setting the nsamp argument of the function bsnullinteract $\geq 100$ (even though this may take a lot of computation time). Also, users are advised to test for the presence of interactions only with fitted ensembles that are neither too sparse nor too complex, that is, ensembles that are selected by setting the penalty.par.val argument equal to "lambda.min" or "lambda.1se".
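The advice above could be followed along these lines. This is a sketch, not a tested recipe: it reuses the airquality data from the Examples section, and with nsamp = 100 it may run for a long time.

```r
library(pre)

set.seed(42)
airq <- airquality[complete.cases(airquality), ]
airq.ens <- pre(Ozone ~ ., data = airq)

# Generate 100 bootstrapped null interaction models, as advised above.
nullmods <- bsnullinteract(airq.ens, nsamp = 100)

# Compare the fitted statistics against the null distribution,
# using the default penalty.par.val = "lambda.1se".
int <- interact(airq.ens, varnames = c("Temp", "Wind", "Solar.R"),
                nullmods = nullmods)

int$fittedH2  # statistics of the fitted ensemble
int$nullH2    # statistics of the bootstrapped null models
```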

Details

Calculating interaction statistics can be computationally intensive, especially when nullmods is specified, in which case setting parallel = TRUE may improve speed.
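One possible way to register a parallel backend before setting parallel = TRUE is shown below. The choice of doParallel and of two workers is an assumption made for illustration; any foreach-compatible backend should serve the same purpose, and the speed gain depends on the machine.

```r
library(pre)
library(doParallel)

# Assumption: two worker processes; adjust to the cores available.
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

set.seed(42)
airq <- airquality[complete.cases(airquality), ]
airq.ens <- pre(Ozone ~ ., data = airq)

# nsamp is kept small here only to keep the sketch quick to run.
nullmods <- bsnullinteract(airq.ens, nsamp = 10, parallel = TRUE)
int <- interact(airq.ens, c("Temp", "Wind"), nullmods = nullmods,
                parallel = TRUE)

# Release the workers when done.
parallel::stopCluster(cl)
```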

Examples

# NOT RUN {
set.seed(42)
airq.ens <- pre(Ozone ~ ., data = airquality[complete.cases(airquality), ])
interact(airq.ens, c("Temp", "Wind", "Solar.R"))
# }