
FWDselect (version 2.1.0)

test: Bootstrap based test for covariate selection

Description

Applies a bootstrap-based test for covariate selection. It helps determine the number of variables to include in the model.

Usage

test(x, y, method = "lm", family = "gaussian", nboot = 50,
  speedup = TRUE, qmin = NULL, unique = FALSE, q = NULL,
  bootseed = NULL, cluster = TRUE, ncores = NULL)

Arguments

x
A data frame containing all the covariates.
y
A vector with the response values.
method
A character string specifying which regression method is used, i.e., linear models ("lm"), generalized additive models.
family
A description of the error distribution and link function to be used in the model: "gaussian", "binomial" or "poisson".
nboot
Number of bootstrap replicates.
speedup
A logical value. If TRUE (default), the testing procedure is computationally efficient because the alternative model is fitted with only one more variable than the null model. If FALSE, the fit of th
qmin
By default NULL. If speedup is FALSE, qmin is an integer selected by the user. To help choose this argument, it is recommended to inspect the graphical output of the plot functi
unique
A logical value. By default FALSE. If TRUE, the test is performed only for one null hypothesis, given by the argument q.
q
By default NULL. If unique is TRUE, q is the size of the subset of variables to be tested.
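A call that tests a single null hypothesis might look like the sketch below. It assumes FWDselect is installed from CRAN and reuses the diabetes data from the Examples section; q = 2, nboot = 5 and bootseed = 123 are arbitrary values picked only for illustration.

```r
# Assumes the FWDselect package is installed (install.packages("FWDselect")).
library(FWDselect)
data(diabetes)
x <- diabetes[, 2:11]  # covariates
y <- diabetes[, 1]     # response

# unique = TRUE: test only H0(2), i.e. at most two covariates have non-zero effects.
# bootseed fixes the random seed of the bootstrap; cluster = FALSE runs serially.
res <- test(x, y, method = "lm", unique = TRUE, q = 2,
            nboot = 5, bootseed = 123, cluster = FALSE)
res
```

With so few bootstrap replicates the p-value is coarse; a larger nboot is the usual choice once the call works.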
bootseed
Seed to be used in the bootstrap procedure.
cluster
A logical value. If TRUE (default), the testing procedure is parallelized.
ncores
An integer specifying the number of cores used in the parallelized procedure. If NULL (default), the procedure uses one fewer core than the number available on the machine.
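The documented default for ncores corresponds to the computation sketched here with base R's parallel package. This is an illustration, not FWDselect's internal code; the guard for NA or single-core machines is an added assumption.

```r
library(parallel)

# All cores on the machine except one, as described for ncores = NULL.
ncores <- detectCores() - 1

# detectCores() may return NA, or 1 on a single-core machine; keep at least one core.
if (is.na(ncores) || ncores < 1) ncores <- 1
ncores
```

The resulting value could then be passed explicitly, e.g. test(x, y, cluster = TRUE, ncores = ncores), to make the core count visible in a script.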

Value

A list with two objects:
  • The first one is a table containing the columns
    • Hypothesis: number of the null hypothesis tested
    • Statistic: value of the T statistic
    • pvalue: p-value obtained in the testing procedure
    • Decision: result of the test at a significance level of 0.05
  • The second one, nvar, indicates the number of variables that have to be included in the model.
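One way to read the table is sequentially: H0(1), H0(2), ... are tested in order, and the first hypothesis that is not rejected at the 0.05 level gives the number of variables. The helper below is hypothetical and only sketches that reading; the "Rejected"/"Not Rejected" labels, the column layout and the fallback when every hypothesis is rejected are assumptions, not FWDselect's code.

```r
# Hypothetical helper (not part of FWDselect): given the p-values of
# H0(1), H0(2), ... in order, return a decision table and the first q
# whose null hypothesis is not rejected at level alpha.
choose_nvar <- function(pvalues, alpha = 0.05) {
  decision <- ifelse(pvalues < alpha, "Rejected", "Not Rejected")
  tab <- data.frame(Hypothesis = seq_along(pvalues),
                    pvalue = pvalues,
                    Decision = decision,
                    stringsAsFactors = FALSE)
  accepted <- which(decision == "Not Rejected")
  # If every tested H0(q) is rejected, the model needs at least one more
  # variable than the largest q tested.
  nvar <- if (length(accepted) > 0) accepted[1] else length(pvalues) + 1
  list(table = tab, nvar = nvar)
}

# Made-up p-values, for illustration only.
out <- choose_nvar(c(0.000, 0.002, 0.310))
out$table
out$nvar  # 3: H0(1) and H0(2) are rejected, H0(3) is not
```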

Details

In a regression framework, let $X_1, X_2, \ldots, X_p$ be a set of $p$ initial covariates and $Y$ the response variable. The procedure tests the null hypothesis that $q$ variables are significant in the model (that is, $q$ effects are not equal to zero) against the alternative that the model contains more than $q$ significant variables. Based on the general model $$Y = m(\mathbf{X}) + \varepsilon, \quad \text{where} \quad m(\mathbf{X}) = m_{1}(X_{1}) + m_{2}(X_{2}) + \ldots + m_{p}(X_{p}),$$ the following strategy is considered: for a subset of size $q$, the null hypothesis $$H_{0}(q): \sum_{j=1}^{p} I_{\{m_j \neq 0\}} \le q$$ is tested against the general alternative $$H_{1}: \sum_{j=1}^{p} I_{\{m_j \neq 0\}} > q,$$ where $I_{\{\cdot\}}$ denotes the indicator function, so each sum counts the number of non-zero effects.
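To make the idea of a bootstrap test of $H_0(q)$ concrete, the sketch below uses a generic residual bootstrap for linear models. It is a swapped-in illustration, not FWDselect's T statistic: the hypothetical function boot_test_q compares the fit with the first q covariates (null) against the fit with the first q + 1 (alternative), using the drop in residual sum of squares as the statistic. It assumes the columns of x are already ordered by importance and that q >= 1.

```r
# Hypothetical illustration (not FWDselect's statistic): residual bootstrap
# test of H0 (first q covariates suffice) vs H1 (covariate q + 1 is needed).
boot_test_q <- function(x, y, q, nboot = 200, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  x <- as.data.frame(x)
  null_vars <- names(x)[seq_len(q)]
  alt_vars  <- names(x)[seq_len(q + 1)]

  # Statistic: reduction in residual sum of squares from adding one covariate.
  stat <- function(yy) {
    d <- data.frame(y = yy, x)
    rss0 <- sum(resid(lm(reformulate(null_vars, "y"), data = d))^2)
    rss1 <- sum(resid(lm(reformulate(alt_vars, "y"), data = d))^2)
    rss0 - rss1
  }
  t_obs <- stat(y)

  # Generate bootstrap responses under H0: null fit plus resampled null residuals.
  fit0 <- lm(reformulate(null_vars, "y"), data = data.frame(y = y, x))
  f0 <- fitted(fit0)
  e0 <- resid(fit0)
  t_boot <- replicate(nboot, stat(f0 + sample(e0, replace = TRUE)))

  list(statistic = t_obs, pvalue = mean(t_boot >= t_obs))
}

# Simulated data: only X1 and X2 have non-zero effects.
set.seed(1)
n <- 200
x <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
y <- 2 * x$X1 + 1.5 * x$X2 + rnorm(n)

r1 <- boot_test_q(x, y, q = 1, seed = 42)  # H0(1) is false here
r2 <- boot_test_q(x, y, q = 2, seed = 42)  # H0(2) is true here
c(r1$pvalue, r2$pvalue)
```

Generating the bootstrap samples from the null fit is what makes the p-value meaningful: the statistic's distribution is approximated as if $H_0(q)$ were true.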

References

Sestelo, M., Villanueva, N. M. and Roca-Pardinas, J. (2013). FWDselect: an R package for selecting variables in regression models. Discussion Papers in Statistics and Operation Research, University of Vigo, 13/01.

See Also

selection

Examples

library(FWDselect)
data(diabetes)
x = diabetes[ ,2:11]
y = diabetes[ ,1]
test(x, y, method = "lm", cluster = FALSE, nboot = 5)

## for speedup = FALSE
# obj2 = qselection(x, y, qvector = c(1:9), method = "lm",
# cluster = FALSE)
# plot(obj2) # we choose q = 7 for the argument qmin
# test(x, y, method = "lm", cluster = FALSE, nboot = 5,
# speedup = FALSE, qmin = 7)
