sprinter (version 1.1.0)

screen.inter: Adaptive function for screening interactions

Description

fit.logicReg and fit.rf are functions for screening interactions in high-dimensional data sets. They are intended to be passed to the argument screen.inter of the function sprinter and return a variable importance measurement for each variable.

Usage

fit.rf(nr, data, indices, seed.interselect, ...)

fit.rf.select(nr, data, indices, seed.interselect, n.select, ...)

fit.logicReg(nr, data, indices, seed.interselect,
       type,
       nleaves,
       ntrees, ...)

fit.logicReg.select(nr, data, indices, seed.interselect,
       type,
       nleaves,
       ntrees,
       n.select, ...)

Arguments

nr
number of the current resampling run.
data
data frame containing the y-outcome and x-variables in the model, which is orthogonalized to the clinical covariates and the main effects identified in the main effects detection step.
indices
indices to build the resample dataset.
seed.interselect
seed for random number generator.
n.select
number of variables pre-selected for the random forest or logic regression fit (used by fit.rf.select and fit.logicReg.select).
type
type of model to be fit. For survival data you can choose between (4) proportional hazards model (Cox regression), and (5) exponential survival model, or (0) your own scoring function.
nleaves
maximum number of leaves to be fit in all trees combined.
ntrees
number of logic trees to be fit.
...
further arguments passed to methods.

Value

  • fit.rf and fit.logicReg return a vector of length p containing the variable importance of each variable in the data set. fit.rf evaluates the permutation accuracy importance (PAM) as the measure of variable importance. fit.logicReg returns whether a variable is included in the model (1) or not (0).
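The 0/1 coding returned by fit.logicReg refers to variables that were dichotomized at the median before fitting (see Details). The following is a minimal sketch of such a median split in base R. The helper binarize_at_median is hypothetical and not part of sprinter, and sending values equal to the median to 0 is an assumption, since the documentation does not say how ties are handled.

```r
# Hypothetical helper (not part of sprinter): dichotomize every numeric
# column at its own median. Values strictly above the median become 1,
# all other values become 0 (tie handling is an assumption).
binarize_at_median <- function(x) {
  as.data.frame(lapply(x, function(col) {
    as.integer(col > median(col, na.rm = TRUE))
  }))
}

x <- data.frame(a = c(1, 5, 9, 3), b = c(0.2, 0.1, 0.4, 0.3))
xb <- binarize_at_median(x)
print(xb)
```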

Details

The functions fit.logicReg and fit.rf are adapted for use in the function sprinter in order to screen interactions. Variable importance measurements are evaluated for each variable and are then used to pre-select relevant interactions in sprinter. There, the identified interaction candidates are combined with each other pairwise and provided as possible predictors for the final model.

fit.rf
This function fits a random forest for survival data. It judges each variable by its permutation accuracy importance. For more information about fitting the random forest see rfsrc.
fit.rf.select
This function fits a random forest for survival data on a restricted data set. The number of covariables in this restricted data set is set by n.select. The n.select variables with the smallest univariate p-values, evaluated by Cox regression, are selected.
fit.logicReg
For logic regression, all continuous variables are converted to binary variables at the median. The logic regression is then fitted to the binary data set. The variable importance measure is one if the variable is included in the model and zero if not. To obtain information about the variables in a multiple model, setting select = 2 is obligatory.
fit.logicReg.select
This function performs logic regression on a restricted data set. The number of covariables in this restricted data set is set by n.select. The n.select variables with the smallest univariate p-values, evaluated by Cox regression, are selected.
Implementing new functions for the argument screen.inter
New functions for screening interactions can be constructed so that an importance measurement is returned for each variable, as a vector of length p. Variable importance measurements larger than zero are interpreted as relevant for the model. The function must accept the following arguments:

  • nr: number of the current resampling run.
  • data: data frame containing the y-outcome and x-variables in the model.
  • indices: indices to build the resample dataset.
  • seed.interselect: seed for the random number generator.

Following this convention, other functions can be implemented and used for screening potential interaction candidates.
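The interface for user-defined screening functions might be sketched as follows. This is an illustration under stated assumptions rather than code from sprinter: the function name fit.abscor, the threshold argument, and the correlation-based importance are all hypothetical. It also assumes that data has a numeric outcome column named y with all remaining columns being x-variables, and that indices is a list holding one vector of row indices per resampling run. Neither assumption is confirmed by this page.

```r
# Hypothetical screening function following the documented interface
# (nr, data, indices, seed.interselect, ...). Not part of sprinter.
# Assumptions: the outcome column is called "y"; `indices` is a list with
# one vector of row indices per resampling run.
fit.abscor <- function(nr, data, indices, seed.interselect,
                       threshold = 0.3, ...) {
  set.seed(seed.interselect)  # kept for interface compatibility; unused here
  resample <- data[indices[[nr]], , drop = FALSE]
  x <- resample[, setdiff(names(resample), "y"), drop = FALSE]
  # Importance: absolute correlation with the outcome, one value per variable
  imp <- vapply(x, function(col) abs(cor(col, resample$y)), numeric(1))
  # Values above zero mark a variable as relevant, so small ones are zeroed
  imp[is.na(imp) | imp < threshold] <- 0
  imp
}

# Toy data: only V1 is related to the outcome
set.seed(1)
n <- 100
x <- as.data.frame(matrix(rnorm(n * 5), n, 5))
toy <- data.frame(y = 2 * x$V1 + rnorm(n), x)
idx <- list(sample(n, replace = TRUE), sample(n, replace = TRUE))

vimp <- fit.abscor(nr = 1, data = toy, indices = idx, seed.interselect = 42)
print(vimp)
```

The returned vector has one entry per x-variable, and thresholding to zero mirrors the rule that only importance values larger than zero count as relevant.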

References

Ruczinski I, Kooperberg C, LeBlanc ML (2003). Logic Regression, Journal of Computational and Graphical Statistics, 12, 475-511.
Breiman L (2001). Random forests, Machine Learning, 45, 5-32.

See Also

logreg, rfsrc