snpRF (version 0.4)

snpRFcv: Random Forest Cross-Validation for feature selection

Description

This function reports the cross-validated prediction performance of models fitted with a sequentially reduced number of predictors (ranked by variable importance), using a nested cross-validation procedure.

Usage

snpRFcv(trainx.autosome=NULL, trainx.xchrom=NULL, trainx.covar=NULL, trainy, cv.fold=5, scale="log", step=0.5, mtry=function(p) max(1, floor(sqrt(p))), recursive=FALSE, ...)

Arguments

trainx.autosome
A matrix of autosomal markers with each column corresponding to a SNP coded as count of a particular allele (i.e. 0,1 or 2), and each row corresponding to a sample/individual.
trainx.xchrom
A matrix of X chromosome markers, with each marker coded as two adjacent columns. Each column holds 0 or 1, indicating whether that copy carries a particular allele. Although males have only one X chromosome, their markers are also coded as two columns, the second being a duplicate of the first. Each row of this matrix corresponds to a sample/individual. These data must be phased and in chromosomal order.
trainx.covar
A matrix of covariates, each column being a different covariate, and each row, a sample/individual.
trainy
response vector; must be a factor with length equal to the number of rows in trainx.*
cv.fold
number of folds in the cross-validation
scale
if "log", remove a fixed proportion (step) of the variables at each step; otherwise remove step variables at a time
step
if scale="log", the fraction of variables to remove at each step; otherwise the number of variables to remove at a time
mtry
a function of number of remaining predictor variables to use as the mtry parameter in the snpRF call
recursive
whether variable importance is (re-)assessed at each step of variable reduction
...
other arguments passed on to snpRF
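
The interaction of scale, step and mtry can be illustrated with a small base-R sketch. It takes the argument descriptions above literally and is not the package's own code: the helper name `reduction_schedule` is hypothetical, and snpRFcv may round or terminate its schedule differently.

```r
# Hypothetical helper (not part of snpRF): list the number of predictors
# kept at each step, reading the 'scale' and 'step' descriptions literally.
reduction_schedule <- function(p, scale = "log", step = 0.5) {
  stopifnot(p >= 1, step > 0)
  n.var <- p
  while (tail(n.var, 1) > 1) {
    current <- tail(n.var, 1)
    # "log": drop a fraction 'step' of what remains; otherwise drop 'step' variables
    drop <- if (scale == "log") ceiling(current * step) else step
    n.var <- c(n.var, max(1, current - drop))
  }
  n.var
}

# Default mtry from the Usage section, evaluated on the remaining predictors
default_mtry <- function(p) max(1, floor(sqrt(p)))

n.var.log <- reduction_schedule(100)                          # proportional steps
n.var.lin <- reduction_schedule(100, scale = "linear", step = 10)  # fixed-size steps
mtry.log  <- sapply(n.var.log, default_mtry)
print(cbind(n.var = n.var.log, mtry = mtry.log))
```

With scale="log" the counts shrink quickly at first and slowly near the end, which is why the example plot further down uses a logarithmic x axis.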

Value

A list with the following components:
n.var
vector of number of variables used at each step
error.cv
corresponding vector of error rates or MSEs at each step
predicted
list of n.var components, each containing the predicted values from the cross-validation
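
As a rough illustration of how these components relate, the sketch below builds a made-up result by hand and picks the step with the lowest error. The labels and predictions are invented for illustration, and it assumes error.cv is the misclassification rate of the cross-validated predictions against trainy, which this page does not state explicitly.

```r
# Toy response and made-up cross-validated predictions for three steps
truth <- factor(c("case", "control", "case", "control", "case", "control"))
lev   <- levels(truth)

predicted <- list(
  "4" = factor(c("control", "control", "case", "control", "case", "case"), levels = lev),
  "2" = factor(c("case", "control", "case", "control", "control", "control"), levels = lev),
  "1" = factor(rep("control", 6), levels = lev)
)
n.var <- c(4, 2, 1)

# Assumed definition: fraction of samples whose predicted class is wrong
error.cv <- unname(sapply(predicted, function(p) mean(p != truth)))

# Number of variables at the step with the smallest cross-validated error
best.n.var <- n.var[which.min(error.cv)]
print(data.frame(n.var = n.var, error.cv = error.cv))
```

On a real result the same last line would read `result$n.var[which.min(result$error.cv)]`; ties are resolved in favour of the earlier (larger) variable set because `which.min` returns the first minimum.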

References

Svetnik, V., Liaw, A., Tong, C. and Wang, T., "Application of Breiman's Random Forest to Modeling Structure-Activity Relationships of Pharmaceutical Molecules", MCS 2004, Roli, F. and Windeatt, T. (Eds.), pp. 334-343.

See Also

snpRF, importance

Examples

set.seed(647)
data(snpRFexample)
result <- snpRFcv(trainx.autosome=autosome.snps, trainx.xchrom=xchrom.snps,
                  trainx.covar=covariates, trainy=phenotype)
with(result, plot(n.var, error.cv, log="x", type="o", lwd=2))
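
For data that are not already in the package's layout, the two-columns-per-marker X chromosome coding described under Arguments could be assembled along the following lines. This is a base-R sketch only: the genotypes, the `sex` vector, the marker names and the helper `encode_xchrom` are all invented for illustration and are not part of snpRF. It assumes the haplotypes are already phased and the markers are listed in chromosomal order.

```r
# Hypothetical phased X chromosome alleles for 4 samples and 2 markers.
# hap1/hap2 hold 0/1 allele indicators; males have only hap1 observed.
sex  <- c("F", "M", "F", "M")
hap1 <- matrix(c(0, 1, 1, 0,
                 1, 1, 0, 0), nrow = 4,
               dimnames = list(NULL, c("rsA", "rsB")))
hap2 <- matrix(c(1, NA, 1, NA,
                 0, NA, 0, NA), nrow = 4,
               dimnames = list(NULL, c("rsA", "rsB")))

# Illustrative helper: duplicate the male haplotype, then interleave the
# two haplotypes so that each marker occupies two adjacent columns.
encode_xchrom <- function(hap1, hap2, sex) {
  male <- sex == "M"
  hap2[male, ] <- hap1[male, ]   # second column duplicates the first for males
  out <- matrix(NA_real_, nrow(hap1), 2 * ncol(hap1))
  out[, seq(1, ncol(out), by = 2)] <- hap1
  out[, seq(2, ncol(out), by = 2)] <- hap2
  colnames(out) <- paste0(rep(colnames(hap1), each = 2), c(".1", ".2"))
  out
}

xchrom.toy <- encode_xchrom(hap1, hap2, sex)
print(xchrom.toy)
```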

## The following can take a while to run, so if you really want to try
## it, copy and paste the code into R.

## Not run: 
# result <- replicate(5,snpRFcv(trainx.autosome=autosome.snps,
#                               trainx.xchrom=xchrom.snps,
#                               trainx.covar=covariates, trainy=phenotype), 
#                     simplify=FALSE)
# error.cv <- sapply(result, "[[", "error.cv")
# matplot(result[[1]]$n.var, cbind(rowMeans(error.cv), error.cv), type="l",
#         lwd=c(2, rep(1, ncol(error.cv))), col=1, lty=1, log="x",
#         xlab="Number of variables", ylab="CV Error")
## End(Not run)
