
STPGA (version 2.0)

GenAlgForSubsetSelectionNoTest: Genetic algorithm for subset selection with no given test set

Description

Uses a genetic algorithm to select $n_{Train}$ individuals so that the optimality criterion is minimized.

Usage

GenAlgForSubsetSelectionNoTest(P, ntoselect, npop, nelite, mutprob, niterations, lambda, plotiters=TRUE, errorstat="PEVMEAN")

Arguments

P
$n \times k$ matrix of the first PCs of the predictor variables. The matrix must have the identifiers of the individuals as its rownames.
ntoselect
$n_{Train}:$ number of individuals to select in the training set.
npop
genetic algorithm parameter: the number of candidate solutions at each iteration.
nelite
genetic algorithm parameter: the number of solutions selected as elite parents that will generate the next set of solutions.
mutprob
genetic algorithm parameter, probability of mutation for each generated solution.
niterations
genetic algorithm parameter, number of iterations.
lambda
scalar shrinkage parameter ($\lambda>0$).
plotiters
plot the convergence: TRUE or FALSE. Default is TRUE.
errorstat
optimality criterion: one of the available optimality criteria. Default is "PEVMEAN".

Value

A list of length nelite. Each element of the list is an optimized training sample of size $n_{Train}$, and the elements are listed in increasing order of the optimization criterion.
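As a sketch of how this return value can be used (a mock list is constructed here purely to illustrate the shape; the identifiers are made up, not the output of an actual run), the first element always holds the best training sample:

```r
# Illustrative only: a mock return value with the same structure as the
# output of GenAlgForSubsetSelectionNoTest() called with nelite = 3.
ListTrain <- list(
  c("setosa_1", "virginica_7", "versicolor_12"),   # best sample (lowest criterion)
  c("setosa_4", "virginica_2", "versicolor_30"),
  c("setosa_9", "virginica_15", "versicolor_3")
)

best_sample <- ListTrain[[1]]   # training sample with the lowest criterion value
length(ListTrain)               # number of elite solutions returned (nelite)
```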

Examples

## Not run: 
data(iris)
# We will try to estimate petal width from
# sepal length, sepal width and petal length.
y <- iris[, 4]
X <- as.matrix(iris[, 1:3])
names(y) <- rownames(X) <- paste(iris[, 5], rep(1:50, 3), sep = "_")

# Select an optimized training set of 25 plants;
# the remaining plants form the test set.
# NOTE: Increase niterations and npop substantially for better convergence.
ListTrain <- GenAlgForSubsetSelectionNoTest(P = X, ntoselect = 25,
  npop = 100, nelite = 5, mutprob = 0.8, niterations = 20,
  plotiters = FALSE, lambda = 1e-5)

### test sample
ytest <- y[!(names(y) %in% ListTrain[[1]])]
Xtest <- X[!(rownames(X) %in% ListTrain[[1]]), ]

## predictions by the optimized sample
ytrainopt <- y[names(y) %in% ListTrain[[1]]]
Xtrainopt <- X[rownames(X) %in% ListTrain[[1]], ]

modelopt <- lm(ytrainopt ~ 1 + Xtrainopt)
predictopt <- cbind(rep(1, nrow(Xtest)), Xtest) %*% modelopt$coefficients

### predictions by a random sample of the same size
rs <- sample(names(y), 25)
ytrainrs <- y[names(y) %in% rs]
Xtrainrs <- X[rownames(X) %in% rs, ]
modelrs <- lm(ytrainrs ~ 1 + Xtrainrs)
ytestrs <- y[!(names(y) %in% rs)]
Xtestrs <- X[!(rownames(X) %in% rs), ]
predictrs <- cbind(rep(1, nrow(Xtestrs)), Xtestrs) %*% modelrs$coefficients

# Accuracies of the optimized sample and the random sample
# (the optimized sample is expected to be more accurate).
cor(predictopt, ytest)
cor(predictrs, ytestrs)
## End(Not run)
