mogavs: Multiobjective Genetic Algorithm for Variable Selection

Description

The main function for the mogavs genetic algorithm, returning a list containing the full archive set of regression models tried and the nondominated set.

Usage

# S3 method for default
mogavs(x, y, maxGenerations = 10*ncol(x), popSize = ncol(x), noOfOffspring = ncol(x),
crossoverProbability = 0.9, mutationProbability = 1/ncol(x), kBest = 1, 
plots = F, additionalPlots = F, ...)
# S3 method for formula
mogavs(formula, data, maxGenerations= 10*ncol(x), popSize = ncol(x), 
noOfOffspring = ncol(x), crossoverProbability = 0.9, mutationProbability = 1/ncol(x), 
kBest = 1, plots = F, additionalPlots = F, ...)

Arguments

formula

A model formula, e.g. y~x1+x2 to predict y from x1 and x2, or y~. to predict y from all other variables in data.

data

A data frame containing the variables mentioned in the formula.

x

An n x p matrix containing the n observations of p values used in the regression.

y

An n x 1 vector of values to fit the regression to.

maxGenerations

Maximum number of generations to run in the evolutionary algorithm. Default is 10*ncol(x).

popSize

Population size, i.e. how many regression models the population holds. Default is ncol(x).

noOfOffspring

Indicates how many offspring models are generated for each generation. Default is ncol(x).

crossoverProbability

Indicates the probability of crossover for each offspring. Default is 0.9.

mutationProbability

Indicates the probability of mutation for each offspring. Default is 1/ncol(x).

kBest

Indicates how many of the best models for each number of variables are highlighted in the printed output at the end of the run. Default is 1.

plots

Logical flag that turns plotting for each generation on or off. Default is FALSE.

additionalPlots

Logical flag that turns the additional plot at the end of the run on or off. Default is FALSE. The plot can also be generated after the run with the createAdditionalPlots function.

…

Any additional arguments.
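
To make the roles of crossoverProbability and mutationProbability more concrete, here is a minimal sketch of how such probabilities are commonly applied in a genetic algorithm for variable selection. It assumes each model is encoded as a logical vector of inclusion flags, uses uniform crossover, and applies mutation independently to each position; all of these are assumptions for illustration. The helper make_offspring is hypothetical and is not part of the mogavs package, whose actual operators may differ.

```r
# Hypothetical helper for illustration only; NOT a function of the mogavs package.
# A model is assumed to be a logical vector: TRUE means the variable is included.
make_offspring <- function(parent1, parent2,
                           crossoverProbability, mutationProbability) {
  child <- parent1
  # With probability crossoverProbability, mix the parents position by position
  # (uniform crossover); otherwise the child starts as a copy of parent1.
  if (runif(1) < crossoverProbability) {
    from_second <- runif(length(child)) < 0.5
    child[from_second] <- parent2[from_second]
  }
  # Flip each inclusion flag independently with probability mutationProbability.
  flip <- runif(length(child)) < mutationProbability
  child[flip] <- !child[flip]
  child
}

set.seed(1)
p <- 8  # number of candidate predictors, i.e. ncol(x)
parent1 <- rep(c(TRUE, FALSE), length.out = p)
parent2 <- rep(c(FALSE, TRUE), length.out = p)

# Same values as the documented defaults: 0.9 and 1/ncol(x).
child <- make_offspring(parent1, parent2,
                        crossoverProbability = 0.9,
                        mutationProbability = 1 / p)
print(child)
```

Under this per-position assumption, a mutation probability of 1/ncol(x) toggles one variable per offspring on average, which is a common default in binary-encoded genetic algorithms.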

Value

Returns a model of class mogavs with the following items:

nonDominatedSet

Matrix of the nondominated models.

numOfVariables

Vector of the number of variables for each model in the nonDominatedSet.

MSE

Vector of mean square errors for each model in the nonDominatedSet.

archiveSet

The full archive set of models tried.

kBest

The value of kBest used.

maxGenerations

Number of generations used.

crossoverProbability

The crossover probability used.

noOfOffspring

Number of generated offspring for each generation.

popSize

The population size.
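
As a rough illustration of how the returned items fit together, the sketch below pairs numOfVariables with MSE for the nondominated models. It relies only on the item names listed above and on the sampleData set used in the Examples section; the guard around the call and the name summary_table are additions for this sketch, and the block does nothing if the package is not installed.

```r
# Sketch only: runs the short example fit if the mogavs package is available.
summary_table <- NULL
if (requireNamespace("mogavs", quietly = TRUE)) {
  library(mogavs)
  data(sampleData)
  fit <- mogavs(y ~ ., data = sampleData, maxGenerations = 5)

  # One row per nondominated model: its size and its mean square error.
  summary_table <- data.frame(numOfVariables = fit$numOfVariables,
                              MSE = fit$MSE)
  summary_table <- summary_table[order(summary_table$numOfVariables), ]
  print(summary_table)

  # Dimensions of the matrix holding the nondominated models themselves.
  print(dim(fit$nonDominatedSet))
}
```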

Details

Runs a genetic algorithm over the space of linear regression models, with predictor variables x and response values y. Alternatively, a data frame and a formula can be supplied. Setting plots=TRUE draws a plot for each generation, showing the current efficient boundary of the models. Setting additionalPlots=TRUE draws one more plot at the end of the run, showing the full set of models tried and the kBest best models for each number of variables. All plotting is turned off by default to speed up processing.
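
The idea of a nondominated set (the efficient boundary) can be sketched in base R without the package. The example below assumes the two objectives are the number of variables and the in-sample mean squared error, simulates a small data set, enumerates every non-empty subset of four predictors instead of running a genetic algorithm, and keeps the models that no other model beats on both objectives. The data, variable names, and exhaustive search are assumptions made for illustration and do not reproduce what mogavs does internally.

```r
set.seed(42)
n <- 100
x <- matrix(rnorm(n * 4), nrow = n, dimnames = list(NULL, paste0("x", 1:4)))
y <- 2 * x[, 1] - x[, 3] + rnorm(n)  # only x1 and x3 carry signal

# Every non-empty subset of the 4 predictors, one logical row per model.
subsets <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), 4)))[-1, , drop = FALSE]

# Objective 1: number of variables. Objective 2: in-sample mean squared error.
numVars <- rowSums(subsets)
mse <- apply(subsets, 1, function(keep) {
  fit <- lm.fit(cbind(1, x[, keep, drop = FALSE]), y)  # intercept + chosen columns
  mean(fit$residuals^2)
})

# A model is dominated if some other model is no worse on both objectives
# and strictly better on at least one of them.
dominated <- sapply(seq_along(mse), function(i) {
  any(numVars <= numVars[i] & mse <= mse[i] &
        (numVars < numVars[i] | mse < mse[i]))
})

front <- data.frame(numVars, mse)[!dominated, ]
front <- front[order(front$numVars), ]
print(front)
```

With only four predictors the exhaustive search is cheap; with p predictors there are 2^p - 1 candidate models, which is why a heuristic search such as a genetic algorithm becomes attractive as p grows.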

References

Sinha, A., Malo, P. & Kuosmanen, T. (2015) A Multi-objective Exploratory Procedure for Regression Model Selection. Journal of Computational and Graphical Statistics, 24(1). pp. 154-182.

Examples

# NOT RUN {
data(sampleData)
#just a few generations to keep test fast
mogavs(y~.,data=sampleData,maxGenerations=5)

#with a more sensible number of generations, with all plotting on
# }
# NOT RUN {
mogavs(y~.,data=sampleData,maxGenerations=100,plots=TRUE,additionalPlots=TRUE)
# }