The main function for the mogavs genetic algorithm, returning a list containing the full archive set of regression models tried and the nondominated set.
# S3 method for default
mogavs(x, y, maxGenerations = 10*ncol(x), popSize = ncol(x), noOfOffspring = ncol(x),
crossoverProbability = 0.9, mutationProbability = 1/ncol(x), kBest = 1,
plots = F, additionalPlots = F, ...)
# S3 method for formula
mogavs(formula, data, maxGenerations= 10*ncol(x), popSize = ncol(x),
noOfOffspring = ncol(x), crossoverProbability = 0.9, mutationProbability = 1/ncol(x),
kBest = 1, plots = F, additionalPlots = F, ...)
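The two call forms are meant to describe the same regression problem. As a minimal base-R sketch (an assumption about how a formula and data frame relate to x and y, not the package's internal code; the data frame `toy` is hypothetical), the predictor matrix and response vector can be derived from a formula like this:

```r
# Base R only; illustrates the formula -> (x, y) correspondence under stated assumptions.
set.seed(1)
toy <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20))

mf <- model.frame(y ~ ., data = toy)                 # resolve the formula against the data frame
y  <- model.response(mf)                             # response vector of length n
x  <- model.matrix(attr(mf, "terms"), data = mf)     # design matrix, including an intercept column
x  <- x[, colnames(x) != "(Intercept)", drop = FALSE] # keep only the n x p predictor columns

dim(x)
length(y)
```

With objects built this way, mogavs(x, y) and mogavs(y ~ ., data = toy) would be expected to pose the same model selection problem.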
formula: Formula interface with y~x1+x2 or y~. for predicting y with x1 and x2 or with all predictors, respectively.
data: A data frame containing the variables mentioned in the formula.
x: An n x p matrix containing the n observations of the p predictor values used in the regression.
y: An n x 1 vector of values to fit the regression to.
maxGenerations: Maximum number of generations to be run in the evolutionary algorithm. Default is 10*ncol(x).
popSize: Population size, i.e. how many regression models the population holds. Default is ncol(x).
noOfOffspring: Indicates how many offspring models are generated for each generation. Default is ncol(x).
crossoverProbability: Indicates the probability of crossover for each offspring. Default is 0.9.
mutationProbability: Indicates the probability of mutation for each offspring. Default is 1/ncol(x).
kBest: Indicates how many of the best models for each number of variables are highlighted in the printed output at the end of the run. Default is 1.
plots: Logical value for turning plotting for each generation on or off.
additionalPlots: Logical value for turning additional plotting at the end of the run on or off. The plot can also be generated after the run with the createAdditionalPlots function.
...: Any additional arguments.
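To make the roles of the crossover and mutation probabilities concrete, here is a minimal base-R sketch. It assumes each model is encoded as a 0/1 vector marking which of the p predictors are included, uses uniform crossover, and applies the mutation probability per variable. All three choices are assumptions made for illustration, and the helpers `crossover` and `mutate` are hypothetical; the package's own operators may differ.

```r
set.seed(42)
p <- 8                                   # number of candidate predictors, i.e. ncol(x)
crossoverProbability <- 0.9
mutationProbability  <- 1 / p

# Hypothetical helper: uniform crossover of two parents, applied with probability `prob`.
crossover <- function(a, b, prob) {
  if (runif(1) >= prob) return(a)        # no crossover: the offspring copies the first parent
  mask <- runif(length(a)) < 0.5         # take each position from parent a or parent b
  ifelse(mask, a, b)
}

# Hypothetical helper: flip each inclusion bit independently with probability `prob`.
mutate <- function(model, prob) {
  flip <- runif(length(model)) < prob
  model[flip] <- 1L - model[flip]
  model
}

parentA <- rbinom(p, 1, 0.5)             # random 0/1 inclusion vectors
parentB <- rbinom(p, 1, 0.5)
child   <- mutate(crossover(parentA, parentB, crossoverProbability), mutationProbability)
child
```

Under this per-variable reading, a mutation probability of 1/ncol(x) flips one variable per offspring on average.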
Returns a model of class mogavs with the following items:
Matrix of the nondominated models.
Vector of the number of variables for each model in the nonDominatedSet.
Vector of mean square errors for each model in the nonDominatedSet.
The full archive set of models tried.
The value of kBest used.
Number of generations used.
The crossover probability used.
Number of generated offspring for each generation.
The population size.
Runs a genetic algorithm over the linear regression model space, with predictor variables x and response values y. Alternatively, a data frame and a formula can be given. Setting plots=TRUE creates a plot for each generation, showing the current efficient boundary of the models. Setting additionalPlots=TRUE produces an additional plot at the end of the run, showing the full set of tried models and the kBest best models for each number of variables. All plotting is turned off by default to make processing faster.
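The nondominated set and the efficient boundary mentioned above can be illustrated with a small base-R sketch. The candidate models below are made-up numbers, and the helper `isDominated` is hypothetical; the sketch only shows the general idea of filtering on two objectives (number of variables and mean square error), not the package's implementation.

```r
# Hypothetical candidate models: number of variables and mean square error.
models <- data.frame(
  numVars = c(1, 1, 2, 2, 3, 3, 4),
  mse     = c(9.0, 7.5, 5.0, 6.2, 5.5, 3.1, 3.1)
)

# A model is dominated if some other model is at least as good on both
# objectives and strictly better on at least one of them.
isDominated <- function(i, m) {
  any(m$numVars <= m$numVars[i] & m$mse <= m$mse[i] &
      (m$numVars < m$numVars[i] | m$mse < m$mse[i]))
}

dominated <- vapply(seq_len(nrow(models)), isDominated, logical(1), m = models)
front <- models[!dominated, ]            # the nondominated models
front[order(front$numVars), ]
```

Here rows 2, 3 and 6 survive: no other candidate is both at most as large and at most as inaccurate while being strictly better on one of the two criteria.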
Sinha, A., Malo, P. & Kuosmanen, T. (2015) A Multi-objective Exploratory Procedure for Regression Model Selection. Journal of Computational and Graphical Statistics, 24(1). pp. 154-182.
# NOT RUN {
library(mogavs)
data(sampleData)
# just a few generations to keep the test fast
mogavs(y ~ ., data = sampleData, maxGenerations = 5)
# }
# NOT RUN {
# with a more sensible number of generations, with all plotting on
mogavs(y ~ ., data = sampleData, maxGenerations = 100, plots = TRUE, additionalPlots = TRUE)
# }