Ancillary simulated annealing functions
Built-in functions related to simulated annealing
safs_initial(vars, prob = 0.2, ...)
safs_perturb(x, vars, number = floor(vars*.01) + 1)
safs_prob(old, new, iteration = 1)
caretSA
rfSA
treebagSA
- vars: the total number of possible predictor variables
- prob: the probability that an individual predictor is included in the initial predictor set
- x: the integer index vector for the current subset
- old, new: fitness values associated with the current and new subset
- iteration: the number of iterations overall or the number of iterations since restart (if improve is used in safsControl)
- number: the number of predictor variables to perturb
- ...: not currently used
The initial function is used to create the first predictor subset. By default, the function
safs_initial randomly selects 20% of the predictors (prob = 0.2). Note that, instead of a function,
safs can also accept a vector of column numbers as the initial subset.
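To make the idea of an initial function concrete, here is a minimal base R sketch. The name my_initial and its per-predictor random draw are assumptions made for illustration; this is not caret's implementation of safs_initial, which may choose the subset differently.

```r
# Hypothetical initializer: include each predictor independently with
# probability `prob`. An illustration only, not caret's own code.
my_initial <- function(vars, prob = 0.2, ...) {
  keep <- which(runif(vars) < prob)
  # Guard against an empty subset by falling back to one random predictor.
  if (length(keep) == 0) keep <- sample.int(vars, 1)
  sort(keep)
}

set.seed(1)
my_initial(vars = 10, prob = 0.2)
```

The result is a sorted integer vector of column indices, which is the same encoding described for the SA subsets.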
safs_perturb is an example of the operation that changes the subset configuration at the start of each new iteration. By default, it will change roughly 1% of the variables in the current subset.
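One way to picture a perturbation step is as toggling the membership of a few randomly chosen predictors. The sketch below uses a hypothetical function, my_perturb, written in base R under that assumption; it is not the code behind safs_perturb.

```r
# Hypothetical perturbation: toggle the membership of `number` randomly
# chosen predictors. An illustration only, not caret's own code.
my_perturb <- function(x, vars, number = floor(vars * 0.01) + 1) {
  flip <- sample.int(vars, size = number)  # predictors whose status changes
  dropped <- intersect(x, flip)            # currently in the subset: remove
  added <- setdiff(flip, x)                # currently out of the subset: add
  sort(union(setdiff(x, dropped), added))
}

set.seed(2)
my_perturb(x = c(2L, 5L, 9L), vars = 10, number = 1)
```

With number = 1, the new subset differs from the old one by exactly one predictor. This sketch does not guard against producing an empty subset, which a real implementation would likely need to handle.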
The prob function defines the acceptance probability at each iteration, given the old and new fitness (i.e. energy) values. It assumes that smaller values are better. The default probability function computes the percentage difference between the current and new fitness values and uses an exponential function to compute a probability:
prob = exp[(current-new)/current*iteration]
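The formula can be checked by hand with a small base R sketch. The function accept_prob is a hypothetical name, and capping the result at 1 when the new value is better is an assumption added here so that the output is always a valid probability; the formula above does not state how improvements are handled.

```r
# Hypothetical re-statement of the formula above; not caret's own code.
accept_prob <- function(current, new, iteration = 1) {
  p <- exp((current - new) / current * iteration)
  # Assumption: cap at 1 so that improvements are always accepted.
  pmin(p, 1)
}

accept_prob(current = 0.5, new = 0.6, iteration = 1)   # exp(-0.2), about 0.82
accept_prob(current = 0.5, new = 0.6, iteration = 10)  # exp(-2), about 0.14
```

For the same 20% worsening in fitness, the acceptance probability shrinks as the iteration count grows, so poor subsets are accepted less often later in the search.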
The return value depends on the function. Note that the SA code encodes the subsets as a vector of integers that are included in the subset (which is different from the encoding used for GAs).
The objects caretSA, rfSA and treebagSA are example lists that can be used with the functions argument of safsControl.
In the case of caretSA, the ... structure of safs passes through to the model fitting routine. As a consequence, the train function can easily be accessed by passing important arguments belonging to train to safs. See the examples below. By default, using caretSA will use the resampled performance estimates produced by train as the internal estimate of fitness.
For rfSA and treebagSA, the randomForest and bagging functions are used directly (i.e. train is not used). Arguments to either of these functions can also be passed to them through the safs call (see examples below). For these two functions, the internal fitness is estimated using the out-of-bag estimates naturally produced by those functions. While faster, this limits the user to accuracy or Kappa (for classification) and RMSE and R-squared (for regression).
selected_vars <- safs_initial(vars = 10, prob = 0.2)
selected_vars

###

safs_perturb(selected_vars, vars = 10, number = 1)

###

safs_prob(old = .8, new = .9, iteration = 1)
safs_prob(old = .5, new = .6, iteration = 1)

grid <- expand.grid(old = c(4, 3.5),
                    new = c(4.5, 4, 3.5) + 1,
                    iter = 1:40)
grid <- subset(grid, old < new)

grid$prob <- apply(grid, 1,
                   function(x)
                     safs_prob(new = x["new"],
                               old = x["old"],
                               iteration = x["iter"]))

grid$Difference <- factor(grid$new - grid$old)
grid$Group <- factor(paste("Current Value", grid$old))

ggplot(grid, aes(x = iter, y = prob, color = Difference)) +
  geom_line() + facet_wrap(~Group) + theme_bw() +
  ylab("Probability") + xlab("Iteration")

## Not run:
# ###
# ## Hypothetical examples
# lda_sa <- safs(x = predictors,
#                y = classes,
#                safsControl = safsControl(functions = caretSA),
#                ## now pass arguments to `train`
#                method = "lda",
#                metric = "Accuracy",
#                trControl = trainControl(method = "cv", classProbs = TRUE))
#
# rf_sa <- safs(x = predictors,
#               y = classes,
#               safsControl = safsControl(functions = rfSA),
#               ## these are arguments to `randomForest`
#               ntree = 1000,
#               importance = TRUE)
#
## End(Not run)