sampleCore: Sample a core collection.

Description

Sample a core collection from the given data.

Usage

sampleCore(
  data,
  obj,
  size = 0.2,
  always.selected = integer(0),
  never.selected = integer(0),
  mode = c("default", "fast"),
  normalize = TRUE,
  time = NA,
  impr.time = NA,
  steps = NA,
  impr.steps = NA,
  indices = FALSE,
  verbose = FALSE
)

Value

Core subset (chcore). It has an element sel

which is a character or numeric vector containing the sorted ids or indices, respectively, of the selected individuals (see argument indices). In addition the result has one or more elements that indicate the value of each objective function that was included in the optimization.

Arguments

data

Core Hunter data (chdata) containing genotypes, phenotypes and/or a precomputed distance matrix. Typically the data is obtained with coreHunterData. Can also be an object of class chdist, chgeno or chpheno if only one type of data is provided.

obj

Objective or list of objectives (chobj). If no objectives are specified Core Hunter maximizes a weighted index including the default entry-to-nearest-entry distance (EN) for each available data type, with equal weight. For genotypes, the Modified Roger's distance (MR) is used. For phenotypes, Gower's distance (GD) is applied.

size

Desired core subset size (numeric). If larger than one the value is used as the absolute core size after rounding. Else it is used as the sampling rate and multiplied with the dataset size to determine the size of the core. The default sampling rate is 0.2.

always.selected

vector with indices (integer) or ids (character) of items that should always be selected in the core collection

never.selected

vector with indices (integer) or ids (character) of items that should never be selected in the core collection

mode

Execution mode (default or fast). In default mode, Core Hunter uses an advanced parallel tempering search algorithm and terminates when no improvement is found for ten seconds. In fast mode, a simple stochastic hill-climbing algorithm is applied and Core Hunter terminates as soon as no improvement is made for two seconds. Stop conditions can be overridden with arguments time and impr.time.

normalize

If TRUE (default), the applied objectives in a multi-objective configuration (two or more objectives) are automatically normalized prior to execution. For single-objective configurations, this argument is ignored.

Normalization requires an independent preliminary search per objective (fast stochastic hill-climber, executed in parallel for all objectives). The same stop conditions, as specified for the main search, are also applied to each normalization search. In default execution mode, however, any step-based stop conditions are multiplied by 500 for the normalization searches, because in that case the main search (parallel tempering) executes 500 stochastic hill-climbing steps per replica, in a single step of the main search.

Normalization ranges can also be precomputed (see getNormalizationRanges) or manually specified in the objectives to save computation time when sampling core collections. This is especially useful when multiple cores are sampled for the same objectives, with possibly varying weights.

time

Absolute runtime limit in seconds. Not used by default (NA). If used, it should be a strictly positive value, which is rounded to the nearest integer.

impr.time

Maximum time without improvement in seconds. If no explicit stop conditions are specified, the maximum time without improvement defaults to ten or two seconds, when executing Core Hunter in default or fast mode, respectively. If a custom improvement time is specified, it should be strictly positive and is rounded to the nearest integer.

steps

Maximum number of search steps. Not used by default (NA). If used, it should be a strictly positive value, which is rounded to the nearest integer. The number of steps applies to the main search. Details of how this stop condition is transferred to normalization searches, in a multi-objective configuration, are provided in the description of the argument normalize.

impr.steps

Maximum number of steps without improvement. Not used by default (NA). If used, it should be a strictly positive value, which is rounded to the nearest integer. The maximum number of steps without improvement applies to the main search. Details of how this stop condition is transferred to normalization searches, in a multi-objective configuration, are provided in the description of the argument normalize.

indices

If TRUE, the result contains the indices instead of ids (default) of the selected individuals.

verbose

If TRUE, search progress messages are printed to the console. Defaults to FALSE.

Details

Because Core Hunter uses stochastic algorithms, repeated runs may produce different results. To eliminate randomness, you may set a random number generation seed using set.seed prior to executing Core Hunter. In addition, when reproducible results are desired, it is advised to use step-based stop conditions instead of the (default) time-based criteria, because runtimes may be affected by external factors, and, therefore, a different number of steps may have been performed in repeated runs when using time-based stop conditions.

Examples

Run this code

# \donttest{
data <- exampleData()

# default size, maximize entry-to-nearest-entry Modified Rogers distance
obj <- objective("EN", "MR")
core <- sampleCore(data, obj)

# fast mode
core <- sampleCore(data, obj, mode = "f")
# absolute size
core <- sampleCore(data, obj, size = 25)
# relative size
core <- sampleCore(data, obj, size = 0.1)

# other objective: minimize accession-to-nearest-entry precomputed distance
core <- sampleCore(data, obj = objective(type = "AN", measure = "PD"))
# multiple objectives (equal weight)
core <- sampleCore(data, obj = list(
 objective("EN", "PD"),
 objective("AN", "GD")
))
# multiple objectives (custom weight)
core <- sampleCore(data, obj = list(
 objective("EN", "PD", weight = 0.3),
 objective("AN", "GD", weight = 0.7)
))

# custom stop conditions
core <- sampleCore(data, obj, time = 5, impr.time = 2)
core <- sampleCore(data, obj, steps = 300)

# print progress messages
core <- sampleCore(data, obj, verbose = TRUE)
# }