corehunter (version 3.0.1)

sampleCore: Sample a core collection from the given data.

Description

Sample a core collection from the given data.

Usage

sampleCore(data, obj, size = 0.2, mode = c("default", "fast"), normalize = TRUE, time = NA, impr.time = NA, indices = FALSE, verbose = FALSE)

Arguments

data
Core Hunter data (chdata) containing genotypes, phenotypes and/or a precomputed distance matrix. Typically the data is obtained with coreHunterData. Can also be an object of class chdist, chgeno or chpheno if only one type of data is provided.
obj
Objective or list of objectives (chobj). If no objectives are specified, Core Hunter maximizes a weighted index that includes the default entry-to-nearest-entry distance (EN) for each available data type, with equal weights. For genotypes, the Modified Rogers distance (MR) is used. For phenotypes, Gower's distance (GD) is applied.
size
Desired core subset size (numeric). If larger than one, the value is rounded and used as the absolute core size. Otherwise, it is used as the sampling rate and multiplied by the dataset size to determine the core size. The default sampling rate is 0.2.
mode
Execution mode (default or fast). In default mode, Core Hunter uses an advanced parallel tempering search algorithm and terminates when no improvement is found for ten seconds. In fast mode, a simple stochastic hill-climbing algorithm is applied and Core Hunter terminates as soon as no improvement is made for two seconds. Stop conditions can be overridden with the arguments time and impr.time.
normalize
If TRUE (default), the objectives in a multi-objective configuration (two or more objectives) are automatically normalized prior to execution. Normalization requires an independent preliminary search per objective (simple stochastic hill-climber, executed in parallel). If a time limit (time) or a maximum time without improvement (impr.time) has been set, the same limits apply to each normalization search as well as to the main multi-objective search. Because the normalization searches run in parallel, with a small number of objectives the total execution time should usually not exceed twice the imposed time limit, if any. Normalization ranges can also be precomputed (see getNormalizationRanges) or specified manually in the objectives to save computation time when sampling core collections. This is especially useful when multiple cores are sampled for the same objectives, possibly with varying weights.
time
Absolute runtime limit in seconds. Not used by default. If used, it should be strictly positive; it is rounded to the nearest integer.
impr.time
Maximum time without improvement in seconds. If NA (default), a value is chosen depending on the execution mode (ten seconds in default mode, two seconds in fast mode). Otherwise, it should be strictly positive and is rounded to the nearest integer.
indices
If TRUE, the result contains the indices of the selected individuals instead of their unique identifiers, which are returned by default.
verbose
If TRUE, search progress messages are printed to the console. Defaults to FALSE.
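The rule for the size argument can be sketched in base R as follows. resolve_core_size() is a hypothetical helper written for illustration, not a corehunter function, and rounding the rate-based size to a whole number is an assumption (the description only states that the rate is multiplied by the dataset size). The dataset size of 150 is likewise hypothetical.

```r
# Hypothetical helper (not part of corehunter) illustrating how the 'size'
# argument of sampleCore() is described to be interpreted.
resolve_core_size <- function(size, n) {
  stopifnot(is.numeric(size), length(size) == 1, size > 0)
  if (size > 1) {
    # Values larger than one are absolute core sizes, rounded to an integer.
    round(size)
  } else {
    # Otherwise 'size' is a sampling rate; rounding the product is an assumption.
    round(size * n)
  }
}

n <- 150                      # hypothetical dataset size
resolve_core_size(0.2, n)     # default sampling rate: 30
resolve_core_size(0.1, n)     # relative size: 15
resolve_core_size(25, n)      # absolute size: 25
resolve_core_size(24.6, n)    # absolute size after rounding: 25
```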
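How mode, time and impr.time combine into stop conditions can be sketched as below. resolve_stop_conditions() is hypothetical and only restates the documented behaviour (ten seconds without improvement in default mode, two seconds in fast mode, no absolute time limit unless time is set, explicit limits strictly positive and rounded); it does not reproduce corehunter's internal argument checks.

```r
# Hypothetical sketch (not corehunter code) of the documented stop conditions.
resolve_stop_conditions <- function(mode = c("default", "fast"),
                                    time = NA, impr.time = NA) {
  # Partial matching allows mode = "f", as in the examples.
  mode <- match.arg(mode)
  # An NA improvement time falls back to the mode-dependent default.
  if (is.na(impr.time)) {
    impr.time <- if (mode == "default") 10 else 2
  }
  # Explicit limits must be strictly positive and are rounded to whole seconds.
  check_limit <- function(x, name) {
    if (is.na(x)) return(x)
    if (!is.numeric(x) || x <= 0) {
      stop(sprintf("'%s' should be strictly positive", name))
    }
    round(x)
  }
  list(mode = mode,
       time = check_limit(time, "time"),  # NA means no absolute limit
       impr.time = check_limit(impr.time, "impr.time"))
}

str(resolve_stop_conditions())                      # default mode, 10 s without improvement
str(resolve_stop_conditions(mode = "f"))            # fast mode, 2 s without improvement
str(resolve_stop_conditions(time = 5, impr.time = 2))
```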
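The weighted index used when several objectives are combined (see obj and normalize) can be illustrated with a small base R sketch. normalized_index() is hypothetical: it assumes linear min-max rescaling of each objective value to [0, 1] using a range [lower, upper], flips minimized objectives (such as AN) so that larger is always better, and rescales the weights to sum to one. Core Hunter's own normalization may differ in detail, and the objective values and ranges below are made-up numbers, not output of sampleCore().

```r
# Hypothetical sketch (not corehunter code) of a normalized weighted index.
normalized_index <- function(values, lower, upper, weights, minimize) {
  k <- length(values)
  stopifnot(length(lower) == k, length(upper) == k,
            length(weights) == k, length(minimize) == k)
  if (any(upper <= lower)) stop("each range must satisfy lower < upper")
  # Min-max rescaling of each objective value to [0, 1] (assumed scheme).
  scaled <- (values - lower) / (upper - lower)
  # Minimized objectives are flipped so that larger is always better.
  scaled[minimize] <- 1 - scaled[minimize]
  # Rescale weights to sum to one; this does not change which core scores best.
  weights <- weights / sum(weights)
  sum(weights * scaled)
}

# Made-up values for an EN objective (maximized) and an AN objective (minimized).
values   <- c(EN = 6, AN = 15)
lower    <- c(EN = 4, AN = 10)
upper    <- c(EN = 8, AN = 30)
minimize <- c(EN = FALSE, AN = TRUE)

# Equal weights: 0.5 * 0.5 + 0.5 * 0.75 = 0.625
normalized_index(values, lower, upper, weights = c(1, 1), minimize = minimize)
# Custom weights: 0.3 * 0.5 + 0.7 * 0.75 = 0.675
normalized_index(values, lower, upper, weights = c(0.3, 0.7), minimize = minimize)
```

Without the rescaling step, an objective measured on a larger numeric scale would dominate the weighted sum regardless of its weight, which is why normalization matters in multi-objective configurations.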

Value

Core subset (chcore). It has an element sel, a character or numeric vector containing the ids or indices, respectively, of the selected individuals (see argument indices). In addition, the result has one or more elements holding the value of each objective function included in the optimization.

See Also

coreHunterData, objective, getNormalizationRanges

Examples

data <- exampleData()

# default size, maximize entry-to-nearest-entry Modified Rogers distance
obj <- objective("EN", "MR")
sampleCore(data, obj)

# fast mode
sampleCore(data, obj, mode = "f")
# absolute size
sampleCore(data, obj, size = 25)
# relative size
sampleCore(data, obj, size = 0.1)

# other objective: minimize accession-to-nearest-entry precomputed distance
sampleCore(data, obj = objective(type = "AN", measure = "PD"))
# multiple objectives (equal weight)
sampleCore(data, obj = list(
 objective("EN", "PD"),
 objective("AN", "GD")
))
# multiple objectives (custom weight)
sampleCore(data, obj = list(
 objective("EN", "PD", weight = 0.3),
 objective("AN", "GD", weight = 0.7)
))

# custom stop conditions
sampleCore(data, obj, time = 5, impr.time = 2)

# print progress messages
sampleCore(data, obj, verbose = TRUE)

