
flacco (version 1.2)

calculateFeatureSet: Calculate Landscape Features

Description

Performs an Exploratory Landscape Analysis of a continuous function and computes various features, which quantify the function's landscape. Currently, the following feature sets are provided:
  • CM: cell mapping features ("cm_angle", "cm_conv", "cm_grad")
  • ELA: classical ELA features ("ela_conv", "ela_curv", "ela_distr", "ela_level", "ela_local", "ela_meta")
  • GCM: general cell mapping features ("gcm")
  • BT: barrier tree features ("bt")
  • IC: information content features ("ic")
  • Basic: basic features ("basic")
  • Disp: dispersion features ("disp")
  • LiMo: linear model features ("limo")
  • NBC: nearest better clustering features ("nbc")
  • PC: principal component features ("pca")

Usage

calculateFeatureSet(feat.object, set, control, ...)

calculateFeatures(feat.object, control, ...)

Arguments

feat.object
[FeatureObject] A feature object as created by createFeatureObject.
set
[character(1)] Name of the feature set that should be computed. All possible feature sets can be listed using listAvailableFeatureSets.
control
[list] A list storing additional control arguments. For further information, see the details section.
...
[any] Further arguments, e.g. handled by optim (within the computation of the ELA local search features) or density (within the computation of the ELA y-distribution features).
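
All valid values for set can be retrieved directly (assuming flacco is installed):

# list the names of all feature sets that can be passed via 'set'
library(flacco)
listAvailableFeatureSets()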

Value

A list of (numeric) features, grouped by feature set:

  • cm_angle -- cell mapping angle features (10): These features are based on the location of the worst and best element within each cell. To be precise, their distance to the cell center and the angle between these three elements (at the center) are the foundation (see the first sketch after this list):
    • dist_ctr2{best, worst}.{mean, sd}: arithmetic mean and standard deviation of the distances from the cell center to the best / worst observation within the cell (over all cells)
    • angle.{mean, sd}: arithmetic mean and standard deviation of the angles (in degree) between the worst, center and best element of a cell (over all cells)
    • y_ratio_best2worst.{mean, sd}: arithmetic mean and standard deviation of the ratios between the distance of the worst and best element within a cell and the worst and best element in the entire initial design (over all cells); note that these distances are only measured in the objective space
    • costs_{fun_evals, runtime}: number of (additional) function evaluations and runtime (in seconds) needed for the computation of these features

The remaining feature sets are summarized below and described in detail afterwards:

  • cm_conv -- cell mapping convexity features (6)
  • cm_grad -- gradient homogeneity features (4)
  • ela_conv -- ELA convexity features (6)
  • ela_curv -- ELA curvature features (23)
  • ela_distr -- ELA y-distribution features (5)
  • ela_level -- ELA levelset features (20)
  • ela_local -- ELA local search features (15)
  • ela_meta -- ELA meta model features (11)
  • gcm -- general cell mapping (GCM) features (75)
  • bt -- barrier tree features (90)
  • ic -- information content features (7)
  • basic -- basic features (16)
  • disp -- dispersion features (18)
  • limo -- linear model features (14)
  • nbc -- nearest better (clustering) features (7)
  • pca -- principal component (analysis) features (10)


  • cm_conv -- cell mapping convexity features (6): Each cell is represented by the observation (of the initial design) that is located closest to the cell center. Then, the objectives of three neighbouring cells are compared:
    • {convex, concave}.hard: if the objective of the inner cell is above / below the objectives of the two outer cells, there is strong evidence for convexity / concavity
    • {convex, concave}.soft: if the objective of the inner cell is above / below the arithmetic mean of the objectives of the two outer cells, there is weak evidence for convexity / concavity
    • costs_{fun_evals, runtime}: number of (additional) function evaluations and runtime (in seconds) needed for the computation of these features
  • cm_grad -- cell mapping gradient homogeneity features (4): Within each cell of the initial grid, the gradients between each observation and its nearest neighbour are computed. Those gradients are directed towards the smaller of the two objective values and afterwards normalized. Then, the length of the sum of all directed and normalized gradients within a cell is computed. Based on those measurements (one per cell), the following features are computed:
    • {mean, sd}: arithmetic mean and standard deviation of the aforementioned lengths
    • costs_{fun_evals, runtime}: number of (additional) function evaluations and runtime (in seconds) needed for the computation of these features
  • ela_conv -- ELA convexity features (6): Two observations are chosen randomly from the initial design. Then, a linear (convex) combination of those observations is calculated, based on a random weight from [0, 1]. The corresponding objective value is compared to the same linear combination of the objectives of the two original observations. This process is replicated convex.nsample (per default 1000) times and then aggregated (see the sketch after this list):
    • {convex_p, linear_p}: percentage of convexity / linearity
    • linear_dev.{orig, abs}: average (original / absolute) deviation between the linear combination of the objectives and the objective of the linear combination of the observations
    • costs_{fun_evals, runtime}: number of (additional) function evaluations and runtime (in seconds) needed for the computation of these features
  • ela_curv -- ELA curvature features (23): Given a feature object, curv.sample_size samples (per default 100 * d, with d being the number of features, i.e. dimensions) are chosen randomly. Then, the gradient and Hessian of the function are estimated at those points and the following features are computed:
    • grad_norm.{min, lq, mean, median, uq, max, sd}: aggregations (minimum, lower quartile, arithmetic mean, median, upper quartile, maximum and standard deviation) of the lengths of the gradients
    • grad_scale.{min, lq, mean, median, uq, max, sd}: aggregations of the ratios between the biggest and smallest (absolute) gradient directions
    • hessian_cond.{min, lq, mean, median, uq, max, sd}: aggregations of the ratios of the biggest and smallest eigenvalue of the Hessian matrices
    • costs_{fun_evals, runtime}: number of (additional) function evaluations and runtime (in seconds) needed for the computation of these features
  • ela_distr -- ELA y-distribution features (5) (see the sketch after this list):
    • skewness: skewness of the objective values
    • kurtosis: kurtosis of the objective values
    • number_of_peaks: number of peaks, based on an estimation of the density of the objective values
    • costs_{fun_evals, runtime}: number of (additional) function evaluations and runtime (in seconds) needed for the computation of these features
  • ela_level -- ELA levelset features (20):
    • mmce_{methods}_{quantiles}: mean misclassification error of each pair of classification method and quantile
    • {method1}_{method2}_{quantiles}: ratios of the mean misclassification errors for each pair of classification methods, per quantile
    • costs_{fun_evals, runtime}: number of (additional) function evaluations and runtime (in seconds) needed for the computation of these features
  • ela_local -- ELA local search features (15): Based on some randomly chosen points of the initial design, a pre-defined number of local searches (ela_local.local_searches) are executed. Their optima are then clustered (using hierarchical clustering), assuming that local optima located close to each other likely belong to the same basin. Given those basins, the following features are computed:
    • n_loc_opt.{abs, rel}: absolute / relative amount of local optima
    • best2mean_contr.orig: each cluster is represented by its center; this feature is the ratio of the objective values of the best and average cluster
    • best2mean_contr.ratio: each cluster is represented by its center; this feature is the ratio of the differences in the objective values of average to best and worst to best cluster
    • basin_sizes.avg_{best, non_best, worst}: average basin size of the best / non-best / worst cluster(s)
    • fun_evals.{min, lq, mean, median, uq, max, sd}: aggregations of the number of function evaluations performed by the local searches
    • costs_{fun_evals, runtime}: number of (additional) function evaluations and runtime (in seconds) needed for the computation of these features
  • ela_meta -- ELA meta model features (11): Given an initial design, linear and quadratic models of the form objective ~ features are created. Both versions are created with and without simple interactions (e.g., x1:x2). Based on those models, the following features are computed (see the sketch after this list):
    • lin_simple.{adj_r2, intercept}: adjusted R^2 (i.e. the model fit) and intercept of a simple linear model
    • lin_simple.coef.{min, max, max_by_min}: smallest and biggest (non-intercept) absolute coefficients of the simple linear model, and their ratio
    • {lin_w_interact, quad_simple, quad_w_interact}.adj_r2: adjusted R^2 (i.e. the model fit) of a linear model with interactions, and of a quadratic model with and without interactions
    • quad_simple.cond: condition of a simple quadratic model (without interactions), i.e. the ratio of its (absolute) biggest and smallest coefficients
    • costs_{fun_evals, runtime}: number of (additional) function evaluations and runtime (in seconds) needed for the computation of these features
  • gcm -- general cell mapping features (75): These features are based on the Generalized Cell Mapping (GCM) approach, which interprets the cells as absorbing Markov chains. Computations are performed based on three different approaches: taking the best (min) or average (mean) objective value of a cell, or the closest observation (near) to a cell, as its representative. For each of these approaches, the following 25 features are computed:
    • attractors, pcells, tcells, uncertain: relative amount of attractor, periodic, transient and uncertain cells
    • basin_prob.{min, mean, median, max, sd}: aggregations of the probabilities of each basin of attraction
    • basin_certain.{min, mean, median, max, sd}: aggregations of the (relative) size of each basin of attraction, in case only certain cells are considered (i.e. cells that point towards exactly one attractor)
    • basin_uncertain.{min, mean, median, max, sd}: aggregations of the (relative) size of each basin of attraction, in case uncertain cells are considered (i.e. a cell that points towards multiple attractors contributes to each of its basins)
    • best_attr.{prob, no}: probability of finding the attractor with the best objective value, and the (relative) amount of those attractors (i.e. the ratio of the number of attractors with the best objective value and the total number of cells)
    • costs_{fun_evals, runtime}: number of (additional) function evaluations and runtime (in seconds) needed for the computation of these features
  • bt -- barrier tree features (90): Computes barrier tree features on 2D (!) problems, based on the Generalized Cell Mapping (GCM) approach. Computations are again performed for the three representatives min, mean and near; for each of them, the following 31 features are computed:
    • levels: absolute number of levels of the barrier tree
    • leaves: absolute number of leaves (i.e. local optima) of the barrier tree
    • depth: range between the highest and lowest node of the tree
    • depth_levels_ratio: ratio of depth and levels
    • levels_nodes_ratio: ratio of the number of levels and the number of (non-root) nodes of the tree
    • diffs.{min, mean, median, max, sd}: aggregations of the height differences between a node and its predecessor
    • level_diffs.{min, mean, median, max, sd}: aggregations of the average height differences per level
    • attractor_dists.{min, mean, median, max, sd}: aggregations of the (euclidean) distances between the local and global best cells (attractors)
    • basin_ratio.{uncertain, certain, most_likely}: ratios of the maximum and minimum size of the basins of attraction; here, a cell might belong to multiple attractors (uncertain), exactly one attractor (certain) or the attractor with the highest probability (most_likely)
    • basin_intersection.{min, mean, median, max, sd}: aggregations of the intersection between the basin of the global best value and the basins of all local best values
    • basin_range: range of a basin (euclidean distance of the widest range per dimension)
    • costs_{fun_evals, runtime}: number of (additional) function evaluations and runtime (in seconds) needed for the computation of these features
  • ic -- information content features (7): Computes features based on the Information Content of Fitness Sequences (ICoFiS) approach (cf. Munoz et al., 2015). In this approach, the information content of a continuous landscape, i.e. its smoothness, ruggedness or neutrality, is quantified. While earlier analysis methods were only able to calculate the information content of discrete landscapes, the ICoFiS approach provides an adaptation to continuous landscapes that accounts, e.g., for variable step sizes in random walk sampling:
    • h.max: maximum information content (entropy) of the fitness sequence, cf. equation (5) of Munoz et al. (2015)
    • eps.s: settling sensitivity, indicating the epsilon for which the sequence nearly consists of zeros only, cf. equation (6)
    • eps.max: similar to eps.s, but in contrast to the former, eps.max guarantees non-missing values
    • eps.ratio: ratio of partial information sensitivity, cf. equation (8), where the ratio is 0.5
    • m0: initial partial information, cf. equation (7)
    • costs_{fun_evals, runtime}: number of (additional) function evaluations and runtime (in seconds) needed for the computation of these features
  • basic -- basic features (16): Very simple features, which can be read from the feature object without any computational effort:
    • {dim, observations}: number of features / dimensions and number of observations within the initial sample
    • {lower, upper, objective, blocks}_{min, max}: minimum and maximum value of all lower and upper bounds, of the objective values and of the number of blocks / cells (per dimension)
    • cells_{filled, total}: number of filled (i.e. non-empty) cells and total number of cells
    • {allows_cm, minimize_fun}: logical values indicating whether this FeatureObject allows cell mapping and whether the optimization function is to be minimized
    • costs_{fun_evals, runtime}: number of (additional) function evaluations and runtime (in seconds) needed for the computation of these features
  • disp -- dispersion features (18): Computes features based on the comparison of the dispersion of the pairwise distances among the 'best' elements and within the entire initial design (see the sketch after this list):
    • {ratio, diff}_{mean, median}_{02, 05, 10, 25}: ratio and difference of the mean / median of the pairwise distances among the 'best' observations (the best 2% / 5% / 10% / 25% w.r.t. the objective values, as indicated by the suffix) vs. among 'all' observations
    • costs_{fun_evals, runtime}: number of (additional) function evaluations and runtime (in seconds) needed for the computation of these features
  • limo -- linear model features (14): Linear models of the form objective ~ features are computed per cell, provided the decision space is divided into a grid of cells:
    • avg_length.{reg, norm}: length of the average coefficient vector (based on regular and normalized vectors)
    • length_{mean, sd}: arithmetic mean and standard deviation of the lengths of all coefficient vectors
    • cor.{reg, norm}: correlation of all coefficient vectors (based on regular and normalized vectors)
    • ratio_{mean, sd}: arithmetic mean and standard deviation of the ratios of the (absolute) maximum and minimum (non-intercept) coefficients per cell
    • sd_{ratio, mean}.{reg, norm}: max-by-min ratio and arithmetic mean of the standard deviations of the (non-intercept) coefficients (based on regular and normalized vectors)
    • costs_{fun_evals, runtime}: number of (additional) function evaluations and runtime (in seconds) needed for the computation of these features
  • nbc -- nearest better clustering features (7): Computes features based on the comparison of each observation's nearest neighbour and nearest better neighbour, i.e., the nearest neighbour with a better objective value (see the sketch after this list):
    • nn_nb.{sd, mean}_ratio: ratio of the standard deviations / arithmetic means of the distances to the nearest neighbours and to the nearest better neighbours
    • nn_nb.cor: correlation between the distances to the nearest neighbours and the distances to the nearest better neighbours
    • dist_ratio.coeff_var: coefficient of variation of the distance ratios
    • nb_fitness.cor: correlation between the fitness value of an observation and the number of observations for which it is the nearest better neighbour (the so-called indegree)
    • costs_{fun_evals, runtime}: number of (additional) function evaluations and runtime (in seconds) needed for the computation of these features
  • pca -- principal component features (10) (see the last sketch after this list):
    • expl_var.{cov, cor}_{x, init}: proportion of the explained variance when applying PCA to the covariance / correlation matrix of the decision space (x) or the entire initial design (init)
    • expl_var_PC1.{cov, cor}_{x, init}: proportion of variance explained by the first principal component, when applying PCA to the covariance / correlation matrix of the decision space (x) or the entire initial design (init)
    • costs_{fun_evals, runtime}: number of (additional) function evaluations and runtime (in seconds) needed for the computation of these features
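
The following sketch illustrates the cm_angle measurement for a single cell. It is a minimal illustration rather than flacco's internal implementation; the example objective, the cell bounds and all variable names are assumptions:

# sketch of the cm_angle measurement for one cell (illustrative only)
f = function(x) sum(x^2)                                # example objective
cell.X = matrix(runif(20, min = 0, max = 1), ncol = 2)  # observations within one cell
cell.y = apply(cell.X, 1, f)
ctr = c(0.5, 0.5)                                       # center of this cell
best = cell.X[which.min(cell.y), ]                      # best observation in the cell
worst = cell.X[which.max(cell.y), ]                     # worst observation in the cell
b = best - ctr
w = worst - ctr
dist_ctr2best = sqrt(sum(b^2))                          # distance center -> best
dist_ctr2worst = sqrt(sum(w^2))                         # distance center -> worst
# angle (in degree) between worst, center and best element of the cell
cosine = sum(b * w) / (dist_ctr2best * dist_ctr2worst)
angle = acos(pmin(1, pmax(-1, cosine))) * 180 / pi
# cm_angle aggregates these values (mean, sd) over all cells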
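The core of the ela_conv measurement can be sketched as follows. This is a simplified illustration; the tolerance eps and all variable names are assumptions, and flacco's internal aggregation may differ in details:

# sketch of the ela_conv measurement (illustrative only)
f = function(x) sum(x^2)                       # example objective
X = t(replicate(200, runif(2, min = -5, max = 5)))
y = apply(X, 1, f)
eps = 1e-10                                    # tolerance for 'linear' (assumption)
delta = replicate(1000, {                      # cf. control 'convex.nsample'
  ids = sample(nrow(X), 2)                     # two random observations
  wt = runif(1)                                # random weight from [0, 1]
  x.comb = wt * X[ids[1], ] + (1 - wt) * X[ids[2], ]
  y.comb = wt * y[ids[1]] + (1 - wt) * y[ids[2]]
  f(x.comb) - y.comb                           # deviation from linearity
})
c(convex_p = mean(delta < -eps),               # percentage of convexity
  linear_p = mean(abs(delta) <= eps),          # percentage of linearity
  linear_dev.orig = mean(delta),               # original average deviation
  linear_dev.abs = mean(abs(delta)))           # absolute average deviation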
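A rough sketch of the ela_distr features follows; note that flacco additionally offers the ela_distr.modemass_threshold, ela_distr.skewness_type and ela_distr.kurtosis_type controls, which this simplified, moment-based version ignores:

# sketch of the ela_distr features (illustrative only)
f = function(x) sum(x^2)
y = apply(t(replicate(500, runif(2, min = -5, max = 5))), 1, f)
m2 = mean((y - mean(y))^2)                     # second central moment
skewness = mean((y - mean(y))^3) / m2^1.5      # skewness of the objective values
kurtosis = mean((y - mean(y))^4) / m2^2 - 3    # (excess) kurtosis
dens = density(y)                              # kernel density estimate of y
# count the local maxima of the estimated density as 'peaks'
number_of_peaks = sum(diff(sign(diff(dens$y))) == -2)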
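The ela_meta idea can be sketched with plain lm() calls for a 2-dimensional design (illustrative only; the feature names on the left correspond to the list above):

# sketch of (a subset of) the ela_meta features (illustrative only)
f = function(x) sum(x^2)
X = t(replicate(200, runif(2, min = -5, max = 5)))
df = data.frame(x1 = X[, 1], x2 = X[, 2], y = apply(X, 1, f))
lin.simple = lm(y ~ x1 + x2, data = df)               # simple linear model
lin.interact = lm(y ~ x1 * x2, data = df)             # ... with interaction x1:x2
quad.simple = lm(y ~ x1 + x2 + I(x1^2) + I(x2^2), data = df)
coefs = abs(coef(lin.simple)[-1])                     # non-intercept coefficients
c(lin_simple.adj_r2 = summary(lin.simple)$adj.r.squared,
  lin_simple.intercept = unname(coef(lin.simple)[1]),
  lin_simple.coef.max_by_min = max(coefs) / min(coefs),
  quad_simple.adj_r2 = summary(quad.simple)$adj.r.squared)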
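A sketch of the dispersion measurement for the 0.05 quantile, assuming euclidean distances (cf. the disp.dist_method control):

# sketch of the dispersion features for the 0.05 quantile (illustrative only)
f = function(x) sum(x^2)
X = t(replicate(500, runif(2, min = -5, max = 5)))
y = apply(X, 1, f)
best = y <= quantile(y, 0.05)                  # 'best' 5% of all observations
d.all = as.numeric(dist(X))                    # pairwise distances, full design
d.best = as.numeric(dist(X[best, ]))           # pairwise distances, best subset
c(ratio_mean_05 = mean(d.best) / mean(d.all),
  ratio_median_05 = median(d.best) / median(d.all),
  diff_mean_05 = mean(d.best) - mean(d.all),
  diff_median_05 = median(d.best) - median(d.all))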
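A sketch of the nearest vs. nearest better neighbour distances underlying the nbc features (simplified: tie handling and the exact orientation of the ratios may be handled differently by flacco):

# sketch of nearest (better) neighbour distances (illustrative only)
f = function(x) sum(x^2)
X = t(replicate(300, runif(2, min = -5, max = 5)))
y = apply(X, 1, f)
D = as.matrix(dist(X))
diag(D) = Inf                                  # exclude self-distances
nn.dist = apply(D, 1, min)                     # distance to nearest neighbour
nb.dist = sapply(seq_along(y), function(i) {
  better = which(y < y[i])                     # strictly better observations
  if (length(better) == 0) NA else min(D[i, better])
})
c(nn_nb.mean_ratio = mean(nn.dist) / mean(nb.dist, na.rm = TRUE),
  nn_nb.sd_ratio = sd(nn.dist) / sd(nb.dist, na.rm = TRUE),
  nn_nb.cor = cor(nn.dist, nb.dist, use = "complete.obs"))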
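Finally, a sketch of one plausible reading of the pca features; interpreting expl_var as the share of components needed to reach a 0.9 proportion of explained variance is an assumption (cf. the pca.{cov, cor}_{x, init} controls):

# sketch of the principal component features (illustrative only)
f = function(x) sum(x^2)
X = t(replicate(300, runif(5, min = -5, max = 5)))
init = cbind(X, y = apply(X, 1, f))            # the entire initial design (x and y)
ev = eigen(cov(X))$values                      # PCA on covariance matrix of x-space
expl_var_PC1.cov_x = ev[1] / sum(ev)           # variance explained by the first PC
# share of components needed to explain >= 90% of the variance (assumption)
expl_var.cov_x = min(which(cumsum(ev) / sum(ev) >= 0.9)) / length(ev)
# analogously with cor(X) instead of cov(X), and with 'init' instead of 'X'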


Details

Note that if you want to speed up the computation of the features, you might consider running it in parallel. For more information, please refer to the parallelMap package or to http://mlr-org.github.io/mlr-tutorial/release/html/parallelization/index.html.

Furthermore, please consider adapting the feature computation to your needs. Possible control arguments are:

  • general:
    • show_progress: Show a progress bar when computing the features? The default is TRUE.
    • subset: Specify a subset of features that should be computed. Per default, all features will be computed.
    • allow_cellmapping: Should cell mapping features be computed? The default is TRUE.
    • allow_costs: Should expensive features, i.e. features that require additional function evaluations, be computed? The default is TRUE if the feature object provides a function, otherwise FALSE.
    • blacklist: Which features should NOT be computed? The default is NULL, i.e. none of the features will be excluded.

In addition, the following feature-set-specific control arguments are available:

  • cell mapping angle features
  • cell mapping convexity features: cm_conv.dist_method, cm_conv.minkowski_p, cm_conv.fast_k
  • cell mapping gradient homogeneity features: cm_grad.dist_method, cm_grad.minkowski_p, cm_grad.show_warnings
  • ELA convexity features: ela_conv.threshold
  • ELA curvature features
  • ELA distribution features: ela_distr.modemass_threshold, ela_distr.skewness_type, ela_distr.kurtosis_type
  • ELA levelset features: ela_level.classif_methods, ela_level.resample_method, ela_level.resample_iterations, ela_level.resample_info, ela_level.parallelize, ela_level.parallel.mode, ela_level.parallel.cpus, ela_level.parallel.level, ela_level.parallel.logging, ela_level.parallel.show_info
  • ELA local search features: ela_local.optim_method, ela_local.optim_method_control, ela_local.sample_seed, ela_local.clust_method, ela_local.clust_cut_function
  • GCM features: gcm.cf_power
  • barrier tree features: gcm.cf_power, bt.base, bt.max_depth
  • information content features: ic.sorting, ic.sample.generate, ic.sample.dimensions, ic.sample.size, ic.sample.lower, ic.sample.upper, ic.aggregate_duplicated, ic.show_warnings, ic.seed, ic.nn.start, ic.nn.neighborhood, ic.settling_sensitivity, ic.info_sensitivity
  • dispersion features: disp.dist_method, disp.minkowski_p
  • nearest better clustering features: nbc.minkowski_p, nbc.dist_tie_breaker, nbc.cor_na, nbc.fast_k
  • principal component features: pca.{cov, cor}_{x, init}
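
For illustration, such a control list might be passed as follows; the chosen values are arbitrary, and the assumption that blacklist accepts feature set names is for illustration only:

# sketch: adapting the feature computation via the control list
ctrl = list(
  show_progress = FALSE,        # suppress the progress bar
  allow_costs = FALSE,          # skip features requiring extra evaluations
  blacklist = c("ela_local")    # exclude a feature set (assumption)
)
features = calculateFeatures(feat.object, control = ctrl)  # feat.object as in the examples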

References

  • Kerschke, P., Preuss, M., Hernandez, C., Schuetze, O., Sun, J.-Q., Grimme, C., Rudolph, G., Bischl, B., and Trautmann, H. (2014): Cell Mapping Techniques for Exploratory Landscape Analysis, in: EVOLVE -- A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation V, pp. 115-131 (http://dx.doi.org/10.1007/978-3-319-07494-8_9).
  • Kerschke, P., Preuss, M., Wessing, S., and Trautmann, H. (2015): Detecting Funnel Structures by Means of Exploratory Landscape Analysis, in: Proceedings of the 17th Annual Conference on Genetic and Evolutionary Computation (GECCO '15), pp. 265-272 (http://dx.doi.org/10.1145/2739480.2754642).
  • Lunacek, M., and Whitley, D. (2006): The Dispersion Metric and the CMA Evolution Strategy, in: Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation (GECCO '06), pp. 477-484 (http://dx.doi.org/10.1145/1143997.1144085).
  • Mersmann, O., Bischl, B., Trautmann, H., Preuss, M., Weihs, C., and Rudolph, G. (2011): Exploratory Landscape Analysis, in: Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation (GECCO '11), pp. 829-836 (http://dx.doi.org/10.1145/2001576.2001690).
  • Munoz, M. A., Kirley, M., and Halgamuge, S. K. (2015): Exploratory Landscape Analysis of Continuous Space Optimization Problems Using Information Content, in: IEEE Transactions on Evolutionary Computation (19:1), pp. 74-87 (http://dx.doi.org/10.1109/TEVC.2014.2302006).

Examples

# (1) create a feature object:
X = t(replicate(n = 2000, expr = runif(n = 5, min = -10, max = 10)))
feat.object = createFeatureObject(X = X, fun = function(x) sum(x^2))

# (2) compute all non-cellmapping features
ctrl = list(allow_cellmapping = FALSE)
features = calculateFeatures(feat.object, control = ctrl)

# (3) in order to allow the computation of the cell mapping features, one
# has to provide a feature object that has knowledge about the number of
# cells per dimension:
f = function(x) sum(x^2)
feat.object = createFeatureObject(X = X, fun = f, blocks = 3)
features = calculateFeatures(feat.object)

# (4) if you want to compute a specific feature set, you can use
# calculateFeatureSet:
features.angle = calculateFeatureSet(feat.object, "cm_angle")

# (5) as noted in the details, it might be useful to compute the levelset
# features parallelized:
library(parallelMap)
library(parallel)
n.cores = detectCores()
parallelStart(mode = "multicore", cpus = n.cores,
  logging = FALSE, show.info = FALSE)
system.time((levelset.par = calculateFeatureSet(feat.object, "ela_level")))
parallelStop()
system.time((levelset.seq = calculateFeatureSet(feat.object, "ela_level")))
