hypervolume_resample: Hypervolume resampling methods

Description

hypervolume_resample generates new hypervolumes using the resampling method given in method. The output is written to file.

- "bootstrap": Generates n hypervolumes using data bootstrapped from the original data

- "bootstrap seq": Generates n hypervolumes for each sample size in the sequence specified by the user

- "biased bootstrap": Bootstraps the input hypervolume with bias applied through multivariate normal weights or user-specified weights
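
As a rough illustration of the resampling idea behind the first two methods, the base R sketch below draws rows with replacement from a small data frame. It is not the package's internal code: toy_data, resample_rows, and the sample sizes are hypothetical names and values chosen for illustration, and the step that builds a hypervolume from each resample is left out.

```r
# Illustrative only: base R row resampling, not the hypervolume package internals.
set.seed(1)
toy_data <- data.frame(x = rnorm(20), y = rnorm(20))

# Draw `size` rows with replacement from `data` (hypothetical helper).
resample_rows <- function(data, size = nrow(data)) {
  data[sample(nrow(data), size = size, replace = TRUE), , drop = FALSE]
}

# "bootstrap"-style: n resamples, each the same size as the original data.
n <- 3
boot <- lapply(seq_len(n), function(i) resample_rows(toy_data))

# "bootstrap seq"-style: n resamples for each sample size in a sequence.
sizes <- c(5, 10, 20)
boot_seq <- lapply(sizes, function(s) {
  lapply(seq_len(n), function(i) resample_rows(toy_data, size = s))
})
names(boot_seq) <- sizes
```

In this sketch boot is a list of n data frames, and boot_seq is a list with one entry per sample size, each holding n data frames.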

Usage

hypervolume_resample(name, 
                      hv, 
                      method, 
                      n = 10, 
                      points_per_resample = "sample_size", 
                      seq = 3:nrow(hv@Data), 
                      k = 5,
                      cores = 1,
                      verbose = TRUE, 
                      mu = NULL, 
                      sigma = NULL, 
                      cols_to_bias = 1:ncol(hv@Data), 
                      weight_func = NULL)

Value

Returns a string containing an absolute path equivalent to ./Objects/<name>.

Arguments

name: File name; The function writes hypervolumes to file in ./Objects/<name>
hv: A hypervolume object
method: String input; options are "bootstrap", "bootstrap seq", and "biased bootstrap".
n: Number of resamples to take. Used for every method.
points_per_resample: Number of points in each resample. If the input is "sample_size", then the same number of points as the original sample is used. Used for method = "bootstrap" and method = "biased bootstrap".
seq: Sequence of sample sizes. If method = "bootstrap seq" then the function generates n bootstrapped hypervolumes for each sample size in seq. Used for method = "bootstrap seq".
k: Number of splits. Used only for method = "k_split".
cores: Number of logical cores to use while generating bootstrapped hypervolumes. If a parallel backend is already registered with doParallel, the function uses that backend and ignores the cores argument.
verbose: Logical value; if the function is run sequentially, a progress bar is printed to the console.
mu: Array of values specifying the mean of multivariate normal weights. Used for method = "biased bootstrap".
sigma: Array of values specifying the variance in each dimension (lower variance corresponds to stronger bias). Used for method = "biased bootstrap".
cols_to_bias: Array of column indices; must be same length as mu and sigma. Used for method = "biased bootstrap".
weight_func: Custom weight function that takes a matrix of values and returns the desired weight for each row. Used for method = "biased bootstrap".
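
To make the bias arguments more concrete, here is a minimal base R sketch that weights rows by a normal density in the chosen column and then resamples using those weights. It assumes sigma is a per-dimension variance, as described above, and that a custom weight function returns one non-negative weight per row. Whether the package computes or normalizes its weights in exactly this way is not stated here, so normal_weights and toy are hypothetical.

```r
# Illustrative only: one plausible way to build row weights, not the package's code.
set.seed(42)
toy <- cbind(bill_length = runif(100, 30, 60),
             bill_depth  = runif(100, 13, 22))

# Weight each row by a product of univariate normal densities (hypothetical helper).
# `sigma` holds variances, so the standard deviation is sqrt(sigma).
normal_weights <- function(m, mu, sigma, cols = seq_len(ncol(m))) {
  w <- rep(1, nrow(m))
  for (j in seq_along(cols)) {
    w <- w * dnorm(m[, cols[j]], mean = mu[j], sd = sqrt(sigma[j]))
  }
  w
}

# Bias toward long bills: center the weights on the maximum bill length.
w <- normal_weights(toy, mu = max(toy[, "bill_length"]), sigma = 90,
                    cols = "bill_length")

# Weighted resample of row indices; prob does not need to sum to one.
idx <- sample(nrow(toy), size = nrow(toy), replace = TRUE, prob = w)
biased <- toy[idx, , drop = FALSE]

# A smaller variance concentrates the weights more tightly around mu.
w_strong <- normal_weights(toy, mu = max(toy[, "bill_length"]), sigma = 9,
                           cols = "bill_length")
```

A function with the shape function(m) returning one weight per row of m is the kind of object the weight_func description suggests; the exact contract is an assumption here.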

Details

hypervolume_resample creates a directory called Objects in the current working directory if a directory of that name doesn't already exist, and returns the absolute path to the directory containing the resampled hypervolumes. The rds files are stored in different directory structures depending on which method is called.

Use to_hv_list to extract every hypervolume object in a directory into a HypervolumeList object. It is also possible to access the hypervolumes by using readRDS to read the hypervolume objects in one by one.
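
Reading the files one by one can be sketched with base R alone. The example below writes stand-in objects to a temporary directory so that it runs without the hypervolume package. The directory layout and the file names (resample_1.rds and so on) are assumptions made for illustration, not the names the function actually writes.

```r
# Illustrative only: stand-in objects in a temporary directory, so the sketch
# runs without the hypervolume package. File names are hypothetical.
out_dir <- file.path(tempdir(), "Objects", "example_bootstrap")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
for (i in 1:3) {
  saveRDS(list(id = i), file.path(out_dir, paste0("resample_", i, ".rds")))
}

# Collect every .rds file under the directory; recursive = TRUE also picks up
# files that a method may have nested in subdirectories.
rds_files <- list.files(out_dir, pattern = "\\.rds$", full.names = TRUE,
                        recursive = TRUE)

# Read the stored objects one by one.
objects <- lapply(rds_files, readRDS)
```

With real output, out_dir would be the path returned by hypervolume_resample, and each element of objects would be a hypervolume object.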

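The resampled hypervolumes are generated using the same parameters used to generate the input hypervolume. The only exception is that the bandwidth is re-estimated if the input hypervolume was built with method = "gaussian" or method = "box" (the hypervolume construction method, not the resampling method argument of this function). See copy_param_hypervolume for more details.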

Examples

if (FALSE) {
library(hypervolume)
library(palmerpenguins)
data(penguins)
bill_data = na.omit(penguins[,3:4])
hv = hypervolume(bill_data)

# Example 1: Get 50 resampled hypervolumes
# Use parallel::detectCores() to see how many cores are available in the current environment
# Set cores = 1 to run sequentially (default)
path = hypervolume_resample("example_bootstrap", 
                              hv, 
                              method = "bootstrap", 
                              n = 50, 
                              cores = 12)
hvs = to_hv_list(path)

# Example 2: Get resamples with applied bias
# Get maximum bill length
max_bill = max(bill_data$bill_length_mm)
# Make data with larger bill length slightly more likely to be resampled
biased_path = hypervolume_resample("biased test", 
                                    hv, 
                                    method = "biased bootstrap", 
                                    n = 50, 
                                    cores = 12, 
                                    mu = max_bill, 
                                    sigma = 90, 
                                    cols_to_bias = "bill_length_mm")
hvs_biased = to_hv_list(biased_path)
}