topolow (version 1.0.0)

initial_parameter_optimization: Run Parameter Optimization Via Latin Hypercube Sampling

Description

Performs parameter optimization using Latin Hypercube Sampling (LHS) combined with k-fold cross-validation. Parameters are sampled from the specified ranges using a maximin LHS design to ensure good coverage of the parameter space. Each parameter set is evaluated using k-fold cross-validation to assess prediction accuracy. To calculate one negative log-likelihood (NLL) per parameter set, the function uses a pooled-errors approach: it combines the validation errors from all folds into one set, then calculates a single NLL from that set. This approach has two main advantages:

  1. It treats all validation errors equally, respecting the underlying error distribution assumption.

  2. It properly accounts for the total number of validation points.
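The pooled-errors idea can be illustrated with a minimal sketch. This page does not state which error distribution the package assumes, so the Laplace model below, the helper name pooled_nll, and the error values are all assumptions made for illustration and not topolow's actual implementation.

```r
# Hypothetical helper: NLL of a set of errors under an ASSUMED Laplace model,
# with the scale b set to its maximum-likelihood estimate, mean(|error|).
pooled_nll <- function(errors) {
  errors <- errors[!is.na(errors)]
  n <- length(errors)
  b <- mean(abs(errors))                  # MLE of the Laplace scale (the MAE)
  n * log(2 * b) + sum(abs(errors)) / b   # equals n * (1 + log(2 * b))
}

# Made-up validation errors from three folds of unequal size
fold_errors <- list(
  c(0.5, -1.0, 0.25),
  c(1.5, -0.75),
  c(-0.5, 2.0, 0.0, -1.5)
)

# Pooled approach: combine every fold's errors, then compute one NLL
pooled_errors <- unlist(fold_errors)
pooled_result <- pooled_nll(pooled_errors)

# For contrast: averaging per-fold NLLs weights each fold equally,
# regardless of how many validation points it holds
per_fold_mean <- mean(vapply(fold_errors, pooled_nll, numeric(1)))
```

Because every error enters the same sum, a fold with four validation points contributes twice as much as a fold with two, which is what "accounts for the total number of validation points" amounts to in this sketch.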

Usage

initial_parameter_optimization(
  distance_matrix,
  mapping_max_iter = 1000,
  relative_epsilon,
  convergence_counter,
  scenario_name,
  N_min,
  N_max,
  k0_min,
  k0_max,
  c_repulsion_min,
  c_repulsion_max,
  cooling_rate_min,
  cooling_rate_max,
  num_samples = 20,
  max_cores = NULL,
  folds = 20,
  verbose = FALSE,
  write_files = FALSE,
  output_dir
)

Value

A data.frame containing the parameter sets and their performance metrics (Holdout_MAE and NLL). The columns of the data frame are N, k0, cooling_rate, c_repulsion, Holdout_MAE, and NLL. If write_files is TRUE, this data frame is also saved to a CSV file as a side effect.
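As a minimal sketch of how the returned data frame might be used, the snippet below ranks parameter sets by NLL. The values are made up for illustration; only the column names come from the description above. Whether NLL can be NA for failed runs is an assumption, handled defensively here.

```r
# Made-up results with the column layout described above
results <- data.frame(
  N = c(3, 5, 2, 4),
  k0 = c(2.1, 7.4, 1.3, 4.8),
  cooling_rate = c(0.004, 0.015, 0.002, 0.009),
  c_repulsion = c(0.010, 0.030, 0.002, 0.045),
  Holdout_MAE = c(1.20, 0.95, 1.60, 1.05),
  NLL = c(310.2, 287.5, 342.9, 295.1)
)

# Drop rows without an NLL (if any), then rank from lowest NLL to highest
ranked <- results[!is.na(results$NLL), ]
ranked <- ranked[order(ranked$NLL), ]

# Parameter set with the lowest NLL
best <- ranked[1, ]
```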

Arguments

distance_matrix

Matrix or data frame. Input distance matrix. Must be square and symmetric. Can contain NA values for missing measurements.
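A quick pre-flight check of these requirements could look like the sketch below. The helper is_valid_distance_matrix is hypothetical and not part of topolow; it assumes that missing entries must also be mirrored across the diagonal.

```r
# Hypothetical check: square, and symmetric in both the observed values
# and the pattern of missing entries
is_valid_distance_matrix <- function(m, tol = 1e-8) {
  m <- as.matrix(m)
  if (!is.numeric(m) || nrow(m) != ncol(m)) return(FALSE)
  same_na_pattern <- all(is.na(m) == is.na(t(m)))
  same_values <- all(abs(m - t(m)) <= tol, na.rm = TRUE)
  same_na_pattern && same_values
}

# Symmetric matrix with one missing measurement (pair 1-3)
d <- matrix(c(0, 1, NA,
              1, 0, 2,
              NA, 2, 0), nrow = 3, byrow = TRUE)
ok <- is_valid_distance_matrix(d)

# Changing one triangle only breaks symmetry
d_bad <- d
d_bad[1, 2] <- 5
bad <- is_valid_distance_matrix(d_bad)
```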

mapping_max_iter

Integer. Maximum number of optimization iterations.

relative_epsilon

Numeric. Convergence threshold for relative change in error.

convergence_counter

Integer. Number of iterations below threshold before declaring convergence.
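The interplay of relative_epsilon and convergence_counter can be sketched as follows. This is an illustration under stated assumptions, not the package's code: the helper has_converged is hypothetical, the below-threshold iterations are assumed to be consecutive, and the error trace is made up and strictly positive.

```r
# Hypothetical convergence check; ASSUMES the iterations must be consecutive
has_converged <- function(errors, relative_epsilon, convergence_counter) {
  streak <- 0
  for (i in seq_along(errors)[-1]) {
    rel_change <- abs(errors[i] - errors[i - 1]) / abs(errors[i - 1])
    # Extend the streak on a small change, reset it otherwise
    streak <- if (rel_change < relative_epsilon) streak + 1 else 0
    if (streak >= convergence_counter) return(TRUE)
  }
  FALSE
}

# Made-up error trace: large early improvements, then a plateau
error_trace <- c(10, 5, 4, 3.999, 3.9985, 3.9984)

converged_2 <- has_converged(error_trace, relative_epsilon = 1e-3,
                             convergence_counter = 2)
converged_5 <- has_converged(error_trace, relative_epsilon = 1e-3,
                             convergence_counter = 5)
```

A larger convergence_counter demands a longer plateau before stopping, which trades extra iterations for protection against stopping on a momentary stall.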

scenario_name

Character. Name for output files and job identification.

N_min, N_max

Integer. Range for number of dimensions parameter.

k0_min, k0_max

Numeric. Range for initial spring constant parameter.

c_repulsion_min, c_repulsion_max

Numeric. Range for repulsion constant parameter.

cooling_rate_min, cooling_rate_max

Numeric. Range for spring decay parameter.

num_samples

Integer. Number of LHS samples to generate (default: 20).

max_cores

Integer. Maximum number of cores to use for parallel processing. If NULL, uses all available cores minus 1 (default: NULL).
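The default described above can be sketched with the base parallel package. The helper resolve_cores is hypothetical; in particular, capping a user-supplied value at the number of detected cores is an assumption and may not match what the function does internally.

```r
# Hypothetical helper mirroring the documented default for max_cores
resolve_cores <- function(max_cores = NULL) {
  available <- parallel::detectCores()
  if (is.na(available)) available <- 1L   # detectCores() can return NA
  if (is.null(max_cores)) {
    # Documented default: all available cores minus 1 (never below 1)
    return(max(1L, available - 1L))
  }
  # ASSUMPTION: a requested value is capped at the detected core count
  max(1L, min(as.integer(max_cores), available))
}

n_default <- resolve_cores()
n_capped <- resolve_cores(2)
```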

folds

Integer. Number of cross-validation folds. Default: 20.

verbose

Logical. Whether to print progress messages. Default: FALSE.

write_files

Logical. Whether to save results to CSV. Default: FALSE.

output_dir

Character. Directory where output files will be saved. Required if write_files is TRUE.

Details

The function performs these steps:

  1. Generates LHS samples in parameter space

  2. Creates k-fold splits of input data

  3. For each parameter set and fold:

    • Trains model on training set

    • Evaluates on validation set

    • Calculates MAE and negative log likelihood

  4. Computations are run locally in parallel.
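Step 2 can be sketched for a distance matrix as follows. This page does not describe how the splits are built, so the sketch assumes that each observed pair of points is assigned to one fold and that a fold's pairs are masked as NA in the training matrix. The helper make_training_matrix and the data are made up for illustration.

```r
set.seed(1)

# Small made-up symmetric distance matrix with one missing measurement
n <- 6
coords <- matrix(rnorm(n * 2), ncol = 2)
d <- as.matrix(dist(coords))
d[2, 5] <- d[5, 2] <- NA

folds <- 4

# Observed pairs (upper triangle only, so each measurement is counted once)
pairs <- which(upper.tri(d) & !is.na(d), arr.ind = TRUE)

# Randomly assign each observed pair to one of the folds
fold_id <- sample(rep_len(seq_len(folds), nrow(pairs)))

# Training matrix for a fold: its validation pairs are masked as NA
make_training_matrix <- function(d, pairs, fold_id, fold) {
  held_out <- pairs[fold_id == fold, , drop = FALSE]
  train <- d
  train[held_out] <- NA
  train[held_out[, 2:1, drop = FALSE]] <- NA   # mirror across the diagonal
  train
}

train_1 <- make_training_matrix(d, pairs, fold_id, 1)
```

The values at the masked positions of d then serve as the validation targets for that fold.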

Parameter ranges are transformed to log scale where appropriate to handle different scales effectively.
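Sampling on a log scale can be sketched in base R as below. Two simplifications are swapped in and should be read as assumptions: the design is a plain random Latin hypercube, not the maximin design the function uses, and the choice of which parameters are log-transformed (k0, c_repulsion and cooling_rate here, with N kept linear and rounded) is a guess, since this page only says "where appropriate". The helpers random_lhs and scale_log are hypothetical; the ranges are taken from the example on this page.

```r
set.seed(42)

# Plain random LHS on the unit cube: one point per stratum in each dimension
# (no maximin optimization, unlike the design the function describes)
random_lhs <- function(n, k) {
  sapply(seq_len(k), function(j) (sample(n) - runif(n)) / n)
}

# Map a unit-interval sample to [lo, hi], uniformly on the log scale
scale_log <- function(u, lo, hi) exp(log(lo) + u * (log(hi) - log(lo)))

num_samples <- 4
u <- random_lhs(num_samples, 4)

samples <- data.frame(
  N = round(2 + u[, 1] * (5 - 2)),             # ASSUMED: linear, rounded
  k0 = scale_log(u[, 2], 1, 10),
  c_repulsion = scale_log(u[, 3], 0.001, 0.05),
  cooling_rate = scale_log(u[, 4], 0.001, 0.02)
)
```

Sampling uniformly in log space spreads points evenly across orders of magnitude, so a range such as 0.001 to 0.05 is not dominated by values near its upper end.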

See Also

create_topolow_map for the core optimization algorithm

Examples

# \donttest{
# This example is wrapped in \donttest{} because it can exceed 5 seconds,
# ensuring it passes CRAN's automated checks.

# 1. Create a structured, synthetic dataset for the example
# Generate coordinates for a more realistic test case
synth_coords <- generate_complex_data(n_points = 20, n_dim = 3)
# Convert coordinates to a distance matrix
dist_mat <- coordinates_to_matrix(synth_coords)

# 2. Run the optimization on the synthetic data
results <- initial_parameter_optimization(
  distance_matrix = dist_mat,
  mapping_max_iter = 100,
  relative_epsilon = 1e-3,
  convergence_counter = 2,
  scenario_name = "test_opt_synthetic",
  N_min = 2, N_max = 5,
  k0_min = 1, k0_max = 10,
  c_repulsion_min = 0.001, c_repulsion_max = 0.05,
  cooling_rate_min = 0.001, cooling_rate_max = 0.02,
  num_samples = 4,
  max_cores = 2,
  verbose = FALSE
)
print(results)
# }
