vecmatch (version 1.2.0)

select_opt: Select Optimal Parameter Combinations from Optimization Results

Description

select_opt() is a helper function to filter and prioritize results from optimize_gps() based on the specific goals of a study. Depending on the research design, certain pairwise comparisons or treatment groups may be more important than others. For example:

  • You may want to prioritize matching between specific groups (e.g. a specific disease vs. controls), while ignoring other group comparisons during SMD evaluation.

  • You may wish to retain as many samples as possible from a critical group or set of groups, regardless of matching rates in other groups.

This function enables targeted selection of optimal parameter combinations by:

  • Evaluating SMDs for specific pairwise treatment group comparisons,

  • Selecting key covariates to assess balance,

  • Prioritizing matched sample size in selected treatment groups.

By combining these criteria, select_opt() allows you to tailor the optimization output to your study's focus, whether that is covariate balance in targeted group comparisons or maximal sample retention in specific subgroups.

Usage

select_opt(
  x,
  smd_groups = NULL,
  smd_variables = NULL,
  smd_type = c("mean", "max"),
  perc_matched = NULL
)

Value

An S3 object of class select_result, containing the filtered and prioritized optimization results. The object includes:

  • A data.frame with selected parameter combinations and performance metrics.

  • Attribute param_df: A data.frame with full parameter specifications (iter_ID, GPS/matching parameters, etc.), useful for manually refitting or reproducing results.

The object also includes a custom print() method that summarizes:

  • Number of selected combinations per SMD bin

  • Corresponding aggregated SMD (mean or max)

  • Overall or group-specific percentage matched
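The param_df attribute described above can be read with base R's attr(). The sketch below uses a hand-built stand-in object, because the real internal layout of a select_result is not shown here. The column names, values, and the assumption that the object behaves like a data.frame are all illustrative.

```r
# Hypothetical stand-in for a select_result object (all values invented).
toy_result <- structure(
  data.frame(iter_ID = "ID1", smd = 0.04, perc_matched = 85,
             stringsAsFactors = FALSE),
  param_df = data.frame(iter_ID = "ID1", gps_method = "m1",
                        stringsAsFactors = FALSE)
)

# Pull the full parameter specification stored as an attribute.
params <- attr(toy_result, "param_df")

# Look up the parameters of one selected combination by its iter_ID.
chosen <- params[params$iter_ID == toy_result$iter_ID[1], , drop = FALSE]
```

With a real result, the same attr(select_results, "param_df") call should return the parameter table to use when refitting a chosen combination manually.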

Arguments

x

An object of class best_opt_result, produced by the optimize_gps() function.

smd_groups

A list of pairwise comparisons (as character vectors of length 2) specifying which treatment group comparisons should be prioritized in SMD evaluation. Each element must be a valid pair of treatment levels. If NULL, all pairwise comparisons are used. Example: list(c("adenoma", "crc_malignant"), c("controls", "adenoma"))

smd_variables

A character vector of covariate names to include in the SMD evaluation. Must match variables listed in attr(x, "model_covs").

smd_type

A character string ("mean" or "max"), defining how to aggregate SMDs across covariates and comparisons. "max" selects combinations with the lowest maximum SMD; "mean" uses the average SMD.

perc_matched

A character vector of treatment levels for which the matching rate should be maximized. If NULL, the overall percentage of matched samples is used. If specified, selection within each SMD bin uses only the sum of the matching percentages of the listed groups.
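How smd_groups, smd_variables and smd_type interact can be sketched in base R. This is not the package's implementation. The long-format SMD table, its column names, the helper pair_key(), and the order-insensitive matching of pairs are assumptions made for illustration, and the SMD values are invented.

```r
# Hypothetical SMD table for ONE parameter combination (values invented).
smd <- data.frame(
  group1   = c("adenoma", "adenoma", "controls", "controls"),
  group2   = c("controls", "controls", "crc_malignant", "crc_malignant"),
  variable = c("age", "sex", "age", "sex"),
  smd      = c(0.04, 0.12, 0.08, 0.02),
  stringsAsFactors = FALSE
)

smd_groups    <- list(c("controls", "adenoma"))
smd_variables <- c("age", "sex")

# Assumed helper: build a key that ignores the order of the two groups.
pair_key <- function(a, b) paste(pmin(a, b), pmax(a, b), sep = "|")

wanted <- vapply(smd_groups, function(p) pair_key(p[1], p[2]), character(1))
keep   <- pair_key(smd$group1, smd$group2) %in% wanted &
  smd$variable %in% smd_variables

# Aggregate the retained SMDs the two ways smd_type allows.
agg_mean <- mean(smd$smd[keep])  # smd_type = "mean"
agg_max  <- max(smd$smd[keep])   # smd_type = "max"
```

In this toy case only the adenoma vs. controls rows are kept, so the "max" aggregate is driven by the sex covariate while the "mean" aggregate averages both covariates.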

Details

Optimization results are grouped into bins based on the maximum SMD observed for each parameter combination. These bins follow the same structure as in optimize_gps():

  • 0.00-0.05

  • 0.05-0.10

  • 0.10-0.15

  • 0.15-0.20

  • 0.20-0.25

  • 0.25-0.30

  • 0.30-0.35

  • 0.35-0.40

  • 0.40-0.45

  • 0.45-0.50

  • more than 0.50

Within each bin, models are first filtered based on their aggregated SMD across the specified smd_groups and smd_variables, using the method defined by smd_type. Then, among the remaining models, the best-performing one(s) are selected based on the percentage of matched samples - either overall or in the specified treatment groups (perc_matched).
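The two-step selection described above can be sketched on a toy table in base R. This is a simplified illustration, not the package's code. The column names, the values, the right-closed bin boundaries, and the reading of "filtered" as "keep the lowest aggregated SMD in the bin" are all assumptions.

```r
# Toy stand-in for optimization results (all values invented).
toy <- data.frame(
  iter_ID      = c("a", "b", "c", "d", "e"),
  smd_max      = c(0.03, 0.04, 0.08, 0.12, 0.55), # overall maximum SMD, defines the bin
  smd_agg      = c(0.02, 0.02, 0.06, 0.10, 0.40), # aggregated SMD for chosen groups/covariates
  perc_matched = c(70, 85, 90, 60, 99),           # overall or group-specific percentage
  stringsAsFactors = FALSE
)

# Bins of width 0.05 up to 0.50, plus one open-ended bin above 0.50.
breaks  <- c(seq(0, 0.5, by = 0.05), Inf)
toy$bin <- cut(toy$smd_max, breaks = breaks, include.lowest = TRUE)

# Within one bin: keep the lowest aggregated SMD, then the highest perc_matched.
pick_best <- function(d) {
  d <- d[d$smd_agg == min(d$smd_agg), , drop = FALSE]
  d[d$perc_matched == max(d$perc_matched), , drop = FALSE]
}

selected <- do.call(rbind, lapply(split(toy, toy$bin, drop = TRUE), pick_best))
```

Here combinations "a" and "b" share a bin and the same aggregated SMD, so the tie is broken by perc_matched and "b" is retained. The other bins each contain a single candidate.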

Examples

# Define formula and set up optimization
formula_cancer <- formula(status ~ age * sex)
opt_args <- make_opt_args(cancer, formula_cancer, gps_method = "m1")
if (FALSE) {
withr::with_seed(8252, {
  opt_results <- optimize_gps(
    data = cancer,
    formula = formula_cancer,
    opt_args = opt_args,
    n_iter = 2000
  )
})
}
# Select optimal combinations prioritizing SMD balance and matching in key
# groups
if (FALSE) {
select_results <- select_opt(
  x = opt_results,
  smd_groups = list(
    c("adenoma", "controls"),
    c("controls", "crc_beningn"),
    c("crc_malignant", "controls")
  ),
  smd_variables = "age",
  smd_type = "max",
  perc_matched = c("adenoma", "crc_malignant")
)
}