PEAXAI_fitting: Training Classification Models to Estimate Efficiency

Description

Trains one or more classification algorithms to identify Pareto-efficient decision-making units (DMUs). It jointly searches model hyperparameters and the class-balancing level (synthetic samples generated with SMOTE) using k-fold cross-validation or a train/validation/test split, and selects the configuration that maximizes the specified metric(s). For each technique, it returns the best fitted model together with training summaries, performance metrics, and the selected balancing level.
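As an illustration of what "Pareto-efficient" means for a DMU, the sketch below flags units that no other observed unit dominates. It is a minimal base R example under a free-disposal dominance assumption only; `is_pareto_efficient` is a hypothetical helper, not a PEAXAI function, and the package's own labeling depends on the chosen RTS.

```r
# Hypothetical helper (not part of PEAXAI): flag units that no other unit dominates.
# X: matrix of inputs (rows = DMUs); Y: matrix of outputs (rows = DMUs).
is_pareto_efficient <- function(X, Y) {
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  n <- nrow(X)
  vapply(seq_len(n), function(i) {
    # Unit j dominates unit i if it uses no more of every input, produces no
    # less of every output, and is strictly better in at least one dimension.
    dominated <- vapply(seq_len(n), function(j) {
      j != i &&
        all(X[j, ] <= X[i, ]) && all(Y[j, ] >= Y[i, ]) &&
        (any(X[j, ] < X[i, ]) || any(Y[j, ] > Y[i, ]))
    }, logical(1))
    !any(dominated)
  }, logical(1))
}

X <- matrix(c(2, 4, 3, 5), ncol = 1)  # one input, four DMUs
Y <- matrix(c(2, 5, 2, 4), ncol = 1)  # one output, four DMUs
efficient <- is_pareto_efficient(X, Y)
efficient  # TRUE TRUE FALSE FALSE
```

Units 3 and 4 are dominated by units 1 and 2 respectively, so only the first two would be labeled efficient in this toy data set.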

Usage

PEAXAI_fitting(
  data,
  x,
  y,
  RTS = "vrs",
  imbalance_rate = NULL,
  trControl,
  methods,
  metric_priority = "Balanced_Accuracy",
  hold_out = NULL,
  seed = NULL,
  verbose = TRUE
)

Value

A "PEAXAI" object (a list) containing the best technique, the best fitted models with their performance, and the results by fold.

Arguments

data

A data.frame or matrix containing the variables in the model.

x

Integer vector with column indices of input variables in data.

y

Integer vector with column indices of output variables in data.

RTS

Text string or number defining the underlying DEA technology / returns-to-scale assumption (default: "vrs"). Accepted values:

0 / "fdh"

Free disposability hull, no convexity assumption.

1 / "vrs"

Variable returns to scale, convexity and free disposability.

2 / "drs"

Decreasing returns to scale, convexity, down-scaling and free disposability.

3 / "crs"

Constant returns to scale, convexity and free disposability.

4 / "irs"

Increasing returns to scale (up-scaling, not down-scaling), convexity and free disposability.

5 / "add"

Additivity (scaling up and down, but only with integers), and free disposability.
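The correspondence between the numeric codes and the text labels listed above can be sketched as a simple lookup. `normalize_rts` is a hypothetical helper written for illustration; how PEAXAI validates this argument internally is not documented here.

```r
# Labels in the order of their numeric codes 0..5, as listed above.
rts_labels <- c("fdh", "vrs", "drs", "crs", "irs", "add")

# Hypothetical helper: return the text label for a numeric or text RTS value.
normalize_rts <- function(RTS) {
  if (is.numeric(RTS)) {
    stopifnot(length(RTS) == 1, RTS %in% 0:5)
    return(rts_labels[RTS + 1])  # codes start at 0, R indexing starts at 1
  }
  match.arg(RTS, rts_labels)     # errors on an unknown label
}

normalize_rts(3)      # "crs"
normalize_rts("vrs")  # "vrs"
```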

imbalance_rate

Optional target(s) for class balance via SMOTE. If NULL, no synthetic balancing is performed.
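To give a sense of how a balancing target translates into synthetic samples, the sketch below computes how many minority-class units would have to be generated. It assumes, as an illustration only, that a target is the desired share of the minority (efficient) class after balancing; `n_synthetic` is a hypothetical helper and the package may define the rate differently.

```r
# Hypothetical helper: synthetic minority samples needed to reach a target share.
# Assumes `rate` is the desired minority-class proportion after balancing.
n_synthetic <- function(n_minority, n_total, rate) {
  stopifnot(rate > 0, rate < 1, n_minority <= n_total)
  # Solve (n_minority + s) / (n_total + s) = rate for s.
  s <- (rate * n_total - n_minority) / (1 - rate)
  max(0, ceiling(s))  # no samples are needed if the target is already met
}

n_synthetic(n_minority = 10, n_total = 100, rate = 0.5)  # 80
```

With 10 efficient units out of 100, reaching a 50/50 balance requires 80 synthetic efficient units (90 versus 90).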

trControl

A caret::trainControl-like list that specifies the resampling strategy; recognized values for $method include "cv", "test_set", and "none". See caret documentation.
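For intuition about the "cv" strategy, the following base R sketch assigns each unit to one of k folds while keeping the class proportions similar across folds. It is an assumption-laden illustration: `make_folds` is a hypothetical helper, and whether PEAXAI or caret stratifies folds in exactly this way is not stated on this page.

```r
# Hypothetical helper: stratified assignment of observations to k folds.
make_folds <- function(labels, k, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  folds <- integer(length(labels))
  # Shuffle fold ids separately within each class so every fold
  # receives a similar share of each class.
  for (idx in split(seq_along(labels), labels)) {
    folds[idx] <- sample(rep_len(seq_len(k), length(idx)))
  }
  folds
}

labels <- rep(c("efficient", "not_efficient"), times = c(9, 30))
folds <- make_folds(labels, k = 3, seed = 1)
table(folds, labels)  # each fold holds 3 efficient and 10 not_efficient units
```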

methods

A list of selected machine learning models and their hyperparameters.

metric_priority

Character string (or vector of strings, as in the example, where two metrics are supplied) naming the classification metric(s) used to select the best model. The default is "Balanced_Accuracy" because the classes are normally unbalanced.
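The reason balanced accuracy suits unbalanced classes can be seen in a small base R sketch. It uses the standard definition (the mean of sensitivity and specificity); the function name and the class labels "efficient" and "not_efficient" are assumptions made for this illustration, not PEAXAI internals.

```r
# Balanced accuracy = mean of sensitivity and specificity.
# `positive` names the class treated as positive (assumed label: "efficient").
balanced_accuracy <- function(truth, pred, positive = "efficient") {
  tp <- sum(truth == positive & pred == positive)
  fn <- sum(truth == positive & pred != positive)
  tn <- sum(truth != positive & pred != positive)
  fp <- sum(truth != positive & pred == positive)
  sensitivity <- tp / (tp + fn)
  specificity <- tn / (tn + fp)
  (sensitivity + specificity) / 2
}

truth <- c(rep("efficient", 2), rep("not_efficient", 8))
pred  <- c("efficient", "not_efficient", rep("not_efficient", 8))

mean(truth == pred)             # plain accuracy: 0.9
balanced_accuracy(truth, pred)  # 0.75
```

Missing one of the two efficient units barely moves plain accuracy, but it halves sensitivity and so lowers balanced accuracy noticeably.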

hold_out

Numeric proportion in (0, 1) of the data reserved for the validation split (default NULL). If NULL, the same data are used for training and validation.
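A hold-out proportion of this kind can be pictured with the base R sketch below, which reserves the given share of each class for validation. `holdout_split` is a hypothetical helper, and the stratification by class is an assumption of this example rather than documented package behavior.

```r
# Hypothetical helper: stratified hold-out indices (not PEAXAI's internal code).
holdout_split <- function(labels, hold_out, seed = NULL) {
  stopifnot(hold_out > 0, hold_out < 1)
  if (!is.null(seed)) set.seed(seed)
  idx_by_class <- split(seq_along(labels), labels)
  # Draw the validation share separately within each class.
  valid <- unlist(lapply(idx_by_class, function(idx) {
    n_valid <- round(length(idx) * hold_out)
    idx[sample.int(length(idx), n_valid)]
  }), use.names = FALSE)
  list(train = setdiff(seq_along(labels), valid), valid = sort(valid))
}

labels <- rep(c("efficient", "not_efficient"), times = c(10, 40))
parts <- holdout_split(labels, hold_out = 0.2, seed = 1)
lengths(parts)  # train: 40, valid: 10
```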

seed

Integer. Seed for reproducibility.

verbose

Logical; if TRUE, prints progress messages (default TRUE).

Examples

# \donttest{
  data("firms", package = "PEAXAI")

  data <- subset(
    firms,
    autonomous_community == "Comunidad Valenciana"
  )

  trControl <- list(
    method = "cv",
    number = 3
  )

  # glm method
  methods <- list(
    "glm" = list(
      weights = "dinamic"
    )
  )

  models <- PEAXAI_fitting(
    data = data,
    x = c(1:4),
    y = 5,
    RTS = "vrs",
    imbalance_rate = NULL,
    methods = methods,
    trControl = trControl,
    metric_priority = c("Balanced_Accuracy", "ROC_AUC"),
    seed = 1,
    verbose = FALSE
  )
# }
