splitwise: SplitWise Regression

Description

For each numeric predictor, either replaces it with a single-split dummy or keeps it linear, then runs stats::step() for stepwise selection. The user can choose between a simpler univariate transformation and an iterative approach.
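The core idea can be illustrated in base R. The sketch below is a rough approximation of the univariate idea, not the package's implementation. It assumes a hypothetical helper `best_split()` that scans midpoints between distinct values of a predictor for the 0/1 dummy with the lowest AIC. Each predictor is then kept linear or replaced by its dummy, and `step()` runs on the result.

```r
data(mtcars)

# Hypothetical helper (not part of SplitWise): search the midpoints between
# consecutive distinct values of x for the cutoff whose 0/1 dummy gives the
# lowest AIC in lm(y ~ dummy).
best_split <- function(y, x) {
  ux <- sort(unique(x))
  cuts <- (head(ux, -1) + tail(ux, -1)) / 2
  aics <- vapply(cuts, function(c0) AIC(lm(y ~ as.numeric(x > c0))), numeric(1))
  list(cutoff = cuts[which.min(aics)], aic = min(aics))
}

y <- mtcars$mpg
predictors <- c("wt", "hp", "qsec")   # a small subset, for illustration only
transformed <- data.frame(mpg = y)

for (v in predictors) {
  x <- mtcars[[v]]
  bs <- best_split(y, x)
  if (bs$aic < AIC(lm(y ~ x))) {
    # The dummy fits better: replace x by a single-split indicator
    transformed[[paste0(v, "_dummy")]] <- as.numeric(x > bs$cutoff)
  } else {
    # Otherwise keep the variable linear
    transformed[[v]] <- x
  }
}

# Backward stepwise selection on the transformed design (k = 2, i.e. AIC)
fit <- step(lm(mpg ~ ., data = transformed), direction = "backward", trace = 0)
names(transformed)
coef(fit)
```

The real function also applies the min_support and min_improvement guards and, in "iterative" mode, presumably revisits decisions in the context of the other predictors. This sketch does neither.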

Usage

splitwise(
  formula,
  data,
  transformation_mode = c("iterative", "univariate"),
  direction = c("backward", "forward", "both"),
  min_support = 0.1,
  min_improvement = 3,
  criterion = c("AIC", "BIC"),
  exclude_vars = NULL,
  verbose = FALSE,
  steps = 1000,
  k = 2,
  ...
)
# S3 method for splitwise_lm
print(x, ...)
# S3 method for splitwise_lm
summary(object, ...)
# S3 method for splitwise_lm
predict(object, newdata, ...)
# S3 method for splitwise_lm
coef(object, ...)
# S3 method for splitwise_lm
fitted(object, ...)
# S3 method for splitwise_lm
residuals(object, ...)
# S3 method for splitwise_lm
model.matrix(object, ...)

Value

An S3 object of class c("splitwise_lm", "lm"). In addition to the standard lm components, it stores:

splitwise_info: A list containing the transformation decisions, the final (transformed) data, and the call.
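The class vector c("splitwise_lm", "lm") means ordinary S3 inheritance applies. The mock below builds a plain lm fit, attaches an illustrative splitwise_info list, and prepends the subclass. The mock is not produced by splitwise(), and its splitwise_info contents are invented. It shows that generics without a splitwise_lm method fall through to the lm or default methods.

```r
data(mtcars)

# Mock object (not produced by splitwise()): an ordinary lm fit with an extra
# component and the documented class vector.
fit <- lm(mpg ~ wt + hp, data = mtcars)
fit$splitwise_info <- list(
  decisions = list(wt = "linear", hp = "linear"),  # illustrative contents only
  call = fit$call
)
class(fit) <- c("splitwise_lm", "lm")

# No splitwise_lm methods are defined in this session, so these calls
# dispatch to the "lm" or default methods through S3 inheritance.
inherits(fit, "lm")
coef(fit)
fit$splitwise_info$decisions
```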

Arguments

formula: A formula specifying the response and (initial) predictors, e.g. mpg ~ . (all remaining columns).
data: A data frame containing the variables used in formula.
transformation_mode: Either "iterative" or "univariate". Default = "iterative".
direction: Stepwise direction: "backward", "forward", or "both". Default = "backward".
min_support: Minimum fraction (between 0 and 0.5) of observations needed in either group when making a dummy split. Prevents over-fragmented or tiny dummy groups. Default = 0.1.
min_improvement: Minimum required improvement (in AIC/BIC units) for accepting a dummy split or variable transformation. Helps guard against overfitting from marginal improvements. Default = 3.
criterion: Either "AIC" or "BIC". Default = "AIC". Note: if you choose "BIC", you will typically also want to set k = log(nrow(data)) so that the stepwise selection uses the matching penalty.
exclude_vars: A character vector naming variables that should be forced to remain linear (i.e., no dummy splits allowed). Default = NULL.
verbose: Logical; if TRUE, prints debugging information during the transformation steps. If FALSE, the stepwise selection is run quietly (trace = 0 in step()). Default = FALSE.
steps: Maximum number of steps for step(). Default = 1000.
k: Penalty multiple for the number of degrees of freedom (used by step()), e.g. 2 for AIC or log(n) for BIC. Default = 2.
...: Additional arguments passed to predict.lm.
x: A "splitwise_lm" object returned by splitwise.
object: An object of class splitwise_lm, as returned by splitwise.
newdata: A data frame of new data (with original predictors) to generate predictions for. The appropriate dummy variables will be generated using the transformation rules learned during model training. If omitted, predictions for the training data are returned.
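The min_support and min_improvement arguments act as guards on a candidate split. The sketch below uses a hypothetical helper, `accept_split()`, that is not the package's code. It assumes the improvement is measured as the AIC of the linear fit minus the AIC of the dummy fit. A split is rejected if either group holds less than min_support of the rows, or if the AIC gain is below min_improvement.

```r
# Hypothetical helper (not the package's code): accept a candidate split only
# if both groups hold at least `min_support` of the rows and the dummy lowers
# AIC by at least `min_improvement` relative to keeping x linear.
accept_split <- function(y, x, cutoff, min_support = 0.1, min_improvement = 3) {
  dummy <- as.numeric(x > cutoff)
  share <- mean(dummy)                     # fraction of rows in the "1" group
  if (share < min_support || share > 1 - min_support) return(FALSE)
  improvement <- AIC(lm(y ~ x)) - AIC(lm(y ~ dummy))
  improvement >= min_improvement
}

data(mtcars)
# Isolating only the heaviest car (1 of 32 rows) fails the support check
accept_split(mtcars$mpg, mtcars$wt, cutoff = max(mtcars$wt) - 0.01)
# A median cutoff passes the support check; acceptance then depends on the AIC gain
accept_split(mtcars$mpg, mtcars$wt, cutoff = median(mtcars$wt))
```

Under criterion = "BIC" the same comparison would presumably use BIC() instead of AIC().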

Functions

print(splitwise_lm): Prints a summary of the splitwise_lm object.
summary(splitwise_lm): Provides a detailed summary, including how dummies were created.
predict(splitwise_lm): Generate predictions from a splitwise_lm object using learned transformation rules.
coef(splitwise_lm): Extract model coefficients from a SplitWise linear model.
fitted(splitwise_lm): Extract fitted values from a SplitWise linear model.
residuals(splitwise_lm): Extract residuals from a SplitWise linear model.
model.matrix(splitwise_lm): Extract the model matrix from a SplitWise linear model.
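The predict() method is described as rebuilding the dummies for newdata from rules learned during training. The sketch below mimics that idea with a hypothetical rule list and a hypothetical `apply_rules()` helper. Both are illustrative and do not reflect how splitwise_info is actually structured. The point is that cutoffs are fixed at training time and reused unchanged on new data.

```r
# Hypothetical rule store: one entry per predictor, either "linear" or a
# dummy with a learned cutoff (names and structure are illustrative only).
rules <- list(
  wt = list(type = "dummy", cutoff = 3.3),
  hp = list(type = "linear")
)

# Build the model's design columns from the ORIGINAL predictors
apply_rules <- function(df, rules) {
  out <- data.frame(row.names = seq_len(nrow(df)))
  for (v in names(rules)) {
    r <- rules[[v]]
    if (r$type == "dummy") {
      out[[paste0(v, "_dummy")]] <- as.numeric(df[[v]] > r$cutoff)
    } else {
      out[[v]] <- df[[v]]
    }
  }
  out
}

data(mtcars)
train <- cbind(mpg = mtcars$mpg, apply_rules(mtcars, rules))
fit <- lm(mpg ~ ., data = train)

# New observations come with raw wt and hp; the stored cutoff is reapplied
newdata <- data.frame(wt = c(2.5, 4.0), hp = c(110, 200))
pred <- predict(fit, newdata = apply_rules(newdata, rules))
pred
```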

Examples


# Load the mtcars dataset
data(mtcars)

# Univariate transformations (AIC-based, backward stepwise)
model_uni <- splitwise(
  mpg ~ .,
  data                = mtcars,
  transformation_mode = "univariate",
  direction           = "backward"
)
summary(model_uni)

# Iterative approach (BIC-based, forward stepwise)
# Note: typically set k = log(nrow(mtcars)) for BIC in step().
model_iter <- splitwise(
  mpg ~ .,
  data                = mtcars,
  transformation_mode = "iterative",
  direction           = "forward",
  criterion           = "BIC",
  k                   = log(nrow(mtcars))
)
summary(model_iter)
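The recommendation to pair criterion = "BIC" with k = log(nrow(data)) follows from how step() scores models. It calls extractAIC(), which for lm fits computes n * log(RSS / n) + k * edf. With k = log(n), this differs from BIC() only by a constant that does not depend on the model, so both rank candidate models identically. The base-R check below (independent of SplitWise) runs step() with both penalties on mtcars and verifies that constant offset.

```r
data(mtcars)
n <- nrow(mtcars)
full <- lm(mpg ~ ., data = mtcars)

# Same backward search, AIC-style penalty versus BIC-style penalty
fit_aic <- step(full, direction = "backward", k = 2, trace = 0)
fit_bic <- step(full, direction = "backward", k = log(n), trace = 0)

# extractAIC() returns c(edf, criterion); with k = log(n) its offset from
# BIC() is the same for both fitted models
d1 <- BIC(fit_aic) - extractAIC(fit_aic, k = log(n))[2]
d2 <- BIC(fit_bic) - extractAIC(fit_bic, k = log(n))[2]
all.equal(d1, d2)
```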
