To identify the optimal dichotomizing predictors using repeated sample splits.
optimSplit_dichotom(
  formula,
  data,
  include = quote(p1 > 0.15 & p1 < 0.85),
  top = 1L,
  nsplit,
  ...
)

split_dichotom(y, x, id, ...)

splits_dichotom(y, x, ids = rSplit(y, ...), ...)

# S3 method for splits_dichotom
quantile(x, probs = 0.5, ...)
Function optimSplit_dichotom returns an object of class 'optimSplit_dichotom', which is a list of dichotomizing functions, with the input formula and data as additional attributes.
formula, e.g., y~X or y~x1+x2. Response \(y\) may be double, logical, or Surv. Candidate numeric predictors \(x\)'s may be specified as the columns of a single matrix column, e.g., y~X, or as several numeric vector columns, e.g., y~x1+x2. In the helper functions, x is a numeric vector.
(optional) language object, the inclusion criterion. The default, (p1 > .15 & p1 < .85), specifies a user-desired range of \(p_1\) for the candidate dichotomizing predictors. See the explanation of \(p_1\) in section Returns of Helper Functions.
positive integer scalar, number of optimal dichotomizing predictors, default 1L
additional parameters for function rSplit
logical vector for helper function split_dichotom, indicating the training (TRUE) and test (FALSE) subjects
(optional) list of logical vectors for helper function splits_dichotom, one indicator vector per copy of the repeated training-test sample splits.
double scalar for helper function quantile.splits_dichotom, see quantile
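A minimal sketch of how the formula, data, and include arguments described above may be prepared; the data frame dat and all variable names below are illustrative assumptions, not part of the package.

set.seed(1)
n <- 200L
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- as.logical(rbinom(n, size = 1L, prob = plogis(dat$x1)))  # logical response
dat$X <- cbind(m1 = rnorm(n), m2 = rnorm(n))                      # candidate predictors as one matrix column

# candidate predictors as several vector columns, or as the columns of one matrix column
f1 <- y ~ x1 + x2
f2 <- y ~ X

# a user-specified inclusion criterion for p1, supplied as a language object
incl <- quote(p1 > 0.2 & p1 < 0.8)
# e.g., optimSplit_dichotom(f2, data = dat, include = incl, nsplit = 20L)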
Helper function split_dichotom fits a univariable regression model on the test set with a dichotomized predictor, using a dichotomizing rule determined by a recursive partitioning of the training set. Specifically, given a training-test sample split,
1. find the dichotomizing rule \(\mathcal{D}\) of the predictor \(x_0\) given the response \(y_0\) in the training set (via rpartD);
2. fit a univariable regression model of the response \(y_1\) with the dichotomized predictor \(\mathcal{D}(x_1)\) in the test set.
Currently, Cox proportional hazards (coxph) regression for a Surv response, logistic (glm) regression for a logical response, and linear (lm) regression for a gaussian response are supported.
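A minimal sketch of a single split-dichotomized fit; the simulated data and the 2:1 training-test split below are purely illustrative, and the logical response leads to a logistic (glm) fit on the test set.

set.seed(1)
n <- 300L
x <- rnorm(n)                                                       # candidate numeric predictor
y <- as.logical(rbinom(n, size = 1L, prob = plogis(2 * (x > .3))))  # logical response

id <- seq_len(n) %in% sample.int(n, size = 200L)  # TRUE = training, FALSE = test
m <- split_dichotom(y, x, id = id)  # rule learned on the training set; glm fitted on the test set
class(m)                            # a 'glm' object carrying additional attributes (see below)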
Helper function splits_dichotom fits a split-dichotomized regression model (split_dichotom) of the response \(y\) and predictor \(x\) for each copy of the repeated training-test sample splits.
Helper function quantile.splits_dichotom is the S3 method dispatch of the generic function quantile for splits_dichotom objects. Specifically,
1. collect the univariable regression coefficient estimates from the split-dichotomized regression models;
2. find the nearest-even (i.e., type = 3) quantile of the coefficients from Step 1; by default, the median (i.e., probs = .5) is used;
3. return the split-dichotomized regression model corresponding to the coefficient quantile selected in Step 2.
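A sketch of splits_dichotom together with its quantile method, again on illustrative simulated data; the repeated splits are supplied here as an explicit list of logical vectors rather than through rSplit.

set.seed(2)
n <- 300L
x <- rnorm(n)
y <- as.logical(rbinom(n, size = 1L, prob = plogis(1.5 * (x > 0))))

# 20 repeated training-test splits, each a logical vector (TRUE = training)
ids <- replicate(20L, seq_len(n) %in% sample.int(n, size = 200L), simplify = FALSE)

ms <- splits_dichotom(y, x, ids = ids)  # one split-dichotomized regression model per split
med <- quantile(ms, probs = .5)         # the model whose coefficient is the median across splits
attr(med, 'coef')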
Helper function split_dichotom returns a split-dichotomized regression model, which is either a Cox proportional hazards (coxph), a logistic (glm), or a linear (lm) regression model, with additional attributes:
attr(,'rule'): function, the dichotomizing rule \(\mathcal{D}\) based on the training set
attr(,'text'): character scalar, a human-friendly description of \(\mathcal{D}\)
attr(,'p1'): double scalar, \(p_1 = \text{Pr}(\mathcal{D}(x_1)=1)\)
attr(,'coef'): double scalar, the univariable regression coefficient estimate of \(y_1\sim\mathcal{D}(x_1)\)
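A short sketch of these attributes on simulated gaussian data, so that a linear (lm) model is fitted; that the rule accepts a numeric vector is an assumption, following its use as \(\mathcal{D}(x_1)\) above.

set.seed(3)
x <- rnorm(300L)
y <- x + rnorm(300L)   # gaussian response, so lm is fitted on the test set
m <- split_dichotom(y, x, id = seq_len(300L) %in% sample.int(300L, size = 200L))

rule <- attr(m, 'rule')  # the dichotomizing rule D, as a function
rule(c(-1, 0, 1))        # apply D to new predictor values
attr(m, 'text')          # human-friendly description of D
attr(m, 'p1')            # p1 = Pr(D(x1) = 1) on the test set
attr(m, 'coef')          # coefficient estimate of y1 ~ D(x1)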
Helper function splits_dichotom returns a list of split-dichotomized regression models (split_dichotom).
Helper function quantile.splits_dichotom returns a split-dichotomized regression model (split_dichotom).
Function optimSplit_dichotom identifies the optimal dichotomizing predictors via repeated sample splits. Specifically,
1. generate multiple, i.e., repeated, training-test sample splits (via rSplit);
2. for each candidate predictor \(x_i\), find the median-split-dichotomized regression model based on the repeated sample splits (see section Details on Helper Functions);
3. limit the selection of the candidate predictors \(x\)'s to a user-desired range of \(p_1\) of the split-dichotomized regression models (see the explanation of \(p_1\) in section Returns of Helper Functions);
4. rank the candidate predictors \(x\)'s in decreasing order of the absolute value of the regression coefficient estimates of the median-split-dichotomized regression models; the optimal dichotomizing predictors are at the top of this ranking.
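An end-to-end sketch of optimSplit_dichotom with a Surv response and the candidate predictors stored as one matrix column; the simulated data, variable names, and the choices of nsplit and top are illustrative assumptions only.

library(survival)
set.seed(4)
n <- 250L
X <- cbind(m1 = rnorm(n), m2 = rnorm(n), m3 = rnorm(n))  # candidate numeric predictors
dat <- data.frame(row.names = seq_len(n))
dat$X <- X                                               # one matrix column, as in y ~ X
dat$y <- Surv(rexp(n, rate = exp(X[, 'm1'])), event = rbinom(n, size = 1L, prob = .8))

fit <- optimSplit_dichotom(y ~ X, data = dat, nsplit = 20L, top = 2L,
                           include = quote(p1 > 0.15 & p1 < 0.85))
fit  # a list of (top = 2L) dichotomizing functions, with the input formula and data as attributes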