To identify the optimal dichotomizing predictors using repeated sample splits.
optimSplit_dichotom(
  formula,
  data,
  include = quote(p1 > 0.15 & p1 < 0.85),
  top = 1L,
  nsplit,
  ...
)

split_dichotom(y, x, id, ...)

splits_dichotom(y, x, ids = rSplit(y, ...), ...)

# S3 method for splits_dichotom
quantile(x, probs = 0.5, ...)
Function optimSplit_dichotom returns an object of class 'optimSplit_dichotom', which is a list of dichotomizing functions, with the input formula and data as additional attributes.
formula, e.g., y~X or y~x1+x2. Response \(y\) may be double, logical, or Surv. Candidate numeric predictors \(x\)'s may be specified as the columns of a single matrix column, e.g., y~X, or as several numeric vector columns, e.g., y~x1+x2. In the helper functions, x is a numeric vector.
(optional) language object, the inclusion criterion. The default, (p1 > .15 & p1 < .85), specifies a user-desired range of \(p_1\) for the candidate dichotomizing predictors. See the explanation of \(p_1\) in section Returns of Helper Functions.
positive integer scalar, number of optimal dichotomizing predictors, default 1L
additional parameters for function rSplit
logical vector for helper function split_dichotom, indicating the training (TRUE) and test (FALSE) subjects
(optional) list of logical vectors for helper function splits_dichotom, one indicator vector per copy of the repeated training-test sample splits.
double scalar for helper function quantile.splits_dichotom, see quantile
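A minimal sketch of how the formula, data, and include arguments described above may be prepared; the data frame dat and all variable names below are illustrative assumptions, not part of the package.

set.seed(1)
n <- 200L
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- as.logical(rbinom(n, size = 1L, prob = plogis(dat$x1)))  # logical response
dat$X <- cbind(m1 = rnorm(n), m2 = rnorm(n))                      # candidate predictors as one matrix column

# candidate predictors as several vector columns, or as the columns of one matrix column
f1 <- y ~ x1 + x2
f2 <- y ~ X

# a user-specified inclusion criterion for p1, supplied as a language object
incl <- quote(p1 > 0.2 & p1 < 0.8)
# e.g., optimSplit_dichotom(f2, data = dat, include = incl, nsplit = 20L)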
Helper function split_dichotom fits a univariable regression model on the test set with a dichotomized predictor, using a dichotomizing rule determined by a recursive partitioning of the training set. Specifically, given a training-test sample split,
1. find the dichotomizing rule \(\mathcal{D}\) of the predictor \(x_0\) given the response \(y_0\) in the training set (via rpartD);
2. fit a univariable regression model of the response \(y_1\) with the dichotomized predictor \(\mathcal{D}(x_1)\) in the test set.
Currently, Cox proportional hazards (coxph) regression for a Surv response, logistic (glm) regression for a logical response, and linear (lm) regression for a gaussian response are supported.
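A minimal sketch of a single split-dichotomized fit; the simulated data and the 2:1 training-test split below are purely illustrative, and the logical response leads to a logistic (glm) fit on the test set.

set.seed(1)
n <- 300L
x <- rnorm(n)                                                       # candidate numeric predictor
y <- as.logical(rbinom(n, size = 1L, prob = plogis(2 * (x > .3))))  # logical response

id <- seq_len(n) %in% sample.int(n, size = 200L)  # TRUE = training, FALSE = test
m <- split_dichotom(y, x, id = id)  # rule learned on the training set; glm fitted on the test set
class(m)                            # a 'glm' object carrying additional attributes (see below)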
Helper function splits_dichotom fits a split-dichotomized regression model (split_dichotom) of the response \(y\) and predictor \(x\) for each copy of the repeated training-test sample splits.
Helper function quantile.splits_dichotom is the S3 method dispatch of the generic function quantile for splits_dichotom objects. Specifically,
1. collect the univariable regression coefficient estimates from the split-dichotomized regression models;
2. find the nearest-even (i.e., type = 3) quantile of the coefficients from Step 1; by default, the median (i.e., probs = .5) is used;
3. return the split-dichotomized regression model corresponding to the coefficient quantile selected in Step 2.
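A sketch of splits_dichotom together with its quantile method, again on illustrative simulated data; the repeated splits are supplied here as an explicit list of logical vectors rather than through rSplit.

set.seed(2)
n <- 300L
x <- rnorm(n)
y <- as.logical(rbinom(n, size = 1L, prob = plogis(1.5 * (x > 0))))

# 20 repeated training-test splits, each a logical vector (TRUE = training)
ids <- replicate(20L, seq_len(n) %in% sample.int(n, size = 200L), simplify = FALSE)

ms <- splits_dichotom(y, x, ids = ids)  # one split-dichotomized regression model per split
med <- quantile(ms, probs = .5)         # the model whose coefficient is the median across splits
attr(med, 'coef')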
Helper function split_dichotom returns a split-dichotomized regression model, which is either a Cox proportional hazards (coxph), a logistic (glm), or a linear (lm) regression model, with additional attributes:
attr(,'rule'): function, the dichotomizing rule \(\mathcal{D}\) based on the training set
attr(,'text'): character scalar, a human-friendly description of \(\mathcal{D}\)
attr(,'p1'): double scalar, \(p_1 = \text{Pr}(\mathcal{D}(x_1)=1)\)
attr(,'coef'): double scalar, the univariable regression coefficient estimate of \(y_1\sim\mathcal{D}(x_1)\)
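A short sketch of these attributes on simulated gaussian data, so that a linear (lm) model is fitted; that the rule accepts a numeric vector is an assumption, following its use as \(\mathcal{D}(x_1)\) above.

set.seed(3)
x <- rnorm(300L)
y <- x + rnorm(300L)   # gaussian response, so lm is fitted on the test set
m <- split_dichotom(y, x, id = seq_len(300L) %in% sample.int(300L, size = 200L))

rule <- attr(m, 'rule')  # the dichotomizing rule D, as a function
rule(c(-1, 0, 1))        # apply D to new predictor values
attr(m, 'text')          # human-friendly description of D
attr(m, 'p1')            # p1 = Pr(D(x1) = 1) on the test set
attr(m, 'coef')          # coefficient estimate of y1 ~ D(x1)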
Helper function splits_dichotom returns a list of split-dichotomized regression models (split_dichotom).
Helper function quantile.splits_dichotom returns a split-dichotomized regression model (split_dichotom).
Function optimSplit_dichotom identifies the optimal dichotomizing predictors via repeated sample splits. Specifically,
1. generate multiple, i.e., repeated, training-test sample splits (via rSplit);
2. for each candidate predictor \(x_i\), find the median-split-dichotomized regression model based on the repeated sample splits (see section Details on Helper Functions);
3. limit the selection of the candidate predictors \(x\)'s to a user-desired range of \(p_1\) of the split-dichotomized regression models (see the explanation of \(p_1\) in section Returns of Helper Functions);
4. rank the candidate predictors \(x\)'s in decreasing order of the absolute value of the regression coefficient estimates of the median-split-dichotomized regression models; the optimal dichotomizing predictors are at the top of this ranking.
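An end-to-end sketch of optimSplit_dichotom with a Surv response and the candidate predictors stored as one matrix column; the simulated data, variable names, and the choices of nsplit and top are illustrative assumptions only.

library(survival)
set.seed(4)
n <- 250L
X <- cbind(m1 = rnorm(n), m2 = rnorm(n), m3 = rnorm(n))  # candidate numeric predictors
dat <- data.frame(row.names = seq_len(n))
dat$X <- X                                               # one matrix column, as in y ~ X
dat$y <- Surv(rexp(n, rate = exp(X[, 'm1'])), event = rbinom(n, size = 1L, prob = .8))

fit <- optimSplit_dichotom(y ~ X, data = dat, nsplit = 20L, top = 2L,
                           include = quote(p1 > 0.15 & p1 < 0.85))
fit  # a list of (top = 2L) dichotomizing functions, with the input formula and data as attributes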