Fits an optimized integer risk score model using a heuristic algorithm. Returns an object of class "risk_mod".
risk_mod(
X,
y,
gamma = NULL,
beta = NULL,
weights = NULL,
n_train_runs = 1,
lambda0 = 0,
a = -10,
b = 10,
max_iters = 10000,
tol = 1e-05,
shuffle = TRUE,
seed = NULL,
method = "annealscore"
)

An object of class "risk_mod" with the following attributes:
gamma: Final scalar value used to rescale the coefficients.
beta: Vector of integer coefficients.
glm_mod: Logistic regression object of class "glm" (see stats::glm).
X: Input covariate matrix.
y: Input response vector.
weights: Input weights.
lambda0: Input lambda0 value.
model_card: Data frame displaying the nonzero integer coefficients (i.e. "points") of the risk score model.
score_map: Data frame containing a column of possible scores and a column with each score's associated risk probability.
X: Input covariate matrix with dimension \(n \times p\); every row is an observation.
y: Numeric vector for the (binomial) response variable.
gamma: Starting value used to rescale the coefficients for prediction (optional).
beta: Starting numeric vector with \(p\) coefficients. Default starting coefficients are rounded coefficients from a logistic regression model.
weights: Numeric vector of length \(n\) with weights for each observation. Unless otherwise specified, the default gives equal weight to each observation.
n_train_runs: A positive integer representing the number of times to initialize and train the model; the run with the lowest objective function value on the training data is returned.
lambda0: Penalty coefficient for the L0 term (default: 0). See cv_risk_mod() for lambda0 tuning.
a: Integer lower bound for coefficients (default: -10).
b: Integer upper bound for coefficients (default: 10).
max_iters: Maximum number of iterations (default: 10000).
tol: Tolerance for convergence (default: 1e-5).
shuffle: Whether the order of coefficients is shuffled during coordinate descent (default: TRUE).
seed: An integer passed to set.seed() to seed the random number generator. By default, no seed is set.
method: A string specifying which method to run, either "riskcd" or "annealscore" (default: "annealscore").
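In practice, lambda0 is typically tuned rather than fixed. A hedged sketch of that workflow (cv_risk_mod() is referenced above; the lambda_min field of its result is an assumption — check the cv_risk_mod() documentation for the actual accessor):

```r
# Sketch: choose lambda0 by cross-validation, then refit at the
# selected value. `lambda_min` is assumed to be a field of the
# cv_risk_mod() result object.
cv_res <- cv_risk_mod(X, y)
mod_cv <- risk_mod(X, y, lambda0 = cv_res$lambda_min)
```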
This function uses either a cyclical coordinate descent algorithm or a simulated annealing algorithm to solve the following optimization problem:
$$\min_{\gamma,\beta} \quad -\frac{1}{n} \sum_{i=1}^{n} \left(\gamma y_i x_i^T \beta - \log(1 + \exp(\gamma x_i^T \beta))\right) + \lambda_0 \sum_{j=1}^{p} \mathbb{1}(\beta_{j} \neq 0)$$
$$a \le \beta_j \le b \; \; \; \forall j = 1,2,...,p$$ $$\beta_j \in \mathbb{Z} \; \; \; \forall j = 1,2,...,p $$ $$\beta_0, \gamma \in \mathbb{R}$$
These constraints ensure that the model will be sparse and include only integer coefficients.
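Under this logistic formulation, a total score \(x^T \beta\) maps to an estimated risk via the scaling factor \(\gamma\); this is presumably how the score_map attribute is derived:

$$\hat{P}(y = 1 \mid x) = \frac{1}{1 + \exp(-\gamma \, x^T \beta)}$$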
y <- breastcancer[[1]]
X <- as.matrix(breastcancer[,2:ncol(breastcancer)])
mod1 <- risk_mod(X, y)
mod1$model_card
mod2 <- risk_mod(X, y, lambda0 = 0.01)
mod2$model_card
mod3 <- risk_mod(X, y, lambda0 = 0.01, a = -5, b = 5, method = "riskcd")
mod3$model_card
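A fitted "risk_mod" object can also be used for prediction. A hedged sketch (the type = "response" value mirrors stats::predict.glm conventions and is an assumption about predict.risk_mod()):

```r
# Sketch: predicted risk probabilities for the training data.
# type = "response" is assumed to behave as in stats::predict.glm().
probs <- predict(mod1, type = "response")
head(probs)
```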