Fits an optimized integer risk score model using a heuristic algorithm. Returns an object of class "risk_mod".
risk_mod(
X,
y,
gamma = NULL,
beta = NULL,
weights = NULL,
n_train_runs = 1,
lambda0 = 0,
a = -10,
b = 10,
max_iters = 10000,
tol = 1e-05,
shuffle = TRUE,
seed = NULL,
method = "annealscore"
)

An object of class "risk_mod" with the following attributes:
gamma: Final scalar value used to rescale the coefficients.
beta: Vector of integer coefficients.
glm_mod: Logistic regression object of class "glm" (see stats::glm).
X: Input covariate matrix.
y: Input response vector.
weights: Input weights.
lambda0: Input lambda0 value.
model_card: Data frame displaying the nonzero integer coefficients (i.e. "points") of the risk score model.
score_map: Data frame containing a column of possible scores and a column with each score's associated risk probability.
X: Input covariate matrix with dimension \(n \times p\); every row is an observation.
y: Numeric vector for the (binomial) response variable.
gamma: Starting value used to rescale the coefficients for prediction (optional).
beta: Starting numeric vector with \(p\) coefficients. Default starting coefficients are rounded coefficients from a logistic regression model.
weights: Numeric vector of length \(n\) with weights for each observation. Unless otherwise specified, the default gives equal weight to each observation.
n_train_runs: A positive integer representing the number of times to initialize and train the model; the run with the lowest objective function value on the training data is returned.
lambda0: Penalty coefficient for the L0 term (default: 0). See cv_risk_mod() for lambda0 tuning.
a: Integer lower bound for coefficients (default: -10).
b: Integer upper bound for coefficients (default: 10).
max_iters: Maximum number of iterations (default: 10000).
tol: Tolerance for convergence (default: 1e-5).
shuffle: Whether the order of coefficients is shuffled during coordinate descent (default: TRUE).
seed: An integer passed to set.seed() to seed the random number generator. By default, no seed is set.
method: A string specifying which method to run, either "riskcd" or "annealscore" (default: "annealscore").
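In practice, lambda0 is typically tuned rather than fixed. A hedged sketch of that workflow (cv_risk_mod() is referenced above; the lambda_min field of its result is an assumption — check the cv_risk_mod() documentation for the actual accessor):

```r
# Sketch: choose lambda0 by cross-validation, then refit at the
# selected value. `lambda_min` is assumed to be a field of the
# cv_risk_mod() result object.
cv_res <- cv_risk_mod(X, y)
mod_cv <- risk_mod(X, y, lambda0 = cv_res$lambda_min)
```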
This function uses either a cyclical coordinate descent algorithm or a simulated annealing algorithm to solve the following optimization problem:
$$\min_{\gamma,\beta} \quad -\frac{1}{n} \sum_{i=1}^{n} \left(\gamma y_i x_i^T \beta - \log(1 + \exp(\gamma x_i^T \beta))\right) + \lambda_0 \sum_{j=1}^{p} \mathbb{1}(\beta_{j} \neq 0)$$
$$a \le \beta_j \le b \; \; \; \forall j = 1,2,...,p$$ $$\beta_j \in \mathbb{Z} \; \; \; \forall j = 1,2,...,p $$ $$\beta_0, \gamma \in \mathbb{R}$$
These constraints ensure that the model will be sparse and include only integer coefficients.
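Under this logistic formulation, a total score \(x^T \beta\) maps to an estimated risk via the scaling factor \(\gamma\); this is presumably how the score_map attribute is derived:

$$\hat{P}(y = 1 \mid x) = \frac{1}{1 + \exp(-\gamma \, x^T \beta)}$$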
y <- breastcancer[[1]]
X <- as.matrix(breastcancer[,2:ncol(breastcancer)])
mod1 <- risk_mod(X, y)
mod1$model_card
mod2 <- risk_mod(X, y, lambda0 = 0.01)
mod2$model_card
mod3 <- risk_mod(X, y, lambda0 = 0.01, a = -5, b = 5, method = "riskcd")
mod3$model_card
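A fitted "risk_mod" object can also be used for prediction. A hedged sketch (the type = "response" value mirrors stats::predict.glm conventions and is an assumption about predict.risk_mod()):

```r
# Sketch: predicted risk probabilities for the training data.
# type = "response" is assumed to behave as in stats::predict.glm().
probs <- predict(mod1, type = "response")
head(probs)
```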