simsl: Single-index models with a surface-link (main function)

Description

simsl is the wrapper function for fitting a single-index model with a surface-link (SIMSL). The function estimates a linear combination (a single-index) of baseline covariates X, and models a nonlinear interactive structure between the single-index and a treatment variable defined on a continuum, by estimating a smooth link function on the index-treatment domain. The resulting simsl object can be used to estimate an optimal dose rule for a new patient with baseline clinical information.

Usage

simsl(y, A, X, Xm = NULL, family = "gaussian", R = NULL,
  bs = c("ps", "ps"), k = c(8, 8), m = list(NA, NA), sp = NULL,
  knots = NULL, sep.A.effect = FALSE, mc = c(TRUE, FALSE),
  method = "GCV.Cp", beta.ini = NULL, ind.to.be.positive = NULL,
  random.effect = FALSE, z = NULL, gamma = 1, pen.order = 0,
  lambda = 0, max.iter = 10, eps.iter = 0.01, trace.iter = TRUE,
  center.X = TRUE, scale.X = TRUE, uncons.final.fit = TRUE,
  bootstrap = FALSE, nboot = 200, boot.conf = 0.95, seed = 1357)

Arguments

a n-by-1 vector of treatment outcomes; y is a member of the exponential family; any distribution supported by mgcv::gam; y can also be an ordinal categorial response with R categories taking a value from 1 to R.

a n-by-1 vector of treatment variable; each element is assumed to take a value on a continuum.

a n-by-p matrix of baseline covarates.

a n-by-q design matrix associated with an X main effect model; the defult is NULL and it is taken as a vector of zeros

family

specifies the distribution of y; e.g., "gaussian", "binomial", "poisson"; can be any family supported by mgcv::gam; can also be "ordinal", for an ordinal categorical response y.

the number of response categories for the case of family = "ordinal".

basis type for the treatment (A) and single-index domains, respectively; the defult is "ps" (p-splines); any basis supported by mgcv::gam can be used, e.g., "cr" (cubic regression splines); see mgcv::s for detail.

basis dimension for the treatment (A) and single-index domains, respectively.

a length 2 list (e.g., m=list(c(2,3), c(2,2))), for the treatment (A) and single-index domains, respectively, where each element specifies the order of basis and penalty (note, for bs="ps", c(2,3) means a 2nd order P-spline basis (cubic spline) and a 3rd order difference penalty; the default "NA" sets c(2,2) for each domain); see mgcv::s for details.

a vector of smoothing parameters; Smoothing parameters must be supplied in the order that the smooth terms appear in the model formula (i.e., A, and then the single-index); negative elements indicate that the parameter should be estimated, and hence a mixture of fixed and estimated parameters is possible; see mgcv::gam for detail.

knots

a list containing user-specified knot values to be used for basis construction, for the treatment (A) and single-index domains, respectively.

sep.A.effect

If TRUE, the g term of SIMSL is further decomposed into: the A main effect + the A-by-X interaction effect; the default is FALSE.

a length 2 vector indicating which marginals (i.e., A and the single-index, respectively) should have centering (i.e., the sum-to-zero) constraints applied; the default is mc = c(TRUE, FALSE) (see mgcv::te for detail of the constraint), which is sufficient for the so-called "orthogonality" constraint of the SIMSL.

method

the smoothing parameter estimation method; "GCV.Cp" to use GCV for unknown scale parameter and Mallows' Cp/UBRE/AIC for known scale; any method supported by mgcv::gam can be used.

beta.ini

an initial value for beta.coef; a p-by-1 vector; the defult is NULL, in which case a linear model estimate is used.

ind.to.be.positive

for identifiability of the solution beta.coef, the user can restrict the jth (e.g., j=1) component of beta.coef to be positive; by default, we match the "overall" sign of beta.coef with that of the linear estimate (i.e., the initial estimate), by restricting the inner product between the two to be positive.

random.effect

if TRUE, as part of the main effects, the user can incorporate z-specific random intercepts.

a factor that specifies the random intercepts when random.effect = TRUE.

gamma

increase this beyond 1 to produce smoother models. gamma multiplies the effective degrees of freedom in the GCV or UBRE/AIC (see mgcv::gam for detail); the default is 1.

pen.order

0 indicates the ridge penalty; 1 indicates the 1st difference penalty; 2 indicates the 2nd difference penalty, used in a penalized least squares (LS) estimation of beta.coef.

lambda

a regularization parameter associated with the penalized LS for beta.coef update.

max.iter

an integer specifying the maximum number of iterations for beta.coef update.

eps.iter

a value specifying the convergence criterion of algorithm.

trace.iter

if TRUE, trace the estimation process and print the differences in beta.coef.

center.X

if TRUE, center X to have zero mean.

scale.X

if TRUE, scale X to have unit variance.

uncons.final.fit

if TRUE, once the convergence in the estimates of beta.coef is reached, include the main effect associated with the fitted single-index (beta.coef'X) to the final surface-link estimate.

bootstrap

if TRUE, compute bootstrap confidence intervals for the single-index coefficients, beta.coef; the default is FALSE.

nboot

when bootstrap=TRUE, a value specifying the number of bootstrap replications.

boot.conf

a value specifying the confidence level of the bootstrap confidence intervals; the defult is boot.conf = 0.95.

seed

when bootstrap=TRUE, randomization seed used in bootstrap resampling.

Value

a list of information of the fitted SIMSL including

beta.coef

the estimated single-index coefficients.

g.fit

a mgcv:gam object containing information about the estimated 2-dimensional link function.

beta.ini

the initial value used in the estimation of beta.coef

beta.path

solution path of beta.coef over the iterations

d.beta

records the change in beta.coef over the solution path, beta.path

X.scale

sd of pretreatment covariates X

X.center

mean of pretreatment covariates X

A.range

range of the observed treatment variable A

number of baseline covariates X

number of subjects

boot.ci

boot.conf-level bootstrap CIs (LB, UB) associated with beta.coef

boot.mat

a (nboot x p) matrix of bootstrap estimates of beta.coef

Details

SIMSL captures the effect of covariates via a single-index and their interaction with the treatment via a 2-dimensional smooth link function. Interaction effects are determined by shapes of the link surface. The SIMSL allows comparing different individual treatment levels and constructing individual treatment rules, as functions of a biomarker signature (single-index), efficiently utilizing information on patient<U+2019>s characteristics. The resulting simsl object can be used to estimate an optimal dose rule for a new patient with baseline clinical information.

Examples

Run this code

# NOT RUN {
set.seed(1234)
n.test <- 500


## simulation 1
# generate training data
p <- 30
n <- 200
X <- matrix(runif(n*p,-1,1),ncol=p)
A <- runif(n,0,2)
D_opt <- 1 + 0.5*X[,2] + 0.5*X[,1]
mean.fn <- function(X, D_opt, A){ 8 + 4*X[,1] - 2*X[,2] - 2*X[,3] - 25*((D_opt-A)^2) }
mu <-   mean.fn(X, D_opt, A)
y <- rnorm(length(mu),mu,1)
# fit SIMSL
simsl.obj <- simsl(y=y, A=A, X=X)

# generate testing data
X.test <- matrix(runif(n.test*p,-1,1),ncol=p)
A.test <- runif(n.test,0,2)
f_opt.test <- 1 + 0.5*X.test[,2] + 0.5*X.test[,1]
pred <- pred.simsl(simsl.obj, newX= X.test)  # make prediction based on the estimated SIMSL
value <- mean(8 + 4*X.test[,1] - 2*X.test[,2] - 2*X.test[,3] - 25*((f_opt.test- pred$trt.rule)^2))
value  # "value" of the estimated treatment rule; the "oracle" value is 8.


## simulation 2
p <- 10
n <- 400
# generate training data
X <- matrix(runif(n*p,-1,1),ncol=p)
A <- runif(n,0,2)
f_opt <- I(X[,1] > -0.5)*I(X[,1] < 0.5)*0.6 + 1.2*I(X[,1] > 0.5) +
 1.2*I(X[,1] < -0.5) + X[,4]^2 + 0.5*log(abs(X[,7])+1) - 0.6
mu <-   8 + 4*cos(2*pi*X[,2]) - 2*X[,4] - 8*X[,5]^3 - 15*abs(f_opt-A)
y  <- rnorm(length(mu),mu,1)
Xq <- cbind(X, X^2)  # include a quadratic term
# fit SIMSL
simsl.obj <- simsl(y=y, A=A, X=Xq)

# generate testing data
X.test <- matrix(runif(n.test*p,-1,1),ncol=p)
A.test <- runif(n.test,0,2)
f_opt.test <- I(X.test[,1] > -0.5)*I(X.test[,1] < 0.5)*0.6 + 1.2*I(X.test[,1] > 0.5) +
 1.2*I(X.test[,1] < -0.5) + X.test[,4]^2 + 0.5*log(abs(X.test[,7])+1) - 0.6
Xq.test <- cbind(X.test, X.test^2)
pred <- pred.simsl(simsl.obj, newX= Xq.test)  # make prediction based on the estimated SIMSL
value <- mean(8 + 4*cos(2*pi*X.test[,2]) - 2*X.test[,4] - 8*X.test[,5]^3 -
              15*abs(f_opt.test-pred$trt.rule))
value  # "value" of the estimated treatment rule; the "oracle" value is 8.


# }
# NOT RUN {
 ### air pollution data application
 data(chicago); head(chicago)
 chicago <- chicago[,-3][complete.cases(chicago[,-3]), ]
 chicago <- chicago[-c(2856:2859), ]  # get rid of the gross outliers in y
 chicago <- chicago[-which.max(chicago$pm10median), ]  # get rid of the gross outliers in x

 # create lagged variables
 lagard <- function(x,n.lag=5) {
   n <- length(x); X <- matrix(NA,n,n.lag)
   for (i in 1:n.lag) X[i:n,i] <- x[i:n-i+1]
   X
 }
 chicago$pm10 <- lagard(chicago$pm10median)
 chicago <- chicago[complete.cases(chicago), ]
 # create season varaible
 chicago$time.day <- round(chicago$time %%  365)

 # fit SIMSL for modeling the season-by-pm10 interactions on their effects on outcomes
 simsl.obj <- simsl(y=chicago$death, A=chicago$time.day, X=chicago[,7], bs=c("cc","ps"),
                    ind.to.be.positive = 1, family="poisson", method = "REML",
                    bootstrap =FALSE) # bootstrap = TRUE
 simsl.obj$beta.coef  # the estimated single-index coefficients
 summary(simsl.obj$g.fit)
 #round(simsl.obj$boot.ci,3)
 mgcv::vis.gam(simsl.obj$g.fit, view=c("A","single.index"), theta=-135, phi = 30,
               color="heat", se=2,ylab = "single-index", zlab = " ",
               main=expression(paste("Interaction surface g")))



 ### Warfarin data application
 data(warfarin)
 X <- warfarin$X
 A <- warfarin$A
 y <- -abs(warfarin$INR - 2.5)  # the target INR is 2.5
 X[,1:3] <- scale(X[,1:3]) # standardize continuous variables

 # Estimate the main effect, using an additive model
 mu.fit <- mgcv::gam(y-mean(y)  ~ X[, 4:13] +
                       s(X[,1], k=5, bs="ps")+
                       s(X[,2], k=5, bs="ps") +
                       s(X[,3], k=5, bs="ps"), method="REML")
 summary(mu.fit)
 mu.hat <- predict(mu.fit)
 # fit SIMSL
 simsl.obj <- simsl(y, A, X, Xm= mu.hat, scale.X = FALSE, center.X=FALSE, method="REML",
                    bootstrap = FALSE) # bootstrap = TRUE
 simsl.obj$beta.coef
 #round(simsl.obj$boot.ci,3)
 mgcv::vis.gam(simsl.obj$g.fit, view=c("A","single.index"), theta=52, phi = 18,
               color="heat", se=2, ylab = "single-index", zlab = "Y",
               main=expression(paste("Interaction surface g")))
# }
# NOT RUN {
# }