ipw_strata: Inverse Probability weighting of strata (two or more strata, survival or binary endpoint)

Description

This function performs inverse probability weighting of two or more strata. It could be used when arm1 has 2 or more strata, while stratum information is unknown in arm2. The function will fit a logistic regression (when 2 classes) or multinomial logistic regression (when > 2 classes) based on stratum labels in arm1 (model: label ~ features), then predict stratum labels for pts in arm2 based on the fitted model (as well as pts in arm1 who have missing labels, if there is any). The predicted probability of being stratum X will be used as weights when estimating treatment difference of two arms (Hazard ratio for survival endpoint; response rate difference for binary endpoint)

Usage

ipw_strata(
  data.in,
  formula,
  indicator.var = "indicator",
  class.of.int = NULL,
  tte = "AVAL",
  event = "event",
  trt = "trt",
  response = NULL,
  model = "plain",
  indicator.next = NULL,
  weights = NULL,
  multinom.maxit = 100,
  return.data = TRUE
)

Arguments

data.in

(data.frame) input data

formula

(formula) to input to the logistic or multinomial logistic model (in the form of strata~features)

indicator.var

(string) column name of the strata indicator variable which must be numeric. Assume arm1 has strata labeling and arm2 does not have strata labeling. pts without strata labeling should be indicated as -1 (e.g. pts in the arm1, or pts in arm2 but with missing label). within arm1 (the arm with strata labeling), subclasss should be indicated as 0,1,2...

class.of.int

(list) classes (stratum) of interest. Request to be in list format. It could be subset of classes in arm1; it could also define combined classes. For example: class.of.int = list("class1"=0, "class2"=1, "class3"=2, "class2or3"="c(1,2)"). for "class2or3", Prob(class 2 or 3) will be calculated as Prob(class2) + Prob(class3)

tte

(string) column name of the time to event variable

event

(string) column name of the event variable (1: event, 0: censor)

trt

(string) column name of the treatment variable. The variable is expected to be in factor format and the first level will be considered as reference (control) arm when calculating summary statistics.

response

(string) column name of the response variable. 1 indicates responder and 0 indicates non responder. if response is not NULL, tte and event will be ignored and the function will assume binary outcome.

model

(string) one of (plain, dwc, wri).

"plain": when 2 levels are specified in indicator variable, a binomial glm will be fitted; when more than 2 levels are specified, a multinomial glm will be fitted;
"dwc": Doubly weighted control: Two separated models will be fitted: one is binomial glm of 2 vs. (1, 0), the other one is bionomial glm of 1 vs 0. The probability of being each class is then calculated by aggregating these two models. Note this is similar to the plain method but with different (less rigid) covariance assumptions.
"wri": Weight regression imputation: the current status is going to be learned from the next status. Indicator of the next status should be specified using indicator.next. Currently "wri" only support the case where there are only two non-missing strata. In indicator variable, the two nonmissing strata should be coded as 0 and 1, the missing group should be coded as 2.

indicator.next

(string) column name of the column which indicates status at a different measurement. It should be coded in the same way as in indicator.var (e.g. -1, 0, 1). Patients who have both missing current status and missing next status should be excluded in the modeling.

weights

(numeric) weights of each subject. If not NULL, the estimated probabilities will be reweightsed to ensure sum(probability) of a subject = the subject's weights. If weights is not NULL, quasibinomial model will be used.

multinom.maxit

see parameter maxit in nnet::multinom, default is 100

return.data

(logical) whether to return data with estimated probabilities.

Value

return a list containing the following components:

stat a matrix with rows as strata and columns as Estimate and CIs.
converged logical to indicate whether model converges.
any_warning_glm logical to indicate whether there's warning from glm model.
warning.msg a list to capture any warning message from the modeling process.
models a list to capture the glm model results.
roc.list a list to capture information about Area under the curve from glm model.
data a data.frame which is the original input data plus predicted probabilities.

Examples

Run this code

# NOT RUN {
 
 # example 1: Impute NA as one stratum in experimental arm; default model 
 library(dplyr)
 clinical_1 <- clinical %>% mutate( 
  indicator = case_when(
    STRATUM == "strata_1" ~ 0, 
    STRATUM == "strata_2" ~ 1,
    is.na(STRATUM) & ARM == "experimental" ~ 1,
    TRUE ~ -1 
  ),
  ARM  = factor(ARM, levels = c("control","experimental")),
  BNLR = case_when(
    is.na(BNLR) ~ median(BNLR, na.rm = TRUE),
    TRUE ~ BNLR
  )
)
ipw_res1 <- ipw_strata(
  data.in = clinical_1, formula = indicator ~ BECOG + SEX + BNLR,
  indicator.var = "indicator", tte = "OS_MONTH", event = "OS_EVENT", trt = "ARM",
  class.of.int = list("strata_1" = 1, "strata_2" = 0)
 )
 ## Weighted HRs
 ipw_res1$stat
 
 # example 2: "Weight regression imputation" model
clinical_2 <- clinical %>% mutate( 
  indicator = case_when(
    STRATUM == "strata_1" ~ 0, 
    STRATUM == "strata_2" ~ 1, 
    is.na(STRATUM) & ARM == "experimental" ~ 2,
    TRUE ~ -1
  ),
  indicator_next = case_when(
    STRATUM_NEXT == "strata_1" ~ 0, 
    STRATUM_NEXT == "strata_2" ~ 1, 
    is.na(STRATUM_NEXT) & ARM == "experimental" ~ 2,
    TRUE ~ -1
  ),
  ARM  = factor(ARM, levels = c("control","experimental")),
  BNLR = case_when(
    is.na(BNLR) ~ median(BNLR, na.rm = TRUE),
    TRUE ~ BNLR
  )
)

ipw_res2 <- ipw_strata(
  data.in = clinical_2, formula = indicator ~ BECOG + SEX + BNLR, model = "wri",
  indicator.var = "indicator", indicator.next = "indicator_next",
   tte = "OS_MONTH", event = "OS_EVENT", trt = "ARM",
  class.of.int = list("strata_1" = 1, "strata_2" = 0)
 )
 ## Weighted HRs
 ipw_res2$stat 
 
# }

Run the code above in your browser using DataLab