dataLongSubDist: Data Matrix and Weights for Discrete Subdistribution Hazard Models

Description

Generates the augmented data matrix and the weights required for discrete subdistribution hazard modeling with right censoring.

Usage

dataLongSubDist(dataSet, timeColumn, eventColumns, eventFocus,
timeAsFactor=TRUE)

Arguments

dataSet

Original data in short format. Must be of class "data.frame".

timeColumn

Character specifying the column name of the observed event times. The observed times must be discrete (integer-valued).

eventColumns

Character vector specifying the column names of the event indicators (excluding censoring events). All events must use a 0-1 coding. Observations for which all event columns are zero (row sum of zero) are treated as censored.

eventFocus

Column name of the event of interest (type 1 event).

timeAsFactor

Logical indicating whether time should be coded as a factor in the augmented data matrix. If FALSE, a numeric coding will be used.
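
The input requirements described above (integer-valued times, 0-1 event indicators, all-zero rows meaning censoring) can be checked before calling the function. The following is a minimal sketch in base R. The helper name checkCompRiskInput and the toy data are hypothetical and not part of the package, and the check that at most one event occurs per row is an assumption of this sketch.

```r
# Hypothetical helper (not part of the package): checks the input format
# described in the Arguments section and flags censored rows.
checkCompRiskInput <- function(dataSet, timeColumn, eventColumns) {
  stopifnot(is.data.frame(dataSet))
  times <- dataSet[[timeColumn]]
  events <- as.matrix(dataSet[, eventColumns, drop = FALSE])
  # Observed times must be discrete (integer-valued)
  stopifnot(is.numeric(times), all(times == round(times)))
  # Event indicators must use a 0-1 coding
  stopifnot(all(events %in% c(0, 1)))
  # Assumption of this sketch: at most one event type per observation
  stopifnot(all(rowSums(events) <= 1))
  # Rows with all event indicators equal to zero count as censored
  rowSums(events) == 0
}

# Toy data in short format: one row per individual
toyData <- data.frame(
  time = c(1, 3, 2, 4),
  e1   = c(1, 0, 0, 0),
  e2   = c(0, 1, 0, 0),
  x    = c(0.5, -1.2, 0.3, 2.0)
)
censored <- checkCompRiskInput(toyData, "time", c("e1", "e2"))
print(censored)
```

The returned logical vector marks the rows that would be treated as censored, here the third and fourth observations.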

Value

Data frame in long format with the additional column "subDistWeights". This column contains the weights needed to fit a weighted binary regression model, as described in Berger et al. (2018). The weights are calculated with a life table estimator for the censoring event.

Details

This function sets up the augmented data matrix and the weights that are needed for weighted maximum likelihood (ML) estimation of the discrete subdistribution model proposed by Berger et al. (2018). The model is a discrete-time extension of the original subdistribution model proposed by Fine and Gray (1999).
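
To give an intuition for the censoring estimate that underlies the weights, the sketch below computes a simple product-limit estimate of the discrete censoring survival function in base R. It is an illustration under simplifying assumptions, not the package's internal calculation: the function name censSurvSketch and the toy data are hypothetical, and the life table estimator used by the package may handle the risk set differently.

```r
# Hypothetical illustration: discrete-time estimate of the censoring
# survival function G(t) = P(C > t), treating censoring as the "event".
censSurvSketch <- function(time, censored) {
  grid <- seq_len(max(time))
  # Number of individuals still under observation at the start of interval t
  atRisk <- sapply(grid, function(t) sum(time >= t))
  # Number of individuals censored in interval t
  nCens <- sapply(grid, function(t) sum(time == t & censored))
  # Discrete censoring hazard and its cumulative product
  hazCens <- nCens / atRisk
  data.frame(time = grid, atRisk = atRisk, nCens = nCens,
             hazCens = hazCens, survCens = cumprod(1 - hazCens))
}

# Toy data: observed discrete times and censoring indicator
toyTime <- c(1, 2, 2, 3, 3, 4)
toyCens <- c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)
censTable <- censSurvSketch(toyTime, toyCens)
print(censTable)
```

In the model of Berger et al. (2018), weights derived from an estimate of this kind are attached to the rows of the augmented data matrix, see the reference for the exact definition.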

References

Moritz Berger, Matthias Schmid, Thomas Welchowski, Steffen Schmitz-Valckenberg and Jan Beyersmann (2018), Subdistribution Hazard Models for Competing Risks in Discrete Time, Biostatistics, doi: 10.1093/biostatistics/kxy069

Jason P. Fine and Robert J. Gray (1999), A proportional hazards model for the subdistribution of a competing risk, Journal of the American Statistical Association 94, pages 496-509.

Examples

# Example with unemployment data
library(Ecdat)
data(UnempDur)

# Generate subsample, reduce number of intervals to k = 5
SubUnempDur <- UnempDur[1:500, ]
SubUnempDur$time <- as.numeric(cut(SubUnempDur$spell, c(0, 4, 8, 16, 28)))

# Convert competing risks data to long format
# The event of interest is re-employment at a full-time job (censor1)
SubUnempDurLong <- dataLongSubDist(dataSet=SubUnempDur, timeColumn="time",
                                   eventColumns=c("censor1", "censor2", "censor3"),
                                   eventFocus="censor1")
head(SubUnempDurLong)

# Fit discrete subdistribution hazard model with logistic link function,
# using the subdistribution weights as observation weights
logisticSubDistr <- glm(y ~ timeInt + ui + age + logwage,
                        family=binomial(), data=SubUnempDurLong,
                        weights=SubUnempDurLong$subDistWeights)
summary(logisticSubDistr)
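
The effect of the timeAsFactor argument on a regression formula can be illustrated without the package. The sketch below uses a hypothetical toy vector named timeInt (not output of dataLongSubDist) and base R's model.matrix to compare the design matrix columns that result from a factor coding and from a numeric coding of time.

```r
# Hypothetical toy time index, standing in for the time column of a
# long-format data set
timeInt <- c(1, 2, 3, 1, 2)

# Factor coding: intercept plus one dummy column per additional interval
designFactor <- model.matrix(~ factor(timeInt))

# Numeric coding: intercept plus a single linear time column
designNumeric <- model.matrix(~ timeInt)

print(colnames(designFactor))
print(colnames(designNumeric))
```

With the factor coding, a binary regression model estimates a separate baseline coefficient for each interval. With the numeric coding, it estimates a single linear trend in time, which uses fewer parameters but imposes a stronger assumption on the baseline hazard.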