frailtyROC (version 1.0.0)

LungCancer: NCCTG Lung Cancer Marker Data

Description

This dataset contains four columns for subjects in the NCCTG lung cancer marker data: id, the identifier of the health institution (cluster); marker, the risk score; time, the observed follow-up time; and status, the event status.

Usage

data(LungCancer)

Format

This is a data frame with 238 observations and the following 4 variables.

id

health institution code (cluster identifier)

time

observed survival time in days (time to death or censoring)

status

censoring indicator; 1=censored, 2=dead

marker

risk score derived from sex, age, and ph.ecog using a gamma frailty model (see Details)
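Because status uses the 1/2 coding rather than 0/1, code that expects a binary event indicator needs a recode (survival's `Surv()` accepts the 1/2 coding directly, with 2 = death). The minimal sketch below uses a small hypothetical data frame laid out like LungCancer; its values are invented for illustration and are not rows from the dataset.

```r
# Hypothetical toy data with the same four columns as LungCancer.
# All values are invented for illustration only.
toy <- data.frame(
  id     = c(3, 3, 11, 11, 11),          # institution (cluster) code
  time   = c(306, 455, 1010, 210, 883),  # follow-up time in days
  status = c(2, 2, 1, 2, 1),             # 1 = censored, 2 = dead
  marker = c(0.8, 1.2, 0.5, 2.1, 0.9)    # risk score
)

# Recode status to a 0/1 event indicator (1 = death, 0 = censored).
toy$event <- as.integer(toy$status == 2)

# Number of subjects in each institution (cluster).
sizes <- table(toy$id)
print(sizes)
```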

Details

The NCCTG lung cancer data were collected from 228 patients across 18 healthcare institutions, with between 2 and 36 patients per institution. Only the 226 patients with complete records were included in the final analysis. The dataset contains survival times along with several predictor variables: sex (coded as Male = 1, Female = 2), age (in years), ph.ecog (Eastern Cooperative Oncology Group performance status, rated by a physician on a scale from 0 [asymptomatic] to 5 [dead]), and pat.karno (Karnofsky performance status, rated by the patient).

The marker (risk score) was derived from three of these predictors (sex, age, and ph.ecog) by fitting a frailty model with gamma-distributed frailty. The prognostic marker is defined as

$$\hat{\nu} \exp(\hat{\beta}_1 \cdot \text{sex} + \hat{\beta}_2 \cdot \text{age} + \hat{\beta}_3 \cdot \text{ph.ecog})$$

where $\hat{\nu}$ is the estimated frailty term, shared by all patients from the same institution, and $\hat{\beta}_i$ (for $i = 1, 2, 3$) are the estimated regression coefficients from the frailty model.
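The construction of the marker can be sketched in R with the survival package, whose `lung` data frame holds the original NCCTG records (columns `inst`, `time`, `status`, `sex`, `age`, `ph.ecog`). This is a minimal sketch under stated assumptions, not the frailtyROC authors' code. It assumes that "complete records" means complete on the variables used in the model, and that $\hat{\nu}$ corresponds to `exp(fit$frail)`, because `coxph()` reports each institution's frailty on the log-hazard scale. The resulting values need not match `LungCancer$marker` exactly.

```r
library(survival)

# Keep the variables used in the model and drop incomplete records
# (assumption: completeness is judged on these variables only).
vars <- c("inst", "time", "status", "sex", "age", "ph.ecog")
lung_cc <- lung[complete.cases(lung[, vars]), vars]

# Cox model with a gamma-distributed frailty shared within each institution.
fit <- coxph(Surv(time, status) ~ sex + age + ph.ecog +
               frailty(inst, distribution = "gamma"),
             data = lung_cc)

# Fixed-effect estimates (beta_1, beta_2, beta_3).
beta_hat <- coef(fit)[c("sex", "age", "ph.ecog")]

# One estimated frailty per institution, on the log-hazard scale and
# ordered like the sorted institution codes.
inst_f <- factor(lung_cc$inst)
stopifnot(length(fit$frail) == nlevels(inst_f))

# Assumption: nu_hat = exp(log-scale frailty) of each patient's institution.
nu_hat <- exp(fit$frail)[as.integer(inst_f)]

# Marker: nu_hat * exp(beta_1 * sex + beta_2 * age + beta_3 * ph.ecog).
X <- as.matrix(lung_cc[, c("sex", "age", "ph.ecog")])
marker <- nu_hat * exp(drop(X %*% beta_hat))
summary(marker)
```

The product is computed from raw (uncentered) covariate values so that it follows the formula term by term; any rescaling applied in the packaged data is not known from this page.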

References

Beyene, K. M., and Chen, D. G. (2024). Time-dependent receiver operating characteristic curve estimator for correlated right-censored time-to-event data. Statistical Methods in Medical Research, 33(1), 162-181.