cchsData: Data from a case-cohort study with stratified subcohort selection

Description

A case-cohort dataset in which the subcohort was selected by stratified simple random sampling. It is an artificial dataset derived from nwtco, a real dataset from the National Wilms Tumor Study (NWTS), and is intended for demonstrating the use of cchs.

Format

id

An ID number.

localHistol

Result of the histology from the local institution.

centralLabHistol

Result of the histology from the central laboratory.

stage

Stage of the cancer (I, II, III, or IV).

study

The study (NWTS-3 or NWTS-4). For details see this NWTS webpage.

isCase

Indicator for whether this participant had a relapse or not.

time

Number of days from diagnosis of Wilms tumor to relapse or censoring.

ageAtDiagnosis

Age in years at diagnosis of Wilms tumor.

inSubcohort

Indicator for whether this participant is in the subcohort or not.

sampFrac

The sampling fraction for the stratum (defined by localHistol) that this participant belongs to.

Source

# Starting with nwtco, rename variables, convert some to factors, drop
# in.subcohort (which is used elsewhere for a different simulated dataset), etc.
library(survival, quietly=TRUE)
cchsData <- data.frame(
   id = nwtco$seqno,
   localHistol = factor(nwtco$instit, labels=c("favorable", "unfavorable")),
   centralLabHistol = factor(nwtco$histol, labels=c("favorable", "unfavorable")),
   stage = factor(nwtco$stage, labels=c("I", "II", "III", "IV")),
   study = factor(nwtco$study, labels=c("NWTS-3", "NWTS-4")),
   isCase = as.logical(nwtco$rel),
   time = nwtco$edrel,
   ageAtDiagnosis = nwtco$age / 12  # nwtco$age is in months
)

# Define the intended sampling fractions for the two strata.
samplingFractions <- c(favorable=0.05, unfavorable=0.2)

# Select participants/rows to be in the subcohort by stratified simple random
# sampling.
cchsData$inSubcohort <- rep(FALSE, nrow(cchsData))
set.seed(1)
for (stratumName in levels(cchsData$localHistol)) {
   inThisStratum <- cchsData$localHistol == stratumName
   stratumSubcohortSize <-
         round(samplingFractions[stratumName] * sum(inThisStratum))
   rowsToSetTrue <- sample(which(inThisStratum), size=stratumSubcohortSize)
   cchsData$inSubcohort[rowsToSetTrue] <- TRUE
}

# Change the sampling fractions to their exact values.
stratumSubcohortSizes <- table(cchsData$localHistol[cchsData$inSubcohort])
stratumCohortSizes <- table(cchsData$localHistol)
samplingFractions <- stratumSubcohortSizes / stratumCohortSizes
samplingFractions <- c(samplingFractions)  # make it a vector, not a table

# Keep only the cases and the subcohort.
cchsData <- cchsData[cchsData$isCase | cchsData$inSubcohort,]

# Put the sampling fraction in each row of the data-frame.
cchsData$sampFrac <-
      samplingFractions[match(cchsData$localHistol, names(samplingFractions))]
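
The selection and filtering steps above can be tried without the survival package on a small made-up cohort. The following is only a sketch: toyCohort, toyFractions, and the stratum sizes of 160 and 40 are hypothetical and chosen so that the arithmetic is easy to follow. They are not part of cchsData or nwtco.

```r
# Toy cohort with made-up values; none of this is real NWTS data.
toyCohort <- data.frame(
   id = 1:200,
   localHistol = factor(rep(c("favorable", "unfavorable"), times = c(160, 40))),
   isCase = (1:200) %% 10 == 0   # every tenth participant is a case
)
toyFractions <- c(favorable = 0.05, unfavorable = 0.2)

# Stratified simple random sampling: draw the subcohort separately per stratum.
toyCohort$inSubcohort <- FALSE
set.seed(1)
for (stratumName in levels(toyCohort$localHistol)) {
   rowsInStratum <- which(toyCohort$localHistol == stratumName)
   subcohortSize <- round(toyFractions[[stratumName]] * length(rowsInStratum))
   toyCohort$inSubcohort[sample(rowsInStratum, size = subcohortSize)] <- TRUE
}

# Case-cohort data: keep every case plus every subcohort member.
toyCaseCohort <- toyCohort[toyCohort$isCase | toyCohort$inSubcohort, ]

# Number of subcohort members drawn in each stratum.
subcohortSizes <- table(toyCohort$localHistol[toyCohort$inSubcohort])
print(subcohortSizes)
```

With these toy sizes each stratum contributes 8 subcohort members, because 5% of 160 and 20% of 40 are both 8. Which rows are drawn depends on the random seed, but every row of toyCaseCohort is a case, a subcohort member, or both.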

Details

The nwtco data come from two clinical trials but can be treated as cohort data. cchsData can be created from nwtco by running the code in the Source section, which is partly based on the Examples section of the cch help page. Subcohort selection uses two strata, corresponding to the two values of localHistol. The intended sampling fraction is 5% for the stratum with localHistol="favorable" and 20% for the stratum with localHistol="unfavorable". After the subcohort is selected, each sampling fraction is recalculated exactly, as the number of participants in the subcohort divided by the number in the full cohort for that stratum. Only the cases and the subcohort members are then kept.
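
The recalculation is needed because a subcohort must contain a whole number of participants, so the achieved fractions are rarely exactly 5% and 20%. The sketch below illustrates the arithmetic. The stratum sizes 3615 and 413 are hypothetical values chosen for illustration, not the actual nwtco counts.

```r
# Hypothetical stratum sizes for illustration; not the real nwtco counts.
stratumCohortSizes <- c(favorable = 3615, unfavorable = 413)
intendedFractions <- c(favorable = 0.05, unfavorable = 0.2)

# The subcohort size in each stratum has to be a whole number.
stratumSubcohortSizes <- round(intendedFractions * stratumCohortSizes)

# Exact fractions actually achieved after rounding.
exactFractions <- stratumSubcohortSizes / stratumCohortSizes

print(stratumSubcohortSizes)
print(signif(exactFractions, 4))
```

With these made-up sizes the subcohorts have 181 and 83 members, so the exact fractions are 181/3615 and 83/413, each slightly above the intended value.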