Learn R Programming

BiostatsUHNplus

The goal of BiostatsUHNplus is to house publicly available code snippets and functions (some with multiple package dependencies) used by Biostatistics@UHN in Toronto, Canada.

Many of these functions build upon the features of reportRmd.

Installation

First, install the latest version of reportRmd from CRAN with:

install.packages(c("reportRmd"), dependencies=TRUE);

Then install the latest version of BiostatsUHNplus from CRAN with:

install.packages(c("BiostatsUHNplus"), dependencies=TRUE);

If version 0.1.0 or higher of reportRmd is installed, the development version of BiostatsUHNplus can be installed from GitHub with:

# install.packages("devtools")
devtools::install_github("biostatsPMH/BiostatsUHNplus", ref="development")

Documentation

Online Documentation

Examples

A wrapper for the as.numeric function. Prints entries that fail to parse instead of throwing uninformative error

library(BiostatsUHNplus);
#> Warning in check_dep_version(): ABI version mismatch: 
#> lme4 was built with Matrix ABI version 1
#> Current Matrix ABI version is 0
#> Please re-install lme4 from source or restore original 'Matrix' package
z <- as_numeric_parse(c(1:5, "String1",6:10,"String2"))
#> The following entries were converted to NA values:
#> Entry 6, 'String1'
#> Entry 12, 'String2'
z
#>  [1]  1  2  3  4  5 NA  6  7  8  9 10 NA

Nested summary of adverse events by participant in cohort, stratified by attribution to first study drug

Uses addendum simulated study data and applies variable labels. Interpret summary output and unnested or nested p-value with caution!

Note that if participants were enrolled in more than one cohort (crossover), or if repeat AEs in participant had different attribution, the total N for Full Sample will be less than that of the total N of attribution for first study drug. Since total N for Full Sample (234) is less than total N of the first study drug attribution categories (49 + 198), this suggests that there was instances of repeat AEs in participants having different attribution to first study drug.

library(plyr);
library(BiostatsUHNplus);

data("enrollment", "demography", "ineligibility", "ae");
clinT <- plyr::join_all(list(enrollment, demography, ineligibility, ae), 
  by = "Subject", type = "full");
clinT$AE_SEV_GD <- as.numeric(clinT$AE_SEV_GD);
clinT$Drug_1_Attribution <- "Unrelated";
clinT$Drug_1_Attribution[clinT$CTC_AE_ATTR_SCALE %in% 
                           c("Definite", "Probable", "Possible")] <- "Related";
clinT$Drug_2_Attribution <- "Unrelated";
clinT$Drug_2_Attribution[clinT$CTC_AE_ATTR_SCALE_1 %in% 
                           c("Definite", "Probable", "Possible")] <- "Related";
lbls <- data.frame(c1=c("AE_SEV_GD", "ENROL_DATE_INT", "COHORT", "GENDER_CODE", 
  "INELIGIBILITY_STATUS", "AE_ONSET_DT_INT", "Drug_2_Attribution", "ae_category"),
  c2=c("Adverse event severity grade", "Enrollment date", "Cohort", "Gender", 
       "Ineligibility", "Adverse event onset date", "Attribution to second study drug",
       "Adverse event system organ class"));
clinT <- reportRmd::set_labels(clinT, lbls);

rm_covsum_nested(data = clinT, id = c("ae_detail", "Subject", "COHORT"), 
  covs = c("COHORT", "GENDER_CODE", "INELIGIBILITY_STATUS", "ENROL_DATE_INT", 
           "AE_SEV_GD", "Drug_2_Attribution", "AE_ONSET_DT_INT", "ae_category"), 
  maincov = "Drug_1_Attribution", nicenames = TRUE);

Outputs three DSMB-CCRU AE summary tables in Excel format per UHN template

Uses addendum simulated study data. DSMB-CCRU AE summary tables in below code example can be found in the man/tables folder of BiostatsUHNplus package.

library(BiostatsUHNplus);
data("enrollment", "demography", "ineligibility", "ae");

## This does summary for all participants;
dsmb_ccru(protocol="EXAMPLE_STUDY",setwd="./man/tables/",
          title="Phase X Study to Evaluate Treatments A-D",
          comp=NULL,pi="Dr. Principal Investigator",
          presDate="30OCT2020",cutDate="31AUG2020",
          boundDate=NULL,subjID="Subject",subjID_ineligText=c("New Subject","Test"),
          baseline_datasets=list(enrollment,demography,ineligibility),
          ae_dataset=ae,ineligVar="INELIGIBILITY_STATUS",ineligVarText=c("Yes","Y"),
          genderVar="GENDER_CODE",enrolDtVar="ENROL_DATE_INT",ae_detailVar="ae_detail",
          ae_categoryVar="ae_category",ae_severityVar="AE_SEV_GD",
          ae_onsetDtVar="AE_ONSET_DT_INT",ae_detailOtherText="Other, specify",
          ae_detailOtherVar="CTCAE5_LLT_NM",ae_verbatimVar="AE_VERBATIM_TRM_TXT",
          numSubj=NULL)

## This does summary for each cohort;
dsmb_ccru(protocol="EXAMPLE_STUDY",setwd="./man/tables/",
          title="Phase X Study to Evaluate Treatments A-D",
          comp="COHORT",pi="Dr. Principal Investigator",
          presDate="30OCT2020",cutDate="31AUG2020",
          boundDate=NULL,subjID="Subject",subjID_ineligText=c("New Subject","Test"),
          baseline_datasets=list(enrollment,demography,ineligibility),
          ae_dataset=ae,ineligVar="INELIGIBILITY_STATUS",ineligVarText=c("Yes","Y"),
          genderVar="GENDER_CODE",enrolDtVar="ENROL_DATE_INT",ae_detailVar="ae_detail",
          ae_categoryVar="ae_category",ae_severityVar="AE_SEV_GD",
          ae_onsetDtVar="AE_ONSET_DT_INT",ae_detailOtherText="Other, specify",
          ae_detailOtherVar="CTCAE5_LLT_NM",ae_verbatimVar="AE_VERBATIM_TRM_TXT",
          numSubj=NULL)

## Does same as above, but overrides number of subjects in cohorts;
dsmb_ccru(protocol="EXAMPLE_STUDY",setwd="./man/tables/",
          title="Phase X Study to Evaluate Treatments A-D",
          comp="COHORT",pi="Dr. Principal Investigator",
          presDate="30OCT2020",cutDate="31AUG2020",
          boundDate=NULL,subjID="Subject",subjID_ineligText=c("New Subject","Test"),
          baseline_datasets=list(enrollment,demography,ineligibility),
          ae_dataset=ae,ineligVar="INELIGIBILITY_STATUS",ineligVarText=c("Yes","Y"),
          genderVar="GENDER_CODE",enrolDtVar="ENROL_DATE_INT",ae_detailVar="ae_detail",
          ae_categoryVar="ae_category",ae_severityVar="AE_SEV_GD",
          ae_onsetDtVar="AE_ONSET_DT_INT",ae_detailOtherText="Other, specify",
          ae_detailOtherVar="CTCAE5_LLT_NM",ae_verbatimVar="AE_VERBATIM_TRM_TXT",
          numSubj=c(3,4,3,3))

Related adverse event onset timeline plots

Example 1

Uses addendum simulated study data. Shows timeline for onset of related AE after study enrollment. Can display up to 5 attributions. Time unit may be one of day, week, month or year.

Note that if more than one field is given for startDtVars (unique names required), each field is assumed to be specific start date for attribution in corresponding field order.

The below plot includes both AE category and AE detail in default colour and font scheme.

library(ggplot2);
library(BiostatsUHNplus);
data("enrollment", "ae");

p <- ae_timeline_plot(subjID="Subject",subjID_ineligText=c("New Subject","Test"),
                 baseline_datasets=list(enrollment),
                 ae_dataset=ae,
                 ae_attribVars=c("CTC_AE_ATTR_SCALE","CTC_AE_ATTR_SCALE_1"),
                 ae_attribVarsName=c("Drug 1","Drug 2"),
                 ae_attribVarText=c("Definite", "Probable", "Possible"),
                 startDtVars=c("ENROL_DATE_INT"),ae_detailVar="ae_detail",
                 ae_categoryVar="ae_category",ae_severityVar="AE_SEV_GD",
                 ae_onsetDtVar="AE_ONSET_DT_INT",time_unit="week")
ggplot2::ggsave(paste("man/figures/ae_detail_timeline_plot", ".png", sep=""), p, 
                width=6.4, height=10, device="png", scale = 1.15);

Example 2

The next plot summarizes timeline by AE category. Fonts, colours, symbols, column widths (character length) and time unit are customized.

The width, height and scale parameters in ggsave() can also be modified to fit a large plot.

library(ggplot2);
library(BiostatsUHNplus);
data("enrollment", "ae");

p <- ae_timeline_plot(subjID="Subject",subjID_ineligText=c("New Subject","Test"),
                 baseline_datasets=list(enrollment),
                 ae_dataset=ae,
                 ae_attribVars=c("CTC_AE_ATTR_SCALE","CTC_AE_ATTR_SCALE_1"),
                 ae_attribVarsName=c("Drug 1","Drug 2"),
                 ae_attribVarText=c("Definite", "Probable", "Possible"),
                 startDtVars=c("ENROL_DATE_INT"),ae_detailVar="ae_detail",
                 ae_categoryVar="ae_category",ae_severityVar="AE_SEV_GD",
                 ae_onsetDtVar="AE_ONSET_DT_INT",time_unit="month",
                 include_ae_detail=FALSE,
                 fonts=c("Forte","Gadugi","French Script MT","Albany AMT","Calibri"),
                 fontColours=c("#FF4F00","#FFDB58"),
                 panelColours=c("#AAF0D1",NA,"white"),
                 attribColours=c("#F6ADC6","#C54B8C","#A4DDED","#0077BE","#9AB973",
                                 "#01796F","#FFA343","#CC7722","#E0B0FF","#5A4FCF"),
                 attribSymbols=c(5,6,7,8,15,16,17,18,19,20),
                 columnWidths=c(23,15))
ggplot2::ggsave(paste("man/figures/ae_category_timeline_plot", ".png", sep=""), p, 
                width=3.6, height=5.4, device="png", scale = 1);

Example 3

If available, specific start date for attribution in corresponding field order (unique field name required) can be used. Drug start date is closer to AE onset than enrollment date.

Below, subjects 01 and 11 are excluded.

library(ggplot2);
library(BiostatsUHNplus);
data("drug1_admin", "drug2_admin", "ae");

p <- ae_timeline_plot(subjID="Subject",subjID_ineligText=c("01","11"),
                 baseline_datasets=list(drug1_admin, drug2_admin),
                 ae_dataset=ae,
                 ae_attribVars=c("CTC_AE_ATTR_SCALE","CTC_AE_ATTR_SCALE_1"),
                 ae_attribVarsName=c("Drug 1","Drug 2"),
                 ae_attribVarText=c("Definite", "Probable", "Possible"),
                 startDtVars=c("TX1_DATE_INT","TX2_DATE_INT"),
                 ae_detailVar="ae_detail",
                 ae_categoryVar="ae_category",ae_severityVar="AE_SEV_GD",
                 ae_onsetDtVar="AE_ONSET_DT_INT",time_unit="month",
                 include_ae_detail=FALSE,
                 fonts=c("Calibri","Albany AMT","Gadugi","French Script MT","Forte"),
                 fontColours=c("#FFE135"),
                 panelColours=c("#E52B50",NA,"#FFF5EE"),
                 attribColours=c("#9AB973","#01796F","#FFA343","#CC7722"),   
                 attribSymbols=c(7,8,5,6),
                 columnWidths=c(23))
ggplot2::ggsave(paste("man/figures/ae_category_attribStart_timeline_plot", ".png", sep=""), 
                p, width=4.2, height=5.4, device="png", scale = 1);

Summary functions for MCMCglmm object with binary outcome

Model output for fixed effects

Below runs a logistic MCMCglmm model on the odds of grade 3 or higher adverse event, controlling for attribution of first and second intervention drugs. Subject and system organ class are treated as random effects. Model has 800 posterior samples. Should specify burnin=125000, nitt=625000 and thin=100 for 5000 posterior samples with lower autocorrelation. Aim for effective sample sizes of at least 2000.

data("ae");

ae$G3Plus <- 0;
ae$G3Plus[ae$AE_SEV_GD %in% c("3", "4", "5")] <- 1;
ae$Drug_1_Attribution <- "No";
ae$Drug_1_Attribution[ae$CTC_AE_ATTR_SCALE %in% c("Definite", "Probable", "Possible")] <- "Yes";
ae$Drug_2_Attribution <- "No";
ae$Drug_2_Attribution[ae$CTC_AE_ATTR_SCALE_1 %in% c("Definite", "Probable", "Possible")] <- "Yes";

prior2RE <- list(R = list(V = diag(1), fix = 1),
  G=list(G1=list(V=1, nu=0.02), G2=list(V=1, nu=0.02)));
  
model1 <- MCMCglmm::MCMCglmm(G3Plus ~ Drug_1_Attribution + Drug_2_Attribution, 
  random=~Subject + ae_category, family="categorical", data=ae, saveX=TRUE, 
  verbose=F, burnin=4000, nitt=12000, thin=10, pr=TRUE, prior=prior2RE);

mcmcglmm_mva <- nice_mcmcglmm(model1, ae);
options(knitr.kable.NA = '');
knitr::kable(mcmcglmm_mva);
VariableLevelsOR (95% HPDI)MCMCpeff.samp
Drug 1 AttributionNoreference
Yes2.72 (1.01, 6.53)0.028184.77
Drug 2 AttributionNoreference
Yes0.43 (0.17, 1.38)0.125155.01

Intraclass correlation coefficients

Most of the observed variation in grade 3 or higher adverse event status is attributable to adverse event category, also known as system organ class.

mcmcglmm_icc <- nice_mcmcglmm_icc(model1, prob=0.95, decimals=4);
options(knitr.kable.NA = '');
knitr::kable(mcmcglmm_icc);
ICClowerupper
Subject0.04380.00900.3070
ae_category0.87340.53720.9497
units0.06630.02280.2286

Caterpillar plots of random effects - participant

After controlling for first and second drug attributions, subject 01 has a higher odds for grade 3 or higher adverse event than the average of study participants.

p <- caterpillar_plot(subjID = "Subject",
  mcmcglmm_object = model1,
  prob = 0.95,
  orig_dataset = ae,
  ncol = 2,
  binaryOutcomeVar = "G3Plus")
ggplot2::ggsave(paste("man/figures/caterpillar_plot_subject", ".png", sep=""), 
       p, scale = 1.0, width=6.4, height=3.4, device="png");

Caterpillar plots of random effects - system organ class

Highest posterior density intervals, also known as credible intervals, are not symmetric. Need to run model for more iterations with higher burnin.

p <- caterpillar_plot(subjID = "ae_category",
  mcmcglmm_object = model1,
  prob = 0.95,
  orig_dataset = ae,
  ncol = 4,
  columnTextWidth = 22,
  binaryOutcomeVar = "G3Plus",
  subtitle = "System organ class (n, events)",
  title = "Odds Ratio for G3+ Severity with 95% Highest Posterior Density Interval",
  fonts = c("Arial", "Arial", "Arial", "Arial"),
  break.label.summary = TRUE)
ggplot2::ggsave(paste("man/figures/caterpillar_plot_ae_category", ".png", sep=""), 
       p, scale = 1.3, width=6.4, height=3.8, device="png");

Copy Link

Version

Install

install.packages('BiostatsUHNplus')

Monthly Downloads

234

Version

1.0.2

License

MIT + file LICENSE

Maintainer

Tyler Pittman

Last Published

February 3rd, 2025

Functions in BiostatsUHNplus (1.0.2)

drug1_admin

Simulated study agent 1 for patients.
covsum_nested

Nested version of reportRmd covsum()
ae

Simulated adverse events for patients receiving two study agents.
enrollment

Enrollment data Simulated enrollment for patients.
ae_timeline_plot

Outputs related adverse event timeline plots including just system organ class (AE category), or system organ class and lowest level term (AE detail). This function can fit up to 5 different attributions. Modify width, height and scale parameters in ggsave() to customize fit for large plot.
dsmb_ccru

Outputs the three DSMB-CCRU AE summary tables in Excel format per UHN template
demography

Simulated demography for patients.
drug2_admin

Simulated study agent 2 for patients.
as_numeric_parse

Modification of the as.numeric function that prints entries that fail to parse as a message
caterpillar_plot

Caterpillar plot. Useful for plotting random effects from hierarchical models, such as MCMCglmm::MCMCglmm() object, that have binary outcome.
nice_mcmcglmm

Nice table of model output from MCMCglmm::MCMCglmm()
nice_mcmcglmm_icc

Nice table of intraclass correlation coefficients from MCMCglmm::MCMCglmm() model output
ineligibility

Simulated ineligibility for patients.
rm_covsum_nested

Outputs a nested version of reportRmd::rm_covsum()
redcap_data_out

Combines exported REDCap raw and label .csv files together with data dictionary. Tranforms the exported data into Excel sheets by survey instrument with one row per participant