Colossus

The goal of Colossus is to provide an open-source means of performing survival analysis on big data with complex risk formulas. Colossus is designed to perform Cox proportional hazards regressions and Poisson regressions on datasets loaded as data.tables or data.frames. The risk models allowed are sums or products of linear, log-linear, or several other radiation dose-response formulas highlighted in the vignettes. Additional plotting capabilities are available.

By default, a fully portable version of the code is compiled, which does not support OpenMP on every system. Note that Colossus requires OpenMP support to perform parallel calculations. Currently, OpenMP support is not configured for Linux when compiling with clang. The environment variable “R_COLOSSUS_NOT_CRAN” is checked to determine whether OpenMP should be disabled in that case: the number of cores is set to 1 if the environment variable is empty, the operating system is detected as Linux, and the default compiler or R compiler is clang. Separately, Colossus testing checks the “NOT_CRAN” variable to determine whether additional tests should be run; setting “NOT_CRAN” to “false” disables the longer tests.
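The core-count rule described above can be sketched in base R. This is an illustration only and not Colossus's internal code: the helper `sketch_core_count` and its arguments are hypothetical, and the compiler names are passed in by the caller rather than detected.

```r
# Hypothetical helper illustrating the rule described in the text.
# It is NOT a Colossus function; names and arguments are assumptions.
sketch_core_count <- function(requested_cores,
                              env_value = Sys.getenv("R_COLOSSUS_NOT_CRAN"),
                              os = tolower(Sys.info()[["sysname"]]),
                              compilers = c(default = "gcc", r = "gcc")) {
  # Fall back to a single core only when all three conditions hold:
  # the variable is empty, the OS is linux, and either compiler is clang.
  if (!nzchar(env_value) && os == "linux" && any(compilers == "clang")) {
    return(1L)
  }
  as.integer(requested_cores)
}

# Empty variable, linux, clang as the R compiler: limited to one core.
sketch_core_count(4, env_value = "", os = "linux",
                  compilers = c(default = "gcc", r = "clang"))

# Variable set: the requested core count is kept.
sketch_core_count(4, env_value = "1", os = "linux",
                  compilers = c(default = "clang", r = "clang"))

# Disable the longer tests for the current R session.
Sys.setenv(NOT_CRAN = "false")
```

`Sys.getenv()` returns an empty string for an unset variable, which is why the sketch uses `nzchar()` to test for emptiness.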

Example

This basic example fits a Cox proportional hazards regression to a small dataset with starting and ending ages:

library(data.table)
library(parallel)
library(Colossus)
## basic example code reproduced from the starting-description vignette

df <- data.table(
  "UserID" = c(112, 114, 213, 214, 115, 116, 117),
  "Starting_Age" = c(18, 20, 18, 19, 21, 20, 18),
  "Ending_Age" = c(30, 45, 57, 47, 36, 60, 55),
  "Cancer_Status" = c(0, 0, 1, 0, 1, 0, 0),
  "a" = c(0, 1, 1, 0, 1, 0, 1),
  "b" = c(1, 1.1, 2.1, 2, 0.1, 1, 0.2),
  "c" = c(10, 11, 10, 11, 12, 9, 11),
  "d" = c(0, 0, 0, 1, 1, 1, 1)
)
# For the interval case
time1 <- "Starting_Age"
time2 <- "Ending_Age"
event <- "Cancer_Status"

names <- c("a", "b", "c", "d")
term_n <- c(0, 1, 1, 2)
tform <- c("loglin", "lin", "lin", "plin")
modelform <- "M"

a_n <- c(0.1, 0.1, 0.1, 0.1)

keep_constant <- c(0, 0, 0, 0)
der_iden <- 0

control <- list(
  "lr" = 0.75, "maxiter" = 100, "halfmax" = 5, "epsilon" = 1e-9,
  "deriv_epsilon" = 1e-9, "abs_max" = 1.0,
  "verbose" = FALSE, "ties" = "breslow"
)

e <- RunCoxRegression(df, time1, time2, event, names, term_n, tform, keep_constant, a_n, modelform, control = control)
Interpret_Output(e)
#> |-------------------------------------------------------------------|
#> Final Results
#>    Covariate Subterm Term Number Central Estimate Standard Deviation
#>       <char>  <char>       <int>            <num>              <num>
#> 1:         a  loglin           0         44.53340       9.490627e+07
#> 2:         b     lin           1         98.72266                NaN
#> 3:         c     lin           1         96.82311       2.408255e+02
#> 4:         d    plin           2        101.10000       5.207003e+02
#> 
#> Cox Model Used
#> -2*Log-Likelihood: 1.35,  AIC: 9.35
#> Iterations run: 100
#> maximum step size: 1.00e+00, maximum first derivative: 1.92e-04
#> Analysis did not converge, check convergence criteria or run further
#> Run finished in 0.25 seconds
#> |-------------------------------------------------------------------|
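The AIC in the output above is consistent with the standard definition, AIC = -2*log-likelihood + 2k, where k is the number of free parameters. The short check below assumes all four parameters are free, since `keep_constant` is all zeros in the example; the numeric values are copied from the printed output, and the variable names are illustrative.

```r
# Value copied from the printed output above.
neg2_loglik <- 1.35

# keep_constant from the example: 0 marks a parameter that is free to vary.
keep_constant <- c(0, 0, 0, 0)
n_free <- sum(keep_constant == 0)

# Standard AIC definition: -2*log-likelihood plus twice the parameter count.
aic <- neg2_loglik + 2 * n_free
aic
```

With four free parameters this gives 1.35 + 8 = 9.35, matching the reported AIC.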

Install

install.packages('Colossus')

Monthly Downloads

279

Version

1.2

License

GPL (>= 3)

Maintainer

Eric Giunta

Last Published

February 13th, 2025

Functions in Colossus (1.2)

Def_modelform_fix

Automatically assigns geometric-mixture values and checks that a valid modelform is used
Joint_Multiple_Events

Automates creating data for a joint competing risks analysis
Event_Count_Gen

Uses a table, a list of categories, and a list of event summaries to generate person-count tables
Linked_Dose_Formula

Calculates the full parameter list for a special dose formula
Def_model_control

Automatically assigns missing model control values
Gather_Guesses_CPP

Performs checks to gather a list of guesses and iterations
Likelihood_Ratio_Test

Defines the likelihood ratio test
GetCensWeight

Calculates and returns data for time by hazard and survival to estimate censoring rate
Interpret_Output

Prints a regression output clearly
Event_Time_Gen

Uses a table, a list of categories, a list of summaries, a list of events, and person-year information to generate person-time tables
Rcomp_version

Checks how R was compiled
RunCoxNull

Performs basic Cox Proportional Hazards regression with the null model
Replace_Missing

Automatically assigns missing values in listed columns
Linked_Lin_Exp_Para

Calculates the additional parameter for a linear-exponential formula with a known maximum
RunCoxPlots

Performs Cox Proportional Hazard model plots
PoissonCurveSolver

Calculates the likelihood curve for a Poisson model directly
Rcpp_version

Checks the default R C++ compiler
Model_Results_Log

Saves information about a run to a log file
OMP_Check

Checks the OMP flag
RunCoxRegression

Performs basic Cox Proportional Hazards regression without special options
RunCoxRegression_Strata

Performs basic Cox Proportional Hazards regression with strata effect
RunCoxRegression_Omnibus_Multidose

Performs Cox Proportional Hazards regression using the omnibus function with multiple column realizations
RunCoxRegression_Single

Performs basic Cox Proportional Hazards calculation with no derivative
RunPoissonEventAssignment

Predicts how many events are due to baseline vs excess
RunPoissonEventAssignment_bound

Predicts how many events are due to baseline vs excess at the confidence bounds of a single parameter
RunCoxRegression_Basic

Performs basic Cox Proportional Hazards regression with a multiplicative log-linear model
RunCoxRegression_CR

Performs basic Cox Proportional Hazards regression with competing risks
RunPoissonRegression_Omnibus

Performs basic Poisson regression using the omnibus function
RunPoissonRegression_Single

Performs Poisson regression with no derivative calculations
RunPoissonRegression_Joint_Omnibus

Performs joint Poisson regression using the omnibus function
RunPoissonRegression_Residual

Calculates Poisson residuals
RunPoissonRegression

Performs basic Poisson regression
RunPoissonRegression_Tier_Guesses

Performs basic Poisson regression with multiple guesses, starting with a single term
Time_Since

Automates creating a date since a reference column
RunPoissonRegression_Guesses_CPP

Performs basic Poisson regression, generates multiple starting guesses on the C++ side
RunPoissonRegression_Strata

Performs Poisson regression with strata effect
RunCoxRegression_Tier_Guesses

Performs basic Cox regression with multiple guesses, starting by solving for a single term
System_Version

Checks OS, compilers, and OMP
factorize_par

Splits a parameter into factors in parallel
gcc_version

Checks the default C++ compiler
gen_time_dep

Applies time dependence to parameters
RunCoxRegression_Guesses_CPP

Performs basic Cox Proportional Hazards regression, generates multiple starting guesses on the C++ side
factorize

Splits a parameter into factors
RunCoxRegression_Omnibus

Performs Cox Proportional Hazards regression using the omnibus function
interact_them

Defines Interactions
get_os

Checks system OS
Check_Verbose

General purpose verbosity check
Correct_Formula_Order

Corrects the order of terms, formulas, etc.
Def_Control

Automatically assigns missing control values
Date_Shift

Automates creating a date difference column
CoxCurveSolver

Calculates the likelihood curve for a Cox model directly
Def_Control_Guess

Automatically assigns missing guessing control values
Check_Trunc

Applies time duration truncation limits to create columns for Cox model
Convert_Model_Eq

Converts a string equation to regression model inputs
Check_Dupe_Columns

Checks for duplicated column names
Cox_Relative_Risk

Calculates hazard ratios for a reference vector