
simaerep

Simulate subject-level event reporting of clinical trial sites with the goal of detecting over- and under-reporting.

Monitoring the reporting rates of subject-level clinical events (e.g. adverse events, protocol deviations) by clinical trial sites is an important aspect of a risk-based quality monitoring strategy. Sites that are under-reporting or over-reporting events can be detected using bootstrap simulations during which patients are redistributed between sites. The resulting site-specific distributions of event reporting rates are used to assign probabilities to the observed reporting rates.
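The resampling idea can be sketched in a few lines of base R. This is a minimal illustration under simplifying assumptions, not simaerep's implementation: it uses hypothetical Poisson-distributed event counts, treats each patient's total event count as a single number (ignoring the visit alignment simaerep performs), and the helper name `prob_lower_boot()` is made up for this example. A small result means that randomly assembled sites of the same size rarely report as few events as the observed site, which points to under-reporting.

```r
# Sketch only: bootstrap probability of a site mean event count being
# this low if its patients were drawn at random from the study pool.
set.seed(42)

# hypothetical patient-level event counts for the whole study
study_events <- rpois(200, lambda = 5)

# hypothetical under-reporting site with 8 patients
site_events <- rpois(8, lambda = 2.5)

# hypothetical helper, not part of the simaerep API
prob_lower_boot <- function(site_events, study_events, r = 1000) {
  obs_mean <- mean(site_events)
  n_pat <- length(site_events)
  # redistribute patients: r times, draw a "site" of n_pat patients
  # from the study pool (with replacement) and record its mean
  sim_means <- replicate(
    r, mean(sample(study_events, n_pat, replace = TRUE))
  )
  # share of simulated site means that are lower than or equal to the observed one
  mean(sim_means <= obs_mean)
}

p <- prob_lower_boot(site_events, study_events)
p
```

Drawing with replacement is the classic bootstrap choice; sampling without replacement from a large pool would give very similar results.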

The method is inspired by the ‘infer’ R package and Allen Downey’s blog article: “There is only one test!”.

Installation

CRAN

install.packages("simaerep")

Development Version

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("openpharma/simaerep")

IMPALA

simaerep has been published as a work product of the Inter-Company Quality Analytics (IMPALA) consortium. IMPALA aims to engage with Health Authority inspectors on defining guiding principles for the use of advanced analytics to complement, enhance and accelerate current QA practices. simaerep was initially developed at Roche but is currently being evaluated by other companies across the industry to complement their quality assurance activities (see testimonials).

Publications

Koneswarakantha, B., Adyanthaya, R., Emerson, J. et al. An Open-Source R Package for Detection of Adverse Events Under-Reporting in Clinical Trials: Implementation and Validation by the IMPALA (Inter coMPany quALity Analytics) Consortium. Ther Innov Regul Sci 58, 591–599 (2024). https://doi.org/10.1007/s43441-024-00631-8

Koneswarakantha, B., Barmaz, Y., Ménard, T. et al. Follow-up on the Use of Advanced Analytics for Clinical Quality Assurance: Bootstrap Resampling to Enhance Detection of Adverse Event Under-Reporting. Drug Saf (2020). https://doi.org/10.1007/s40264-020-01011-5

Resources

Validation Report

The validation report, generated using thevalidatoR, can be downloaded as a PDF from the release section.

{gsm.simaerep}

We have created the extension gsm.simaerep so that simaerep event reporting probabilities can be added to good statistical monitoring (gsm.core) reports.

Application

Calculate event reporting probabilities from patient-level data, together with the difference from the expected number of events, on a simulated data set with 2 under-reporting sites.


suppressPackageStartupMessages(library(simaerep))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(knitr))

set.seed(1)

df_visit <- sim_test_data_study(
  n_pat = 1000, # number of patients in study
  n_sites = 100, # number of sites in study
  ratio_out = 0.02, # ratio of sites with outlier
  factor_event_rate = -0.5, # rate of under-reporting
  # non-constant event rates based on gamma distribution
  event_rates = (dgamma(seq(1, 20, 0.5), shape = 5, rate = 2) * 5) + 0.1,
  max_visit = 20,
  max_visit_sd = 10,
  study_id = "A"
)

df_visit %>%
  select(study_id, site_id, patient_id, visit, n_event) %>%
  head(25) %>%
  knitr::kable()
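| study_id | site_id | patient_id | visit | n_event |
|:---------|:--------|:-----------|------:|--------:|
| A        | S0001   | P000001    |     1 |       0 |
| A        | S0001   | P000001    |     2 |       2 |
| A        | S0001   | P000001    |     3 |       2 |
| A        | S0001   | P000001    |     4 |       4 |
| A        | S0001   | P000001    |     5 |       6 |
| A        | S0001   | P000001    |     6 |       7 |
| A        | S0001   | P000001    |     7 |       7 |
| A        | S0001   | P000001    |     8 |       7 |
| A        | S0001   | P000001    |     9 |       7 |
| A        | S0001   | P000001    |    10 |       7 |
| A        | S0001   | P000001    |    11 |       7 |
| A        | S0001   | P000001    |    12 |       7 |
| A        | S0001   | P000001    |    13 |       7 |
| A        | S0001   | P000002    |     1 |       3 |
| A        | S0001   | P000002    |     2 |       3 |
| A        | S0001   | P000002    |     3 |       5 |
| A        | S0001   | P000002    |     4 |       8 |
| A        | S0001   | P000002    |     5 |       8 |
| A        | S0001   | P000002    |     6 |       9 |
| A        | S0001   | P000002    |     7 |       9 |
| A        | S0001   | P000002    |     8 |       9 |
| A        | S0001   | P000002    |     9 |       9 |
| A        | S0001   | P000002    |    10 |       9 |
| A        | S0001   | P000002    |    11 |       9 |
| A        | S0001   | P000002    |    12 |       9 |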


evrep <- simaerep(df_visit, mult_corr = TRUE)

plot(evrep, study = "A")

The left panel shows the mean cumulative event reporting per site (blue lines) against the mean cumulative event reporting of the entire study (golden line). Sites with either high under-reporting (negative probabilities) or high over-reporting (positive probabilities) are marked by grey dots and plotted in additional panels on the right; N denotes the number of sites. The right panels show individual sites, with the cumulative event counts of their patients as grey lines. Here N denotes the number of patients, the percentage gives the under- or over-reporting probability, and delta denotes the difference from the expected number of events.
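The call above sets `mult_corr = TRUE`. Since the function index lists a Benjamini-Hochberg correction (`p_adjust_bh_inframe`), this presumably requests a Benjamini-Hochberg adjustment for testing many sites at once. The sketch below only illustrates that procedure on made-up, generic per-site p-values with base R's `stats::p.adjust()`, next to a hand-written equivalent; it does not reproduce which quantities simaerep adjusts internally. The site names and values are hypothetical.

```r
# hypothetical unadjusted p-values, one per site
p_site <- c(S0001 = 0.001, S0002 = 0.01, S0003 = 0.03,
            S0004 = 0.2, S0005 = 0.8)

# Benjamini-Hochberg adjustment with base R
p_bh <- p.adjust(p_site, method = "BH")

# the same procedure written out: multiply the i-th smallest p-value by m / i,
# take a cumulative minimum from the largest rank downwards, cap at 1
bh_manual <- function(p) {
  m <- length(p)
  o <- order(p, decreasing = TRUE)   # indices from largest to smallest p
  adj <- pmin(1, cummin(p[o] * m / (m:1)))
  adj[order(o)]                      # restore the original site order
}

p_bh
all.equal(unname(p_bh), unname(bh_manual(p_site)))
```

For these inputs the adjusted values are 0.005, 0.025, 0.05, 0.25 and 0.8: the smallest p-values are inflated most, while the ranking of sites is preserved.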

In Database Calculation

The inframe algorithm uses only dbplyr-compatible table operations and can therefore be executed within a database backend, as we demonstrate here using duckdb.

However, instead of providing an integer for the r parameter, we need to provide an in-database table that has as many rows as the desired number of replications in our simulation.

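# in-memory duckdb database
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = ":memory:")
# replication table: one row per bootstrap replication
df_r <- tibble(rep = seq(1, 1000))

dplyr::copy_to(con, df_visit, "visit")
dplyr::copy_to(con, df_r, "r")

# lazy references to the database tables
tbl_visit <- tbl(con, "visit")
tbl_r <- tbl(con, "r")

# pass the replication table instead of an integer for r
evrep <- simaerep(
  tbl_visit,
  r = tbl_r
)

plot(evrep, study = "A")

DBI::dbDisconnect(con)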
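Why a table rather than an integer? One way to express r replications with table operations alone is to cross join a replication table with the patient data, so that every replication gets its own copy of the patient pool. The following in-memory dplyr sketch is an assumption about the general technique, not a description of simaerep's internals: the toy tables are hypothetical, `runif()` stands in for whatever random function a database backend provides, and it draws 2 patients per replication without replacement.

```r
suppressPackageStartupMessages(library(dplyr))  # cross_join() needs dplyr >= 1.1.0
set.seed(1)

# hypothetical patient pool and a replication table with r = 3 rows
df_pat <- tibble(patient_id = paste0("P", 1:6),
                 n_event = c(2, 5, 3, 4, 6, 1))
df_r <- tibble(rep = seq(1, 3))

df_sim <- df_r %>%
  cross_join(df_pat) %>%          # 3 x 6 = 18 rows, one per (rep, patient)
  mutate(rnd = runif(n())) %>%    # one random number per row
  group_by(rep) %>%
  slice_min(rnd, n = 2) %>%       # keep 2 random patients per replication
  summarise(mean_event = mean(n_event), .groups = "drop")

df_sim
```

Each row of `df_sim` is the mean event count of one simulated site, so adding rows to the replication table adds replications without any loop.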

Version

1.0.0

License

MIT + file LICENSE

Maintainer

Bjoern Koneswarakantha

Last Published

October 28th, 2025

Functions in simaerep (1.0.0)

purrr_bar

Execute a purrr or furrr function with a progress bar.
get_site_mean_ae_dev

Get site mean AE development.
get_visit_med75

Get visit_med75.
poiss_test_site_ae_vs_study_ae

Poisson test for vector with site AEs vs vector with study AEs.
plot_visit_med75

Plot patient visits against visit_med75.
prep_for_sim

Prepare data for simulation.
%>%

Pipe operator
simaerep

Create simaerep object
prob_lower_site_ae_vs_study_ae

Calculate bootstrapped probability for obtaining a lower site mean AE number.
sim_test_data_study

Simulate study test data.
sim_out

Simulate under-reporting.
print.simaerep

Print method for simaerep objects
pat_pool

Create a study specific patient pool for sampling
plot_study

Plot AE development of study and sites, highlighting at-risk sites.
sim_after_prep

Start simulation after preparation.
prune_to_visit_med75_inframe

Prune visits to visit_med75 using table operations.
sim_pat

Simulate patients and events for sites; supports constant and non-constant event rates.
sim_inframe

Calculate probabilities for study sites using table operations.
site_aggr

Aggregate from visit to site level.
with_progress_cnd

plot_sim_examples

Plot multiple simulation examples.
print.orivisit

Print method for orivisit objects
plot_sim_example

Plot simulation example.
remap_col_names

Rename internal simaerep column names to externally applied column names.
sim_test_data_portfolio

Simulate Portfolio Test Data
sim_sites

Calculate prob_lower and the poisson.test p-value for study sites.
sim_test_data_patient

Simulate patient event reporting test data.
sim_test_data_events

Simulate test data events.
get_portf_config

Get Portfolio Configuration
get_cum_mean_event_dev

Get cumulative mean event development
eval_sites

Evaluate sites.
get_df_visit_test_mapped

Get df_visit_test mapped
get_df_visit_test

Get df_visit_test
get_legend

Replaces cowplot::get_legend() to silence the warning "Multiple components found; returning the first one. To return all, use `return_all = TRUE`."
exp_implicit_missing_visits

Expose implicitly missing visits.
aggr_duplicated_visits

Aggregate duplicated visits.
get_portf_event_rates

Get Portfolio Event Rates. Calculates mean event rates per study and visit in a df_visit simaerep input data frame.
check_df_visit

Integrity check for df_visit.
p_adjust_bh_inframe

Benjamini-Hochberg p-value correction using table operations.
is_orivisit

Check whether an object is of class orivisit.
is_simaerep

Check whether an object is of class simaerep.
pat_aggr

Aggregate visit to patient level.
max_rank

Calculate Max Rank
orivisit

Create orivisit object.
plot.simaerep

Plot AE under-reporting simulation results.
plot_dots

Plots AE per site as dots.
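Several functions above (`poiss_test_site_ae_vs_study_ae`, `sim_sites`) complement the bootstrap probabilities with a Poisson test. As a rough sketch of that kind of comparison, base R's `stats::poisson.test()` can compare a site's event rate per patient with the rate in the rest of the study. The counts below are hypothetical, and this is not necessarily how simaerep sets up its test.

```r
# hypothetical counts: a site with 8 patients reporting 18 events,
# the rest of the study with 192 patients reporting 960 events
site_events  <- 18
site_pat     <- 8
study_events <- 960
study_pat    <- 192

# exact test on the ratio of two Poisson rates (events per patient);
# alternative = "less" asks whether the site rate is lower than the study rate
res <- poisson.test(
  x = c(site_events, study_events),
  T = c(site_pat, study_pat),
  alternative = "less"
)

res$estimate  # rate ratio: (18 / 8) / (960 / 192) = 0.45
res$p.value
```

Unlike the bootstrap, this test assumes Poisson-distributed counts with a common rate for all patients, which is one reason a resampling approach can be preferable for real event data.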