biostats

Overview

biostats is an R package that provides a toolbox for biostatistics and clinical data analysis workflows.

Key features

  • Descriptive statistics and exploratory data analysis
  • Sample size and power calculation
  • Statistical analysis and inference
  • Data visualization

Designed primarily for comparative clinical studies, trial planning, and analysis, the package serves two audiences. For professional biostatisticians and clinical data analysts it is an analytical toolkit. For researchers transitioning to R-based biostatistics, including professionals from other domains, clinical research professionals, and medical practitioners involved in the development of clinical trials, it is an educational resource.

Developed by the biostatistics team at Laboratorios Sophia S.A. de C.V.

Installation

# Install the latest CRAN release:
install.packages("biostats")

# Or install the development version from GitHub:
# install.packages("pak")
pak::pak("sebasquirarte/biostats")

Usage

library(biostats)

This package comprises 14 functions across four analytical domains:

Descriptive Statistics and Exploratory Data Analysis (EDA)

clinical_data()

Description

Creates a simple simulated clinical trial dataset with subject demographics, multiple visits, treatment groups with different effects, numerical and categorical variables, as well as optional missing data and dropout rates.

Parameters
| Parameter | Description | Default |
|---|---|---|
| n | Integer indicating the number (1-999) of subjects. | 100 |
| visits | Integer indicating the number of visits including baseline. | 3 |
| arms | Character vector of treatment arm names. | c("Placebo", "Treatment") |
| dropout | Numeric parameter indicating the proportion (0-1) of subjects who drop out. | 0 |
| missing | Numeric parameter indicating the proportion (0-1) of missing values to be introduced across numeric variables with fixed proportions (biomarker = 15%, weight = 25%, response = 60%). | 0 |
Examples
# Simulate basic clinical data
clinical_df <- clinical_data()

str(clinical_df)
#> 'data.frame':    300 obs. of  8 variables:
#>  $ participant_id: chr  "001" "001" "001" "002" ...
#>  $ visit         : Factor w/ 3 levels "1","2","3": 1 2 3 1 2 3 1 2 3 1 ...
#>  $ sex           : Factor w/ 2 levels "Male","Female": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ treatment     : Factor w/ 2 levels "Placebo","Treatment": 2 2 2 1 1 1 1 1 1 1 ...
#>  $ age           : num  35 35 35 21 21 21 47 47 47 35 ...
#>  $ weight        : num  55.4 60.3 58.1 68.3 66.3 64 76 77.6 74.9 61.7 ...
#>  $ biomarker     : num  42.2 44.7 44.9 56.5 51 ...
#>  $ response      : Factor w/ 3 levels "Complete","Partial",..: 1 3 2 3 3 3 3 2 3 3 ...

head(clinical_df, 10)
#>    participant_id visit  sex treatment age weight biomarker response
#> 1             001     1 Male Treatment  35   55.4     42.22 Complete
#> 2             001     2 Male Treatment  35   60.3     44.70     None
#> 3             001     3 Male Treatment  35   58.1     44.85  Partial
#> 4             002     1 Male   Placebo  21   68.3     56.51     None
#> 5             002     2 Male   Placebo  21   66.3     51.03     None
#> 6             002     3 Male   Placebo  21   64.0     39.59     None
#> 7             003     1 Male   Placebo  47   76.0     24.92     None
#> 8             003     2 Male   Placebo  47   77.6     49.99  Partial
#> 9             003     3 Male   Placebo  47   74.9     60.69     None
#> 10            004     1 Male   Placebo  35   61.7     50.58     None
# Simulate more complex clinical data
clinical_df_full <- clinical_data(n = 300,
                                  visits = 10,
                                  arms = c('A', 'B', 'C'), 
                                  dropout = 0.10,
                                  missing = 0.05)

str(clinical_df_full)
#> 'data.frame':    3000 obs. of  8 variables:
#>  $ participant_id: chr  "001" "001" "001" "001" ...
#>  $ visit         : Factor w/ 10 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
#>  $ sex           : Factor w/ 2 levels "Male","Female": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ treatment     : Factor w/ 3 levels "A","B","C": 3 3 3 3 3 3 3 3 3 3 ...
#>  $ age           : num  25 25 25 25 25 25 25 25 25 25 ...
#>  $ weight        : num  64.7 65.1 64.2 62.3 62.1 NA NA 61.8 63.7 64.1 ...
#>  $ biomarker     : num  48.2 22.2 51.2 43.4 44.5 ...
#>  $ response      : Factor w/ 3 levels "Complete","Partial",..: 1 1 3 1 3 1 3 3 NA 1 ...

head(clinical_df_full, 20)
#>    participant_id visit    sex treatment age weight biomarker response
#> 1             001     1   Male         C  25   64.7     48.24 Complete
#> 2             001     2   Male         C  25   65.1     22.17 Complete
#> 3             001     3   Male         C  25   64.2     51.21     None
#> 4             001     4   Male         C  25   62.3     43.38 Complete
#> 5             001     5   Male         C  25   62.1     44.52     None
#> 6             001     6   Male         C  25     NA     24.25 Complete
#> 7             001     7   Male         C  25     NA     49.55     None
#> 8             001     8   Male         C  25   61.8     47.78     None
#> 9             001     9   Male         C  25   63.7     23.65     <NA>
#> 10            001    10   Male         C  25   64.1     45.97 Complete
#> 11            002     1 Female         B  72   71.1     34.18     None
#> 12            002     2 Female         B  72   70.7     65.47     None
#> 13            002     3 Female         B  72   71.2     34.29     None
#> 14            002     4 Female         B  72     NA        NA     <NA>
#> 15            002     5 Female         B  72     NA        NA     <NA>
#> 16            002     6 Female         B  72     NA        NA     <NA>
#> 17            002     7 Female         B  72     NA        NA     <NA>
#> 18            002     8 Female         B  72     NA        NA     <NA>
#> 19            002     9 Female         B  72     NA        NA     <NA>
#> 20            002    10 Female         B  72     NA        NA     <NA>
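
The long format shown above (one row per participant per visit) can be sketched in base R. This is only an illustration of the data layout: `simulate_long()` is a hypothetical helper, the distributions are arbitrary assumptions, and none of it is taken from the implementation of clinical_data().

```r
# Hypothetical helper: builds a long-format trial skeleton in base R.
# It is NOT the code behind clinical_data(); distributions are assumptions.
simulate_long <- function(n = 100, visits = 3,
                          arms = c("Placebo", "Treatment"), seed = 1) {
  set.seed(seed)
  # Subject-level attributes are drawn once per participant
  subjects <- data.frame(
    participant_id = sprintf("%03d", seq_len(n)),
    treatment = factor(sample(arms, n, replace = TRUE), levels = arms),
    age = sample(18:85, n, replace = TRUE),
    stringsAsFactors = FALSE
  )
  # Repeat each subject row once per visit, then add visit-level values
  long <- subjects[rep(seq_len(n), each = visits), ]
  long$visit <- factor(rep(seq_len(visits), times = n))
  long$biomarker <- round(rnorm(n * visits, mean = 50, sd = 10), 2)
  rownames(long) <- NULL
  long
}

sim <- simulate_long(n = 100, visits = 3)
nrow(sim)  # 300: 100 participants x 3 visits
```

Subject-level columns (treatment, age) repeat across a participant's rows, while visit-level columns vary, which matches the pattern visible in the head() output.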

summary_table()

Description

Generates a summary table for biostatistics and clinical data analysis with automatic normality, effect size, and statistical test calculations. Handles both numeric and categorical variables, performing appropriate descriptive statistics and inferential tests for single-group summaries or two-group comparisons.

Parameters
| Parameter | Description | Default |
|---|---|---|
| data | Dataframe containing the variables to be summarized. | Required |
| group_by | Character string indicating the name of the grouping variable for two-group comparisons. | NULL |
| normality_test | Character string indicating the normality test to use: 'S-W' for Shapiro-Wilk or 'K-S' for Kolmogorov-Smirnov. | 'S-W' |
| all | Logical parameter that shows all calculated statistics. | FALSE |
| effect_size | Logical parameter that includes effect size estimates. | FALSE |
| exclude | Character vector of variable names to exclude from the summary. | NULL |
Examples
# Overall summary without considering treatment groups
summary_table(clinical_df, exclude = c('participant_id', 'visit'))
# Grouped summary by treatment group
summary_table(clinical_df, group_by = 'treatment', exclude = c('participant_id', 'visit'))
# Grouped summary by treatment group with all stats and effect size
summary_table(clinical_df,
              group_by = 'treatment',
              all = TRUE,
              effect_size = TRUE,
              exclude = c('participant_id', 'visit'))

normality()

Description

Tests normality using a method chosen by sample size: the Shapiro-Wilk test (n ≤ 50) or the Kolmogorov-Smirnov test with Lilliefors' correction (n > 50), accompanied by Q-Q plots and histograms. Evaluates skewness and kurtosis using z-score criteria based on sample size. Automatically detects outliers and provides a comprehensive visual and statistical assessment.

Parameters
| Parameter | Description | Default |
|---|---|---|
| data | Dataframe containing the variables to be summarized. | Required |
| x | Character string indicating the variable to be analyzed. | Required |
| all | Logical parameter that displays all row indices of values outside 95% CI. | FALSE |
| color | Character string indicating color for plots. | "#79E1BE" |
Examples
# Filter clinical data to Placebo arm
clinical_df_treat <- clinical_df[clinical_df$treatment == "Placebo", ]

# Normally distributed variable
normality(data = clinical_df_treat, "biomarker")
#> 
#> Normality Test for 'biomarker' 
#> 
#> n = 159 
#> mean (SD) = 49.44 (9.2) 
#> median (IQR) = 50.38 (13.1) 
#> 
#> Kolmogorov-Smirnov (Lilliefors): D = 0.054, p = 0.305 
#> Shapiro-Wilk: W = 0.992, p = 0.546 
#> Skewness: 0.06 (z = 0.30) 
#> Kurtosis: -0.03 (z = -0.08) 
#> 
#> Data appears normally distributed.
#> 

# Non-normally distributed variable with points outside 95% CI displayed
normality(data = clinical_df_treat, "weight", all = TRUE)
#> 
#> Normality Test for 'weight' 
#> 
#> n = 159 
#> mean (SD) = 72.56 (12.9) 
#> median (IQR) = 69.20 (21.1) 
#> 
#> Kolmogorov-Smirnov (Lilliefors): D = 0.125, p < 0.001 
#> Shapiro-Wilk: W = 0.951, p < 0.001 
#> Skewness: 0.28 (z = 1.45) 
#> Kurtosis: -1.09 (z = -2.85) 
#> 
#> Data appears not normally distributed.
#>  
#> VALUES OUTSIDE 95% CI (row indices): 40, 41, 47, 22, 3, 16, 71, 105, 125, 72, 90, 89, 129, 34, 93, 103, 69, 65, 59, 2, 66, 109, 114, 107, 110, 95, 111, 58, 70, 1, 106, 113, 152, 32, 112, 115, 57, 20, 84, 29, 142, 21, 55, 102, 143, 56, 86, 144, 83
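
The z-scores printed next to skewness and kurtosis can be approximated by dividing each statistic by its standard error. The sketch below uses the usual textbook standard-error formulas. That choice is an assumption about what the package does, although it gives values close to the output shown above for n = 159. The helper names are hypothetical.

```r
# Standard errors of sample skewness and excess kurtosis for sample size n.
# Textbook formulas; assumed (not confirmed) to match the package internals.
se_skewness <- function(n) {
  sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
}
se_kurtosis <- function(n) {
  2 * se_skewness(n) * sqrt((n^2 - 1) / ((n - 3) * (n + 5)))
}

# Values taken from the 'weight' example above (already rounded in the output)
n <- 159
z_skew <- 0.28 / se_skewness(n)
z_kurt <- -1.09 / se_kurtosis(n)
round(c(skewness = z_skew, kurtosis = z_kurt), 2)

# Test selection by sample size, as stated in the description
normality_test_name <- function(n) {
  if (n <= 50) "Shapiro-Wilk" else "Kolmogorov-Smirnov (Lilliefors)"
}
normality_test_name(n)
```

Because the inputs are the rounded statistics from the printed output, small differences in the last digit are possible.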

missing_values()

Description

Provides descriptive statistics and visualizations of missing values in a dataframe.

Parameters
| Parameter | Description | Default |
|---|---|---|
| data | Dataframe containing the variables to be analyzed. | Required |
| color | Character string indicating the color for missing values. | "#79E1BE" |
| all | Logical parameter that shows all variables including those without missing values. | FALSE |
Examples
# Missing value analysis of only variables with missing values
missing_values(clinical_df_full)
#> 
#> Missing Value Analysis
#> 
#> Complete rows: 2452 (81.7%)
#> Missing cells: 868 (3.6%)
#> 
#>           n_missing pct_missing
#> response        403       13.43
#> weight          251        8.37
#> biomarker       214        7.13

# Show all variables including those without missing values
missing_values(clinical_df_full, all = TRUE)
#> 
#> Missing Value Analysis
#> 
#> Complete rows: 2452 (81.7%)
#> Missing cells: 868 (3.6%)
#> 
#>                n_missing pct_missing
#> response             403       13.43
#> weight               251        8.37
#> biomarker            214        7.13
#> participant_id         0        0.00
#> visit                  0        0.00
#> sex                    0        0.00
#> treatment              0        0.00
#> age                    0        0.00
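
The counts in this kind of report can be reproduced with a few base R calls. The sketch below runs on a small made-up data frame and shows where "complete rows", "missing cells", and the per-variable table come from. It does not use the package itself.

```r
# Small illustrative data frame (made-up values)
df <- data.frame(
  id = c("001", "002", "003", "004"),
  weight = c(70.1, NA, 65.3, NA),
  response = c("None", NA, "Partial", "Complete"),
  stringsAsFactors = FALSE
)

# Per-variable counts and percentages of missing values
n_missing <- colSums(is.na(df))
pct_missing <- round(100 * n_missing / nrow(df), 2)
missing_tbl <- data.frame(n_missing, pct_missing)
# Most affected variable first
missing_tbl <- missing_tbl[order(-missing_tbl$n_missing), ]

# Dataset-level summaries
complete_rows <- sum(complete.cases(df))
missing_cells <- sum(is.na(df))

missing_tbl
```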

outliers()

Description

Identifies outliers using Tukey’s interquartile range (IQR) method and provides descriptive statistics and visualizations for outlier assessment in numeric data.

Parameters
| Parameter | Description | Default |
|---|---|---|
| data | Dataframe containing the variables to be analyzed. | Required |
| x | Character string indicating the variable to be analyzed. | Required |
| threshold | Numeric value multiplying the IQR to define outlier boundaries. | 1.5 |
| color | Character string indicating the color for non-outlier data points. | "#79E1BE" |
Examples
# Basic outlier detection
outliers(clinical_df_full, "biomarker")
#> 
#> Outlier Analysis
#> 
#> Variable: 'biomarker'
#> n: 2786
#> Missing: 214 (7.1%)
#> Method: Tukey's IQR x 1.5
#> Bounds: [18.971, 74.761]
#> Outliers detected: 19 (0.7%)
#> 
#> Outlier indices: 27, 223, 440, 559, 795, 931, 973, 1175, 1277, 1346, 1381, 1680, 1706, 2288, 2370, 2571, 2584, 2602, 2764

# Using custom threshold
outliers(clinical_df_full, "biomarker", threshold = 1.0)
#> 
#> Outlier Analysis
#> 
#> Variable: 'biomarker'
#> n: 2786
#> Missing: 214 (7.1%)
#> Method: Tukey's IQR x 1.0
#> Bounds: [25.945, 67.788]
#> Outliers detected: 115 (4.1%)
#> 
#> Outlier indices: 2, 6, 9, 24, 27, 38, 42, 47, 56, 130 (...)
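
Tukey's rule flags values below Q1 - threshold × IQR or above Q3 + threshold × IQR. A minimal base R sketch follows. `tukey_outliers()` is a hypothetical helper, and it uses R's default quantile definition (type 7), which may or may not be what the package uses.

```r
# Hypothetical helper illustrating Tukey's IQR rule (not the package function)
tukey_outliers <- function(x, threshold = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  lower <- q[1] - threshold * iqr
  upper <- q[2] + threshold * iqr
  list(
    bounds = c(lower, upper),
    # which() ignores the NA produced by comparing a missing value
    indices = which(x < lower | x > upper)
  )
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100, NA)
res <- tukey_outliers(x)
res$bounds   # -3.5 14.5
res$indices  # 10
```

Lowering `threshold` narrows the bounds and flags more points, which is the behavior seen in the second example above.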

Sample Size and Power Calculation

sample_size()

Description

Calculates the sample size needed in a clinical trial based on study design and statistical parameters using standard formulas for hypothesis testing (Chow, S. 2017).

Parameters
| Parameter | Description | Default |
|---|---|---|
| sample | Character string indicating whether one or two samples need to be calculated. Options: "one-sample" or "two-sample". | Required |
| design | Character string indicating study design when sample = "two-sample". Options: "parallel" or "crossover". | NULL (for one-sample tests) |
| outcome | Character string indicating the type of outcome variable. Options: "mean" or "proportion". | Required |
| type | Character string indicating the type of hypothesis test. Options: "equality", "equivalence", "non-inferiority", or "superiority". | Required |
| alpha | Numeric parameter indicating the Type I error rate (significance level). | 0.05 |
| beta | Numeric parameter indicating the Type II error rate (1 - power). | 0.20 |
| x1 | Numeric value of the mean or proportion for group 1 (treatment group). | Required |
| x2 | Numeric value of the mean or proportion for group 2 (control group or reference value). | Required |
| SD | Numeric value indicating the standard deviation. Required for mean outcomes and crossover designs with proportion outcomes. | NULL |
| delta | Numeric value indicating the margin of clinical interest. Required for non-equality tests. Must be negative for non-inferiority and positive for superiority/equivalence. | NULL |
| dropout | Numeric value indicating the discontinuation rate expected in the study. Must be between 0 and 1. | 0 |
| k | Numeric value indicating the allocation ratio (n1/n2) for two-sample tests. | 1 |
Examples
# Two-sample parallel non-inferiority test for means with 10% expected dropout
sample_size(sample = 'two-sample', design = 'parallel', outcome = 'mean',
            type = 'non-inferiority', x1 = 5.0, x2 = 5.0, 
            SD = 0.1, delta = -0.05, k = 1, dropout = 0.1)
#> 
#> Sample Size Calculation
#> 
#> Test type: non-inferiority
#> Design: parallel, two-sample
#> Outcome: mean
#> Alpha (α): 0.050
#> Beta (β): 0.200
#> Power: 80.0%
#> 
#> Parameters:
#> x1 (treatment): 5.000
#> x2 (control/reference): 5.000
#> Difference (x1 - x2): 0.000
#> Standard Deviation (σ): 0.100
#> Allocation Ratio (k): 1.00
#> Delta (δ): -0.050
#> Dropout rate: 10.0%
#> 
#> Required Sample Size
#> n1 = 55
#> n2 = 55
#> Total = 110
#> 
#> Note: Sample size increased by 10.0% to account for potential dropouts.
# One-sample equivalence test for means
sample_size(sample = "one-sample", outcome = "mean", type = "equivalence",
            x1 = 0, x2 = 0, SD = 0.1, delta = 0.05)
#> 
#> Sample Size Calculation
#> 
#> Test type: equivalence
#> Design: one-sample
#> Outcome: mean
#> Alpha (α): 0.050
#> Beta (β): 0.200
#> Power: 80.0%
#> 
#> Parameters:
#> x1 (treatment): 0.000
#> x2 (control/reference): 0.000
#> Difference (x1 - x2): 0.000
#> Standard Deviation (σ): 0.100
#> Delta (δ): 0.050
#> 
#> Required Sample Size
#> n = 35
#> Total = 35
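
For readers who want to see the arithmetic, the two examples above can be approximated with the normal-approximation formulas commonly given for these designs. The sketch below is written under stated assumptions. The function names are hypothetical, and the dropout adjustment (adding ceiling(n × dropout) subjects per group) is a guess that happens to agree with the printed result. It is not taken from the package source.

```r
# Two-sample parallel design, mean outcome, non-inferiority (one-sided alpha).
# Hypothetical helper; the dropout rule is an assumption.
n_noninf_means <- function(x1, x2, SD, delta, alpha = 0.05, beta = 0.20,
                           k = 1, dropout = 0) {
  z <- qnorm(1 - alpha) + qnorm(1 - beta)
  n2 <- ceiling((1 + 1 / k) * z^2 * SD^2 / ((x1 - x2) - delta)^2)
  n1 <- ceiling(k * n2)
  c(n1 = n1 + ceiling(n1 * dropout), n2 = n2 + ceiling(n2 * dropout))
}

# One-sample design, mean outcome, equivalence (beta split across two tails)
n_equiv_one_sample <- function(x1, x2, SD, delta, alpha = 0.05, beta = 0.20) {
  z <- qnorm(1 - alpha) + qnorm(1 - beta / 2)
  ceiling(z^2 * SD^2 / (delta - abs(x1 - x2))^2)
}

n_ni <- n_noninf_means(x1 = 5, x2 = 5, SD = 0.1, delta = -0.05, dropout = 0.1)
n_eq <- n_equiv_one_sample(x1 = 0, x2 = 0, SD = 0.1, delta = 0.05)
n_ni  # n1 = 55, n2 = 55
n_eq  # 35
```

Both results match the outputs above, but agreement on two cases does not establish that the package uses exactly these expressions for every design.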

sample_size_range()

Description

Calculates required sample sizes for specified power levels (70%, 80%, 90%) across a range of treatment effect values (x1), while keeping the control group value (x2) fixed. Internally calls sample_size() and generates a plot to visualize how total sample size changes with varying x1.

Parameters
| Parameter | Description | Default |
|---|---|---|
| x1_range | Numeric vector of length 2 specifying the range of values to evaluate for the treatment group mean or proportion (x1). | Required |
| x2 | Numeric value for the control group mean or proportion (reference value). | Required |
| step | Numeric value indicating the step size to increment across the x1_range. | 0.1 |
| ... | Additional arguments passed to sample_size(), such as sample, design, outcome, type, SD, alpha, etc. | Required |
Examples
# Two-sample parallel non-inferiority test for proportions with 10% dropout
result <- sample_size_range(x1_range = c(0.65, 0.75), x2 = 0.65, step = 0.01,
                            sample = "two-sample", design = "parallel", outcome = "proportion",
                            type = "non-inferiority", delta = -0.1, dropout = 0.1)

print(result)
#> 
#> Sample Size Range Analysis
#> 
#> Treatment range (x1): 0.650 to 0.660
#> Control/Reference (x2): 0.650
#> Step size: 0.010
#> 
#> 70% Power: Total n = 108 to 474
#> 80% Power: Total n = 144 to 622
#> 90% Power: Total n = 196 to 858
#> 
#> Sample size increased by 10.0% to account for potential dropouts.
result$data
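| power | x1 | x2 | diff | n1 | n2 | total |
|---|---|---|---|---|---|---|
| 70 | 0.65 | 0.65 | 0.00 | 237 | 237 | 474 |
| 70 | 0.66 | 0.65 | 0.01 | 194 | 194 | 388 |
| 70 | 0.67 | 0.65 | 0.02 | 162 | 162 | 324 |
| 70 | 0.68 | 0.65 | 0.03 | 137 | 137 | 274 |
| 70 | 0.69 | 0.65 | 0.04 | 117 | 117 | 234 |
| 70 | 0.70 | 0.65 | 0.05 | 102 | 102 | 204 |
| 70 | 0.71 | 0.65 | 0.06 | 88 | 88 | 176 |
| 70 | 0.72 | 0.65 | 0.07 | 77 | 77 | 154 |
| 70 | 0.73 | 0.65 | 0.08 | 69 | 69 | 138 |
| 70 | 0.74 | 0.65 | 0.09 | 61 | 61 | 122 |
| 70 | 0.75 | 0.65 | 0.10 | 54 | 54 | 108 |
| 80 | 0.65 | 0.65 | 0.00 | 311 | 311 | 622 |
| 80 | 0.66 | 0.65 | 0.01 | 255 | 255 | 510 |
| 80 | 0.67 | 0.65 | 0.02 | 213 | 213 | 426 |
| 80 | 0.68 | 0.65 | 0.03 | 180 | 180 | 360 |
| 80 | 0.69 | 0.65 | 0.04 | 154 | 154 | 308 |
| 80 | 0.70 | 0.65 | 0.05 | 134 | 134 | 268 |
| 80 | 0.71 | 0.65 | 0.06 | 116 | 116 | 232 |
| 80 | 0.72 | 0.65 | 0.07 | 102 | 102 | 204 |
| 80 | 0.73 | 0.65 | 0.08 | 91 | 91 | 182 |
| 80 | 0.74 | 0.65 | 0.09 | 80 | 80 | 160 |
| 80 | 0.75 | 0.65 | 0.10 | 72 | 72 | 144 |
| 90 | 0.65 | 0.65 | 0.00 | 429 | 429 | 858 |
| 90 | 0.66 | 0.65 | 0.01 | 352 | 352 | 704 |
| 90 | 0.67 | 0.65 | 0.02 | 294 | 294 | 588 |
| 90 | 0.68 | 0.65 | 0.03 | 249 | 249 | 498 |
| 90 | 0.69 | 0.65 | 0.04 | 213 | 213 | 426 |
| 90 | 0.70 | 0.65 | 0.05 | 184 | 184 | 368 |
| 90 | 0.71 | 0.65 | 0.06 | 160 | 160 | 320 |
| 90 | 0.72 | 0.65 | 0.07 | 141 | 141 | 282 |
| 90 | 0.73 | 0.65 | 0.08 | 125 | 125 | 250 |
| 90 | 0.74 | 0.65 | 0.09 | 110 | 110 | 220 |
| 90 | 0.75 | 0.65 | 0.10 | 98 | 98 | 196 |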
# One-sample equivalence test for means
result <- sample_size_range(x1_range = c(-0.01, 0.01), x2 = 0, step = 0.005,
                            sample = "one-sample", outcome = "mean", type = "equivalence",
                            SD = 0.1, delta = 0.05, alpha = 0.05)

print(result)
#> 
#> Sample Size Range Analysis
#> 
#> Treatment range (x1): -0.010 to -0.005
#> Control/Reference (x2): 0.000
#> Step size: 0.005
#> 
#> 70% Power: Total n = 29 to 45
#> 80% Power: Total n = 35 to 54
#> 90% Power: Total n = 44 to 68
result$data
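| power | x1 | x2 | diff | n1 | n2 | total |
|---|---|---|---|---|---|---|
| 70 | -0.010 | 0 | -0.010 | 45 | 45 | 45 |
| 70 | -0.005 | 0 | -0.005 | 36 | 36 | 36 |
| 70 | 0.000 | 0 | 0.000 | 29 | 29 | 29 |
| 70 | 0.005 | 0 | 0.005 | 36 | 36 | 36 |
| 70 | 0.010 | 0 | 0.010 | 45 | 45 | 45 |
| 80 | -0.010 | 0 | -0.010 | 54 | 54 | 54 |
| 80 | -0.005 | 0 | -0.005 | 43 | 43 | 43 |
| 80 | 0.000 | 0 | 0.000 | 35 | 35 | 35 |
| 80 | 0.005 | 0 | 0.005 | 43 | 43 | 43 |
| 80 | 0.010 | 0 | 0.010 | 54 | 54 | 54 |
| 90 | -0.010 | 0 | -0.010 | 68 | 68 | 68 |
| 90 | -0.005 | 0 | -0.005 | 54 | 54 | 54 |
| 90 | 0.000 | 0 | 0.000 | 44 | 44 | 44 |
| 90 | 0.005 | 0 | 0.005 | 54 | 54 | 54 |
| 90 | 0.010 | 0 | 0.010 | 68 | 68 | 68 |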
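
The grid evaluation can be illustrated without the package by looping a sample size formula over a sequence of x1 values and the three power levels. The sketch below assumes the usual normal-approximation formula for a two-sample parallel non-inferiority test on proportions with equal allocation, plus the same guessed dropout rule of adding ceiling(n × dropout) per group. `n_noninf_props()` is a hypothetical helper, not a package function.

```r
# Hypothetical helper: per-group n for non-inferiority on proportions (k = 1)
n_noninf_props <- function(p1, p2, delta, power, alpha = 0.05, dropout = 0) {
  z <- qnorm(1 - alpha) + qnorm(power)
  n <- ceiling(z^2 * (p1 * (1 - p1) + p2 * (1 - p2)) / ((p1 - p2) - delta)^2)
  n + ceiling(n * dropout)  # assumed dropout inflation
}

# Every combination of x1 and power; x1 varies fastest within each power level
grid <- expand.grid(x1 = seq(0.65, 0.75, by = 0.01), power = c(70, 80, 90))
grid$x2 <- 0.65
grid$n1 <- mapply(n_noninf_props, p1 = grid$x1, power = grid$power / 100,
                  MoreArgs = list(p2 = 0.65, delta = -0.1, dropout = 0.1))
grid$total <- 2 * grid$n1

# Rows for x1 = 0.65 at each power level
grid[grid$x1 == 0.65, ]
```

For x1 = 0.65 this gives n1 = 237, 311, and 429 at 70%, 80%, and 90% power, which agrees with the first table above. The remaining rows were not all checked by hand.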

Statistical Analysis and Inference

omnibus()

Description

Performs omnibus tests to evaluate overall differences between three or more groups. Automatically selects the appropriate statistical test based on data characteristics and assumption testing. Supports both independent groups and repeated measures designs. Tests include one-way ANOVA, repeated measures ANOVA, Kruskal-Wallis test, and Friedman test. Performs comprehensive assumption checking (normality, homogeneity of variance, sphericity) and post-hoc testing when significant results are detected.

Parameters
| Parameter | Description | Default |
|---|---|---|
| data | Dataframe containing the variables to be analyzed. Data must be in long format with one row per observation. | Required |
| y | Character string indicating the dependent variable (outcome). | Required |
| x | Character string indicating the independent variable (group or within-subject variable). | Required |
| paired_by | Character string indicating the source of repeated measurements. If provided, a repeated measures design is assumed. If NULL, independent groups design is assumed. | NULL |
| alpha | Numeric value indicating the significance level for hypothesis tests. | 0.05 |
| p_method | Character string indicating the method for p-value adjustment in post-hoc multiple comparisons to control for Type I error inflation. Options: "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "none". | "holm" |
| na.action | Character string indicating the action to take if NAs are present ("na.omit" or "na.exclude"). | "na.omit" |
Examples
# Compare numerical variable across treatments
omnibus(data = clinical_df_full, y = "biomarker", x = "treatment")
#> 
#> Omnibus Test: One-way ANOVA
#> 
#> Assumption Testing Results:
#> 
#>   Normality (Shapiro-Wilk Test):
#>   A: W = 0.9980, p = 0.321
#>   B: W = 0.9975, p = 0.237
#>   C: W = 0.9988, p = 0.733
#>   Overall result: Normal distribution assumed.
#> 
#>   Homogeneity of Variance (Bartlett Test):
#>   Chi-squared(2) = 1.3685, p = 0.504
#>   Effect size (Cramer's V) = 0.0151
#>   Result: Homogeneous variances.
#> 
#> Test Results:
#>   Formula: biomarker ~ treatment
#>   alpha: 0.05
#>   Result: significant (p = <0.001)
#> 
#> Post-hoc Multiple Comparisons
#> 
#>   Tukey Honest Significant Differences (alpha: 0.050):
#>   Comparison               Diff    Lower    Upper    p-adj
#>   --------------------------------------------------------- 
#>   B - A                  -3.178   -4.296   -2.060   <0.001*
#>   C - A                  -5.542   -6.618   -4.466   <0.001*
#>   C - B                  -2.364   -3.468   -1.259   <0.001*
#> 
#> The study groups show a moderately imbalanced distribution of sample sizes (Δn = 0.214).
 
# Compare numerical variable changes across visits 
omnibus(y = "biomarker", x = "visit", data = clinical_df, paired_by = "participant_id")
#> 
#> Omnibus Test: Repeated measures ANOVA
#> 
#> Assumption Testing Results:
#> 
#>   Sphericity (Mauchly Test):
#>   W = 0.9881, p = 0.556
#>   Result: Sphericity assumed.
#> 
#>   Normality (Shapiro-Wilk Test):
#>   1: W = 0.9848, p = 0.309
#>   2: W = 0.9926, p = 0.861
#>   3: W = 0.9884, p = 0.536
#>   Overall result: Normal distribution assumed.
#> 
#>   Homogeneity of Variance (Bartlett Test):
#>   Chi-squared(2) = 0.5190, p = 0.771
#>   Effect size (Cramer's V) = 0.0294
#>   Result: Homogeneous variances.
#> 
#> Test Results:
#>   Formula: biomarker ~ visit + Error(participant_id/visit)
#>   alpha: 0.05
#>   Result: not significant (p = 0.609)
#> Post-hoc tests not performed (results not significant).
#> 
#> The study groups show a moderately imbalanced distribution of sample sizes (Δn = 0.203).
# Filter simulated data to just one treatment
clinical_df_A <- clinical_df[clinical_df$treatment == "Treatment", ]

# Compare numerical variable changes across visits 
omnibus(y = "biomarker", x = "visit", data = clinical_df_A, paired_by = "participant_id")
#> 
#> Omnibus Test: Repeated measures ANOVA
#> 
#> Assumption Testing Results:
#> 
#>   Sphericity (Mauchly Test):
#>   W = 0.9825, p = 0.672
#>   Result: Sphericity assumed.
#> 
#>   Normality (Shapiro-Wilk Test):
#>   1: W = 0.9617, p = 0.125
#>   2: W = 0.9812, p = 0.642
#>   3: W = 0.9904, p = 0.964
#>   Overall result: Normal distribution assumed.
#> 
#>   Homogeneity of Variance (Bartlett Test):
#>   Chi-squared(2) = 0.9232, p = 0.630
#>   Effect size (Cramer's V) = 0.0572
#>   Result: Homogeneous variances.
#> 
#> Test Results:
#>   Formula: biomarker ~ visit + Error(participant_id/visit)
#>   alpha: 0.05
#>   Result: not significant (p = 0.233)
#> Post-hoc tests not performed (results not significant).
#> 
#> The study groups show a moderately imbalanced distribution of sample sizes (Δn = 0.217).
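
The automatic test selection for independent groups can be pictured as a small decision rule: check normality within each group and homogeneity of variance, then run a parametric or rank-based test accordingly. The sketch below implements one such rule in base R on the built-in `PlantGrowth` dataset. `choose_omnibus()` is a hypothetical helper, and the exact decision logic of the package (including the repeated measures branch) is not reproduced here.

```r
# Hypothetical helper: assumption-driven choice between ANOVA and Kruskal-Wallis
choose_omnibus <- function(data, y, x, alpha = 0.05, p_method = "holm") {
  f <- reformulate(x, response = y)
  groups <- split(data[[y]], data[[x]])
  # Normality within every group (Shapiro-Wilk) and equal variances (Bartlett)
  normal <- all(vapply(groups, function(g) shapiro.test(g)$p.value,
                       numeric(1)) > alpha)
  homogeneous <- bartlett.test(f, data = data)$p.value > alpha
  if (normal && homogeneous) {
    fit <- aov(f, data = data)
    p <- summary(fit)[[1]][["Pr(>F)"]][1]
    posthoc <- if (p < alpha) TukeyHSD(fit) else NULL
    test <- "One-way ANOVA"
  } else {
    p <- kruskal.test(f, data = data)$p.value
    posthoc <- if (p < alpha) {
      pairwise.wilcox.test(data[[y]], data[[x]], p.adjust.method = p_method)
    } else {
      NULL
    }
    test <- "Kruskal-Wallis"
  }
  list(test = test, p_value = p, posthoc = posthoc)
}

res <- choose_omnibus(PlantGrowth, y = "weight", x = "group")
res$test
res$p_value
```

Post-hoc comparisons are computed only when the omnibus result is significant, mirroring the "Post-hoc tests not performed" message in the outputs above.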

effect_measures()

Description

Calculates measures of effect: Odds Ratio (OR), Risk Ratio (RR), and either Number Needed to Treat (NNT) or Number Needed to Harm (NNH).

Parameters
| Parameter | Description | Default |
|---|---|---|
| exposed_event | Numeric value indicating the number of events in the exposed group. | Required |
| exposed_no_event | Numeric value indicating the number of non-events in the exposed group. | Required |
| unexposed_event | Numeric value indicating the number of events in the unexposed group. | Required |
| unexposed_no_event | Numeric value indicating the number of non-events in the unexposed group. | Required |
| alpha | Numeric value between 0 and 1 specifying the alpha level for confidence intervals (CI). | 0.05 |
| correction | Logical parameter that indicates whether a continuity correction (0.5) will be applied when any cell contains 0. | TRUE |
Examples
effect_measures(exposed_event = 15, 
                exposed_no_event = 85,
                unexposed_event = 5,
                unexposed_no_event = 95)
#> 
#> Odds/Risk Ratio Analysis
#> 
#> Contingency Table:
#>                 Event No Event      Sum
#> Exposed            15       85      100
#> Unexposed           5       95      100
#> Sum                20      180      200
#> 
#> Odds Ratio: 3.353 (95% CI: 1.169 - 9.616)
#> Risk Ratio: 3.000 (95% CI: 1.133 - 7.941)
#> 
#> Risk in exposed: 15.0%
#> Risk in unexposed: 5.0%
#> Absolute risk difference: 10.0%
#> Number needed to harm (NNH): 10.0
#> 
#> Note: Correction not applied (no zero values).
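
The figures in this output follow from the 2×2 table with standard formulas. The sketch below computes the odds ratio and risk ratio with Wald confidence intervals on the log scale, plus the absolute risk difference and its reciprocal. `effect_sketch()` is a hypothetical helper. The Wald method is an assumption about the package, though it reproduces the intervals printed above. No continuity correction is applied here.

```r
# Hypothetical helper: effect measures from a 2x2 table (no zero-cell handling)
# e1/e0: events / non-events in exposed; u1/u0: events / non-events in unexposed
effect_sketch <- function(e1, e0, u1, u0, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  risk_exp <- e1 / (e1 + e0)
  risk_unexp <- u1 / (u1 + u0)
  or <- (e1 * u0) / (e0 * u1)
  rr <- risk_exp / risk_unexp
  # Standard errors of log(OR) and log(RR)
  se_log_or <- sqrt(1 / e1 + 1 / e0 + 1 / u1 + 1 / u0)
  se_log_rr <- sqrt(1 / e1 - 1 / (e1 + e0) + 1 / u1 - 1 / (u1 + u0))
  ard <- risk_exp - risk_unexp
  list(
    or = or, or_ci = exp(log(or) + c(-1, 1) * z * se_log_or),
    rr = rr, rr_ci = exp(log(rr) + c(-1, 1) * z * se_log_rr),
    ard = ard,
    nn = 1 / abs(ard)  # NNH when ard > 0 for an adverse event, NNT otherwise
  )
}

em <- effect_sketch(e1 = 15, e0 = 85, u1 = 5, u0 = 95)
round(c(em$or, em$or_ci), 3)  # 3.353 1.169 9.616
round(c(em$rr, em$rr_ci), 3)  # 3.000 1.133 7.941
round(em$nn, 1)               # 10
```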

Data Visualization

plot_bar()

Description

Generates publication-ready bar plots with minimal code using ggplot2.

Parameters
| Parameter | Description | Default |
|---|---|---|
| data | A data frame containing the variables to plot. | Required |
| x | Character string specifying the x-axis variable. | Required |
| y | Character string specifying the y-axis variable. If NULL, counts calculated automatically. | NULL |
| group | Character string specifying the grouping variable for fill color. | NULL |
| facet | Character string specifying the faceting variable. | NULL |
| position | Character string specifying bar position: "dodge", "stack", or "fill". | Required |
| stat | Character string for statistical aggregation: "mean" or "median". | Required |
| colors | Character vector of colors. If NULL, uses TealGrn palette. | NULL |
| title | Character string for plot title. | NULL |
| xlab | Character string for x-axis label. | NULL |
| ylab | Character string for y-axis label. | NULL |
| legend_title | Character string for legend title. | NULL |
| flip | Logical parameter indicating whether to flip coordinates. | FALSE |
| values | Logical parameter indicating whether to display value labels above bars. | FALSE |
Examples
# Simulated clinical data
clinical_df <- clinical_data()

# Proportion of response by treatment
plot_bar(data = clinical_df, x = "treatment", group = "response", position = "fill", 
         title = "Proportion of response by treatment", values = TRUE)

# Grouped barplot of categorical variable by treatment with value labels
plot_bar(data = clinical_df, x = "response", group = "visit", facet = "treatment", 
         title = "Response by visit and treatment",values = TRUE)
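
With position = "fill", the bars show within-group proportions rather than counts. The numbers behind such a plot can be computed in base R with `table()` and `prop.table()`. The data below are made up for illustration, and the sketch covers only the aggregation step, not the plotting.

```r
# Made-up data: response category by treatment arm
df <- data.frame(
  treatment = rep(c("Placebo", "Treatment"), each = 4),
  response = c("None", "None", "None", "Partial",
               "Complete", "Complete", "Partial", "None")
)

# Counts per treatment (rows) and response (columns)
counts <- table(df$treatment, df$response)
# Row-wise proportions: each treatment arm sums to 1
props <- prop.table(counts, margin = 1)
round(props, 2)
```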

plot_line()

Description

Generates publication-ready line plots with minimal code using ggplot2.

Parameters
| Parameter | Description | Default |
|---|---|---|
| data | A data frame containing the variables to plot. | Required |
| x | Character string specifying the x-axis variable. | Required |
| y | Character string specifying the y-axis variable. | Required |
| group | Character string specifying the grouping variable for multiple lines. | NULL |
| facet | Character string specifying the faceting variable. | NULL |
| stat | Character string for statistical aggregation: "mean" or "median". | Required |
| error | Character string for error bars: "se", "sd", "ci", or "none". | "se" |
| error_width | Numeric value indicating the width of error bar caps. | 0.2 |
| colors | Character vector of colors. If NULL, uses TealGrn palette. | NULL |
| title | Character string for plot title. | NULL |
| xlab | Character string for x-axis label. | NULL |
| ylab | Character string for y-axis label. | NULL |
| legend_title | Character string for legend title. | NULL |
| points | Logical parameter indicating whether to add points to lines. | TRUE |
| line_size | Numeric value indicating thickness of lines. | 1 |
| point_size | Numeric value indicating size of points if shown. | 3 |
| y_limits | Numeric vector of length 2 for y-axis limits. | NULL |
| x_limits | Numeric vector of length 2 for x-axis limits. | NULL |
Examples
# Line plot with mean and standard error by treatment
plot_line(data = clinical_df_full, x = "visit", y = "biomarker",
          group = "treatment", stat = "mean", error = "se")

# Faceted line plots with median and no error bars
plot_line(data = clinical_df_full, x = "visit", y = "biomarker", group = "treatment", 
          facet = "sex", stat = "median", error = "none", points = FALSE)  
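
A line plot with stat = "mean" and error = "se" needs one mean and one standard error per x value. The base R sketch below shows that aggregation on made-up data. How the package handles missing values internally is not documented here, so dropping them before computing the statistics is an assumption.

```r
# Made-up data: biomarker measured at two visits, one value missing
df <- data.frame(
  visit = rep(c("1", "2"), each = 4),
  biomarker = c(48, 52, 50, NA, 40, 44, 46, 42)
)

# Standard error of the mean, ignoring missing values
se <- function(v) sd(v, na.rm = TRUE) / sqrt(sum(!is.na(v)))

means <- tapply(df$biomarker, df$visit, mean, na.rm = TRUE)
ses <- tapply(df$biomarker, df$visit, se)

# One row per visit: the points and error-bar half-widths of the line plot
data.frame(visit = names(means), mean = as.numeric(means), se = as.numeric(ses))
```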

plot_hist()

Description

Generates publication-ready histogram plots with minimal code using ggplot2.

Parameters
| Parameter | Description | Default |
|---|---|---|
| data | A dataframe containing the variables to plot. | Required |
| x | Character string specifying the variable for the histogram. | Required |
| group | Character string specifying the grouping variable for multiple histograms. | NULL |
| facet | Character string specifying the faceting variable. | NULL |
| bins | Numeric value indicating the number of bins for the histogram. | 30 |
| binwidth | Numeric value indicating the width of the bins (overrides bins if specified). | NULL |
| alpha | Numeric value indicating the transparency level for the bars. | 0.7 |
| colors | Character vector of colors. If NULL, uses TealGrn palette. | NULL |
| title | Character string for plot title. | NULL |
| xlab | Character string for x-axis label. | NULL |
| ylab | Character string for y-axis label. | NULL |
| legend_title | Character string for legend title. | NULL |
| y_limits | Numeric vector of length 2 for y-axis limits. | NULL |
| x_limits | Numeric vector of length 2 for x-axis limits. | NULL |
| stat | Character string that adds line for "mean" or "median". | NULL |
Examples
# Mirror histogram for 2 groups with mean lines
plot_hist(clinical_df, x = "biomarker", group = "treatment", stat = "mean")

# Faceted histogram
plot_hist(clinical_df, x = "biomarker", facet = "treatment")

plot_box()

Description

Generates publication-ready boxplots with minimal code using ggplot2.

Parameters
| Parameter | Description | Default |
|---|---|---|
| data | A dataframe containing the variables to plot. | Required |
| x | Character string specifying the x-axis variable. | Required |
| y | Character string specifying the y-axis variable. | Required |
| group | Character string specifying grouping variable for fill/color. | NULL |
| facet | Character string specifying faceting variable. | NULL |
| colors | Character vector of colors. If NULL, uses TealGrn palette. | NULL |
| title | Character string for plot title. | NULL |
| xlab | Character string for x-axis label. | NULL |
| ylab | Character string for y-axis label. | NULL |
| legend_title | Character string for legend title. | NULL |
| points | Logical parameter indicating if jittered points should be shown. | FALSE |
| point_size | Numeric value indicating the size of points. | 2 |
| y_limits | Numeric vector of length 2 for y-axis limits. | NULL |
| show_mean | Logical parameter indicating if mean should be shown. | TRUE |
Examples
# Boxplot of biomarker by treatment
plot_box(clinical_df, x = "treatment", y = "biomarker", group = "treatment")

# Boxplot of biomarker by study visit and treatment
plot_box(clinical_df, x = "visit", y = "biomarker", group = "treatment")

plot_corr()

Description

Generates publication-ready correlation matrix heatmaps with minimal code using ggplot2.

Parameters
| Parameter | Description | Default |
|---|---|---|
| data | A dataframe containing the variables to analyze. | Required |
| vars | Character vector specifying which variables to include. | NULL |
| method | Character string specifying correlation method: "pearson" or "spearman". | "pearson" |
| type | Character string specifying matrix type: "full", "upper", or "lower". | "full" |
| colors | Character vector of 3 colors for negative, neutral, and positive correlations. | NULL |
| title | Character string for plot title. | NULL |
| show_values | Logical parameter indicating whether to display correlation values in cells. | TRUE |
| value_size | Numeric value indicating size of correlation value text. | 3 |
| show_sig | Logical parameter indicating whether to mark significant correlations. | FALSE |
| sig_level | Numeric value indicating significance level for marking. | 0.05 |
| sig_only | Logical parameter indicating whether to show only statistically significant values. | FALSE |
| show_legend | Logical parameter indicating whether to show legend. | TRUE |
| p_method | Character string indicating the method for p-value adjustment in post-hoc multiple comparisons to control for Type I error inflation. Options: "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", or "none". | "holm" |
Examples
# Correlation matrix for base R dataset 'swiss'
plot_corr(data = swiss)

# Lower triangle with significance indicators and filtering
plot_corr(data = swiss, type = "lower", show_sig = TRUE, sig_only = TRUE)
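
The values behind a correlation heatmap with significance filtering can be computed in base R: a correlation matrix, one p-value per variable pair, and a multiplicity adjustment. The sketch below does this for `swiss` with Pearson correlations and Holm adjustment. Whether the package adjusts across exactly these unique pairs is an assumption.

```r
vars <- names(swiss)
r <- cor(swiss, method = "pearson")

# All unique variable pairs (6 variables give 15 pairs)
pairs <- t(combn(vars, 2))

# Raw p-value for each pair, then Holm adjustment across all pairs
p_raw <- apply(pairs, 1, function(v) {
  cor.test(swiss[[v[1]]], swiss[[v[2]]], method = "pearson")$p.value
})
p_adj <- p.adjust(p_raw, method = "holm")

sig <- data.frame(
  var1 = pairs[, 1],
  var2 = pairs[, 2],
  r = round(r[pairs], 3),      # matrix lookup by (row name, column name)
  p_adj = signif(p_adj, 3)
)

# Pairs that would remain visible with sig_only = TRUE at sig_level = 0.05
sig[sig$p_adj < 0.05, ]
```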

Contributions & Feedback

We welcome feedback, suggestions, and bug reports. You can share your thoughts via email (sebastian.quirarte@sophia.com.mx) or GitHub issues.

Monthly downloads: 410
Version: 1.1.1
License: MIT + file LICENSE

Maintainer: Sebastian Quirarte-Justo
Last published: December 16th, 2025

Functions in biostats (1.1.1)

  • plot_line: Create Simple Professional Line Plots
  • sample_size: Sample Size Calculation for Clinical Trials
  • plot_box: Create Simple Professional Box Plots
  • plot_bar: Create Simple Professional Bar Plots
  • normality: Statistical and Visual Normality Assessment
  • plot_corr: Create Simple Professional Correlation Matrix Plots
  • outliers: Descriptive and Visual Outlier Assessment
  • effect_measures: Effect Measures
  • clinical_data: Simulate Simple Clinical Trial Data
  • omnibus: Omnibus Tests for Comparing Three or More Groups
  • missing_values: Descriptive and Visual Missing Value Assessment
  • plot_hist: Create Simple Professional Histogram Plots
  • sample_size_range: Calculate and visualize sample size across a range of treatment effects
  • summary_table: Summary Table with Optional Group Comparisons