
Tplyr

Welcome to Tplyr! Tplyr is a traceability-minded grammar of data format and summary. It’s designed to simplify the creation of common clinical summaries and to help you focus on how you present your data rather than on the redundant summaries being performed. Furthermore, for every result Tplyr produces, it also produces the metadata necessary to give you traceability from source to summary.

As always, we welcome your feedback. If you spot a bug, would like to see a new feature, or find any documentation unclear, submit an issue through the Tplyr GitHub repository.

Take a look at the cheatsheet!

Installation

You can install Tplyr with:

# Install from CRAN:
install.packages("Tplyr")

# Or install the development version:
devtools::install_github("https://github.com/atorus-research/Tplyr.git", ref="devel")

What is Tplyr?

dplyr, from the tidyverse, is a grammar of data manipulation. So what does that allow you to do? It gives you, as a data analyst, the capability to approach the problem of manipulating your data into an analysis-ready form easily and intuitively. dplyr conceptually breaks things down into verbs that let you focus more on what you want to do than on how you have to do it.

Tplyr is designed around a similar concept, but its focus is on building the summary tables common within the clinical world. In the pharmaceutical industry, much of the data presented in the outputs we create is very similar. Most of these tables can be broken down into a few categories:

  • Counting, for event-based variables or categories
  • Shifting, which is just counting a change in state with a ‘from’ and a ‘to’
  • Generating descriptive statistics around some continuous variable
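Each of these categories corresponds to its own layer constructor. The sketch below is a minimal illustration, assuming the tplyr_adsl and tplyr_adlb example datasets bundled with Tplyr and the CDISC-style columns they carry (SEX, AGE, TRT01P, TRTA, PARAMCD, PARAM, VISIT, BNRIND, ANRIND); the particular variables are chosen only for illustration.

```r
library(Tplyr)
library(dplyr)  # for vars()

# Counting: n (%) of subjects in each category of SEX, per planned treatment
count_tbl <- tplyr_table(tplyr_adsl, TRT01P) %>%
  add_layer(group_count(SEX)) %>%
  build()

# Descriptive statistics: the default summaries of a continuous variable
desc_tbl <- tplyr_table(tplyr_adsl, TRT01P) %>%
  add_layer(group_desc(AGE)) %>%
  build()

# Shifting: baseline range indicator (rows) against post-baseline (columns),
# restricted to a single lab parameter
shift_tbl <- tplyr_table(tplyr_adlb, TRTA, where = PARAMCD == "CK") %>%
  add_layer(
    group_shift(vars(row = BNRIND, column = ANRIND), by = vars(PARAM, VISIT))
  ) %>%
  build()
```

The table-level call (tplyr_table()) fixes the data and the treatment variable once; only the layer changes between the three summaries.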

Many of the tables that go into a clinical submission are made up of a combination of these approaches. Consider a demographics table, using an example from the PHUSE project Standard Analyses & Code Sharing - Analyses & Displays Associated with Demographics, Disposition, and Medications in Phase 2-4 Clinical Trials and Integrated Summary Documents.

When you look at this table, you can begin breaking the output down into smaller, repeated components. These components can be viewed as ‘layers’, and the table as a whole is constructed by stacking the layers. The boxes in the image above show how you can begin to conceptualize this.

  • First we have Sex, which is made up of n (%) counts.
  • Next we have Age as a continuous variable, with a number of descriptive statistics: n, mean, standard deviation, median, quartile 1, quartile 3, min, max, and missing values.
  • After that we have Age again, but broken into categories, so this is once again n (%) values.
  • Race: more counting.
  • Ethnicity: more counting.
  • Weight: and we’re back to descriptive statistics.
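Stacking one layer per block, that demographics table could be sketched roughly as follows. This assumes tplyr_adsl carries the CDISC pilot's ETHNIC and WEIGHTBL (baseline weight) columns, and the row label text passed to by is illustrative rather than copied from the PHUSE shell.

```r
library(Tplyr)
library(knitr)  # for kable()

# One layer per block of the table, added in the order they appear on the page
demog <- tplyr_table(tplyr_adsl, TRT01P, where = SAFFL == "Y") %>%
  add_layer(group_count(SEX, by = "Sex n (%)")) %>%                 # counting
  add_layer(group_desc(AGE, by = "Age (years)")) %>%                # descriptive
  add_layer(group_count(AGEGR1, by = "Age Categories n (%)")) %>%   # counting
  add_layer(group_count(RACE, by = "Race n (%)")) %>%               # counting
  add_layer(group_count(ETHNIC, by = "Ethnicity n (%)")) %>%        # counting
  add_layer(group_desc(WEIGHTBL, by = "Weight (kg)")) %>%           # descriptive
  build()

kable(demog)
```

Six add_layer() calls, but only two layer constructors, mirror the "six summaries, two approaches" structure of the page.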

So we have one table with 6 summaries (7 including the next page, not shown), but only 2 different approaches to summarizing. In the same way that dplyr is a grammar of data manipulation, Tplyr aims to be a grammar of data summary. The goal of Tplyr is to let you program a summary table the way you see it on the page: break the larger problem into smaller ‘layers’, then combine them.

Enough talking - let’s see some code. In these examples, we will be using data from the PHUSE Test Data Factory based on the original pilot project submission package. We’ve packaged some subsets of that data into Tplyr, which you can use to replicate our examples and run our vignette code yourself. Note: You can see our replication of the CDISC pilot using the PHUSE Test Data Factory data here.

library(Tplyr)  # tplyr_table(), layer functions, example data, and %>%
library(knitr)  # kable()

tplyr_table(tplyr_adsl, TRT01P, where = SAFFL == "Y") %>% 
  add_layer(
    group_desc(AGE, by = "Age (years)")
  ) %>% 
  add_layer(
    group_count(AGEGR1, by = "Age Categories n (%)")
  ) %>% 
  build() %>% 
  kable()
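
| row_label1 | row_label2 | var1_Placebo | var1_Xanomeline High Dose | var1_Xanomeline Low Dose | ord_layer_index | ord_layer_1 | ord_layer_2 |
|---|---|---|---|---|---|---|---|
| Age (years) | n | 86 | 84 | 84 | 1 | 1 | 1 |
| Age (years) | Mean (SD) | 75.2 ( 8.59) | 74.4 ( 7.89) | 75.7 ( 8.29) | 1 | 1 | 2 |
| Age (years) | Median | 76.0 | 76.0 | 77.5 | 1 | 1 | 3 |
| Age (years) | Q1, Q3 | 69.2, 81.8 | 70.8, 80.0 | 71.0, 82.0 | 1 | 1 | 4 |
| Age (years) | Min, Max | 52, 89 | 56, 88 | 51, 88 | 1 | 1 | 5 |
| Age (years) | Missing | 0 | 0 | 0 | 1 | 1 | 6 |
| Age Categories n (%) | <65 | 14 ( 16.3%) | 11 ( 13.1%) | 8 ( 9.5%) | 2 | 1 | 1 |
| Age Categories n (%) | >80 | 30 ( 34.9%) | 18 ( 21.4%) | 29 ( 34.5%) | 2 | 1 | 2 |
| Age Categories n (%) | 65-80 | 42 ( 48.8%) | 55 ( 65.5%) | 47 ( 56.0%) | 2 | 1 | 3 |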

Tplyr is Qualified

We understand how important documentation and testing are within the pharmaceutical world. This is why, in addition to unit testing, Tplyr includes an entire user-acceptance testing document, in which requirements were established, test cases were written, and tests were independently programmed and executed. We do this in the hope that you can leverage our work within a qualified programming environment, and that we save you a substantial amount of trouble in getting it there.

You can find the qualification document within the Tplyr GitHub repository. The ‘uat’ folder additionally contains all of the raw files, programmatic tests, specifications, and test cases necessary to create this report.

The TL;DR

Here are some of the high level benefits of using Tplyr:

  • Easy construction of table data using an intuitive syntax
  • Smart string formatting for your numbers that’s easily specified by the user
  • A great deal of flexibility in what is performed and how it’s presented, without specifying hundreds of parameters
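To illustrate the string formatting, here is a sketch using f_str() and set_format_strings(). In a format string, the number of x's before the decimal point sets the width of the integer part and the number after it sets the decimal precision. The statistics and formats chosen here are only examples, assuming the bundled tplyr_adsl data.

```r
library(Tplyr)

age_tbl <- tplyr_table(tplyr_adsl, TRT01P) %>%
  add_layer(
    group_desc(AGE, by = "Age (years)") %>%
      # Each name becomes a row label; each f_str() pairs a format
      # string with the statistics that fill it, in order
      set_format_strings(
        "n"         = f_str("xx", n),
        "Mean (SD)" = f_str("xx.x (xx.xx)", mean, sd),
        "Min, Max"  = f_str("xx, xx", min, max)
      )
  ) %>%
  add_layer(
    group_count(AGEGR1, by = "Age Categories n (%)") %>%
      # Count layers take a single f_str() for every row
      set_format_strings(f_str("xx (xxx.x%)", n, pct))
  ) %>%
  build()

# The same formatting rules applied to plain numbers, outside a table
formatted <- apply_formats("xx.x (xx.xx)", 75.21, 8.587)
```

Only the rows requested in set_format_strings() are produced, so the descriptive layer here yields three rows instead of the default six.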

Where to go from here?

There’s quite a bit more to learn, and we’ve prepared a number of other vignettes to help you get what you need out of Tplyr.

  • The best place to start is with our Getting Started vignette at vignette("Tplyr")
  • Learn more about table level settings in vignette("table")
  • Learn more about descriptive statistics layers in vignette("desc")
  • Learn more about count layers in vignette("count")
  • Learn more about shift layers in vignette("shift")
  • Learn more about percentages in vignette("denom")
  • Learn more about calculating risk differences in vignette("riskdiff")
  • Learn more about sorting Tplyr tables in vignette("sort")
  • Learn more about using Tplyr options in vignette("options")
  • And finally, learn more about producing and outputting styled tables using Tplyr in vignette("styled-table")
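As a taste of what the denominator and risk difference vignettes cover, the sketch below counts distinct subjects per adverse event term, takes denominators from ADSL via set_pop_data(), and adds risk difference comparisons against placebo. It assumes the bundled tplyr_adae and tplyr_adsl datasets and their TRTA, TRT01A, AEDECOD, and USUBJID columns.

```r
library(Tplyr)

ae_tbl <- tplyr_table(tplyr_adae, TRTA) %>%
  # Population denominators come from ADSL, not from the AE records
  set_pop_data(tplyr_adsl) %>%
  set_pop_treat_var(TRT01A) %>%
  add_layer(
    group_count(AEDECOD) %>%
      # Count each subject once per term
      set_distinct_by(USUBJID) %>%
      # Each vector is one comparison: c(treatment, reference)
      add_risk_diff(
        c("Xanomeline High Dose", "Placebo"),
        c("Xanomeline Low Dose", "Placebo")
      )
  ) %>%
  build()

# Risk difference results arrive as additional columns
rdiff_cols <- names(ae_tbl)[startsWith(names(ae_tbl), "rdiff")]
```

The risk difference calculation relies on base R's prop.test(), so sparse adverse event counts may produce approximation warnings.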

Tplyr version 1.0.0 packed in a number of new features. For deeper dives on the largest new additions:

  • Learn about Tplyr’s traceability metadata in vignette("metadata") and about how it can be extended in vignette("custom-metadata")
  • Learn about layer templates in vignette("layer_templates")
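A minimal sketch of both features follows, assuming the bundled tplyr_adsl data. The row_id passed to get_meta_subset() is simply read from the built output, and "count_pct" is a hypothetical template name chosen for this example.

```r
library(Tplyr)

t <- tplyr_table(tplyr_adsl, TRT01P, where = SAFFL == "Y") %>%
  add_layer(group_count(AGEGR1, by = "Age Categories n (%)"))

# metadata = TRUE adds a row_id column and stores metadata for every cell
result <- build(t, metadata = TRUE)

# Recover the source records behind one result cell, identified by its
# row_id and the name of its result column (USUBJID is added by default)
cell_subjects <- get_meta_subset(t, result$row_id[1], "var1_Placebo")

# Layer templates: register a reusable layer configuration. The `...`
# marks where the arguments given to use_template() are inserted.
new_layer_template(
  "count_pct",
  group_count(...) %>%
    set_format_strings(f_str("xx (xx.x%)", n, pct))
)

race_tbl <- tplyr_table(tplyr_adsl, TRT01P) %>%
  add_layer(use_template("count_pct", RACE, by = ETHNIC)) %>%
  build()

# Templates persist for the session until removed
remove_layer_template("count_pct")
```

Because the table object keeps its metadata after build(), the same object t can be queried repeatedly for any cell without rebuilding.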

References

In building Tplyr, we needed resources beyond our personal experience to help guide the design. PHUSE has done great work creating guidance for standard outputs, in collaboration between multiple pharmaceutical companies and the FDA. You can find some of the resources that we referenced below.

Analyses and Displays Associated with Adverse Events

Analyses and Displays Associated with Demographics, Disposition, and Medications

Analyses and Displays Associated with Measures of Central Tendency

Version: 1.2.1
License: MIT + file LICENSE

Maintainer: Michael Stackhouse
Last Published: February 20th, 2024

Functions in Tplyr (1.2.1)

  • get_data_labels: Get Data Labels
  • f_str: Create an f_str object
  • add_layer: Attach a layer to a tplyr_table object
  • keep_levels: Select levels to keep in a count layer
  • get_meta_result: Extract the result metadata of a Tplyr table
  • %>%: Pipe operator
  • header_n: Return or set header_n binding
  • get_tplyr_regex: Retrieve one of Tplyr's regular expressions
  • get_metadata: Get the metadata dataframe from a tplyr_table
  • add_variables: Add variables to a tplyr_meta object
  • process_statistic_data: Process a tplyr_statistic object
  • pop_data: Return or set population data bindings
  • pop_treat_var: Return or set pop_treat_var binding
  • get_stats_data: Get statistics data
  • new_layer_template: Create, view, extract, remove, and use Tplyr layer templates
  • group_count: Create a count, desc, or shift layer for discrete count based summaries, descriptive statistics summaries, or shift count summaries
  • process_metadata: Process layers to get metadata tables
  • set_order_count_method: Set the ordering logic for the count layer
  • set_custom_summaries: Set custom summaries to be performed within a descriptive statistics layer
  • replace_leading_whitespace: Reformat strings with leading whitespace for HTML
  • get_desc_layer_formats: Get or set the default format strings for descriptive statistics layers
  • str_indent_wrap: Wrap strings to a specific width with hyphenation while preserving indentation
  • process_formatting: Process layers to get formatted and pivoted tables
  • get_precision_on: Set or return precision_on layer binding
  • get_precision_by: Set or return precision_by layer binding
  • set_limit_data_by: Set variables to limit reported data values only to those that exist rather than fully completing all possible levels
  • set_missing_subjects_row_label: Set the label for the missing subjects row
  • set_denom_where: Set logic for denominator subsetting
  • set_denom_ignore: Set values the denominator calculation will ignore
  • tplyr_table: Create a Tplyr table object
  • add_treat_grps: Combine existing treatment groups for summary
  • get_target_var: Set or return treat_var binding
  • set_indentation: Set the option to prefix the row_labels in the inner count_layer
  • set_format_strings: Set the format strings and associated summaries to be performed in a layer
  • tplyr_adae: ADAE Data
  • tplyr_layer: Create a tplyr_layer object
  • tplyr_meta: Tplyr Metadata Object
  • set_missing_count: Set the display for missing strings
  • process_statistic_formatting: Process string formatting on a tplyr_statistic object
  • set_precision_data: Set precision data
  • set_nest_count: Set the option to nest count layers
  • tplyr_adlb: ADLB Data
  • tplyr_adas: ADAS Data
  • tplyr_adsl: ADSL Data
  • set_stats_as_columns: Set descriptive statistics as columns
  • tplyr_adpe: ADPE Data
  • set_outer_sort_position: Set the value of an outer nested count layer to Inf or -Inf
  • set_denoms_by: Set variables used in pct denominator calculation
  • set_numeric_threshold: Set a numeric cutoff
  • process_summaries: Process layers to get numeric results of layer
  • set_total_row_label: Set the label for the total row
  • treat_var: Return or set the treatment variable binding
  • set_distinct_by: Set counts to be distinct by some grouping variable
  • str_extract_fmt_group: Extract format group strings or numbers
  • get_where.tplyr_layer: Set or return where binding for layer or table
  • apply_row_masks: Replace repeating row label variables with blanks in preparation for display
  • apply_formats: Apply format strings outside of a Tplyr table
  • add_anti_join: Add an anti-join onto a tplyr_meta object
  • add_risk_diff: Add risk difference to a count layer
  • add_column_headers: Attach column headers to a Tplyr output
  • append_metadata: Append the Tplyr table metadata dataframe
  • add_total_row: Add a Total row into a count summary
  • add_missing_subjects_row: Add a missing subject row into a count summary
  • apply_conditional_format: Conditional reformatting of a pre-populated string of numbers
  • Tplyr: A grammar of summary data for clinical reports
  • get_meta_subset: Extract the subset of data based on result metadata
  • get_numeric_data: Retrieve the numeric data from tplyr objects
  • build: Trigger the execution of the tplyr_table
  • get_by: Set or return by layer binding
  • collapse_row_labels: Collapse row labels into a single column
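Several of these functions post-process built output for display. A minimal sketch, assuming the bundled tplyr_adsl data and dplyr for sorting and column selection: sort on the ord_ columns first, then blank out repeated row labels with apply_row_masks().

```r
library(Tplyr)
library(dplyr)  # arrange(), select(), starts_with()

dat <- tplyr_table(tplyr_adsl, TRT01P) %>%
  add_layer(group_desc(AGE, by = "Age (years)")) %>%
  add_layer(group_count(RACE, by = "Race n (%)")) %>%
  build()

display <- dat %>%
  # apply_row_masks() compares each row to the one before it,
  # so the data must already be in display order
  arrange(ord_layer_index, ord_layer_1, ord_layer_2) %>%
  apply_row_masks() %>%
  # Keep only what a reader sees: labels and formatted results
  select(starts_with("row_label"), starts_with("var1_"))
```

Sorting is left to the user rather than done inside apply_row_masks(), which keeps full control over row order with the ord_ columns Tplyr provides.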