It is not uncommon to have an analysis need whereby one needs to derive an
analysis value (AVAL
) from multiple records. The ADaM basic dataset
structure variable DTYPE
is available to indicate when a new derived
records has been added to a dataset, if the derivation deviates from the
standard derivation of the parameter.
derive_summary_records(
dataset = NULL,
dataset_add,
dataset_ref = NULL,
by_vars,
filter_add = NULL,
set_values_to,
missing_values = NULL
)
A data frame with derived records appended to original dataset.
Input dataset
If the argument is not specified (or set to NULL
), a new dataset is
created. Otherwise, the new records are appended to the specified dataset.
a dataset, i.e., a data.frame
or tibble
NULL
Additional dataset
The variables specified for by_vars
are expected.
Observations from the specified dataset are going to be used to calculate and added
as new records to the input dataset (dataset
).
a dataset, i.e., a data.frame
or tibble
none
Reference dataset
The variables specified for by_vars
are expected. For each
observation of the specified dataset a new observation is added to the
input dataset.
a dataset, i.e., a data.frame
or tibble
NULL
Grouping variables
Variables to consider for generation of groupwise summary
records. Providing the names of variables in exprs()
will create a
groupwise summary and generate summary records for the specified groups.
list of variables created by exprs()
, e.g., exprs(USUBJID, VISIT)
none
Filter condition as logical expression to apply during
summary calculation. By default, filtering expressions are computed within
by_vars
as this will help when an aggregating, lagging, or ranking
function is involved.
For example,
filter_add = (AVAL > mean(AVAL, na.rm = TRUE))
will filter all AVAL
values greater than mean of AVAL
with in by_vars
.
filter_add = (dplyr::n() > 2)
will filter n count of by_vars
greater
than 2.
an unquoted condition, e.g., AVISIT == "BASELINE"
NULL
Variables to be set
The specified variables are set to the specified values for the new observations.
Set a list of variables to some specified value for the new records
LHS refer to a variable.
RHS refers to the values to set to the variable. This can be a string, a
symbol, a numeric value, an expression or NA. If summary functions are
used, the values are summarized by the variables specified for by_vars
.
For example:
set_values_to = exprs(
AVAL = sum(AVAL),
DTYPE = "AVERAGE",
)
list of named expressions created by exprs()
, e.g., exprs(CUMDOSA = sum(AVAL, na.rm = TRUE), AVALU = "ml")
none
Values for missing summary values
For observations of the reference dataset (dataset_ref
) which do not have a
complete mapping defined by the summarization defined in set_values_to
. Only variables
specified for set_values_to
can be specified for missing_values
.
list of named expressions created by exprs()
, e.g., exprs(CUMDOSA = sum(AVAL, na.rm = TRUE), AVALU = "ml")
NULL
The following examples use the ECG dataset below as a basis.
library(tibble, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)adeg <- tribble(
~USUBJID, ~PARAM, ~AVISIT, ~EGDTC, ~AVAL,
"XYZ-1001", "QTcF Int. (msec)", "Baseline", "2016-02-24T07:50", 385,
"XYZ-1001", "QTcF Int. (msec)", "Baseline", "2016-02-24T07:52", 399,
"XYZ-1001", "QTcF Int. (msec)", "Baseline", "2016-02-24T07:56", 396,
"XYZ-1001", "QTcF Int. (msec)", "Visit 2", "2016-03-08T09:48", 393,
"XYZ-1001", "QTcF Int. (msec)", "Visit 2", "2016-03-08T09:51", 388,
"XYZ-1001", "QTcF Int. (msec)", "Visit 3", "2016-03-22T10:48", 394,
"XYZ-1001", "QTcF Int. (msec)", "Visit 3", "2016-03-22T10:51", 402,
"XYZ-1002", "QTcF Int. (msec)", "Baseline", "2016-02-22T07:58", 399,
"XYZ-1002", "QTcF Int. (msec)", "Baseline", "2016-02-22T07:58", 200,
"XYZ-1002", "QTcF Int. (msec)", "Baseline", "2016-02-22T08:01", 392,
"XYZ-1002", "QTcF Int. (msec)", "Visit 3", "2016-03-24T10:53", 414,
"XYZ-1002", "QTcF Int. (msec)", "Visit 3", "2016-03-24T10:56", 402
) %>%
mutate(ADTM = convert_dtc_to_dtm(EGDTC))
A derived record is generated for each subject, containing the mean of the triplicate ECG
interval values (AVAL
) and the latest measurement's time (ADTM
) by using summary functions
within the set_values_to
argument.
derive_summary_records(
adeg,
dataset_add = adeg,
by_vars = exprs(USUBJID, PARAM, AVISIT),
set_values_to = exprs(
AVAL = mean(AVAL, na.rm = TRUE),
ADTM = max(ADTM),
DTYPE = "AVERAGE"
)
) %>%
arrange(USUBJID, AVISIT)
#> # A tibble: 17 × 7
#> USUBJID PARAM AVISIT EGDTC AVAL ADTM DTYPE
#> <chr> <chr> <chr> <chr> <dbl> <dttm> <chr>
#> 1 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 385 2016-02-24 07:50:00 <NA>
#> 2 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 399 2016-02-24 07:52:00 <NA>
#> 3 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 396 2016-02-24 07:56:00 <NA>
#> 4 XYZ-1001 QTcF Int. (msec) Baseline <NA> 393. 2016-02-24 07:56:00 AVER…
#> 5 XYZ-1001 QTcF Int. (msec) Visit 2 2016-03-0… 393 2016-03-08 09:48:00 <NA>
#> 6 XYZ-1001 QTcF Int. (msec) Visit 2 2016-03-0… 388 2016-03-08 09:51:00 <NA>
#> 7 XYZ-1001 QTcF Int. (msec) Visit 2 <NA> 390. 2016-03-08 09:51:00 AVER…
#> 8 XYZ-1001 QTcF Int. (msec) Visit 3 2016-03-2… 394 2016-03-22 10:48:00 <NA>
#> 9 XYZ-1001 QTcF Int. (msec) Visit 3 2016-03-2… 402 2016-03-22 10:51:00 <NA>
#> 10 XYZ-1001 QTcF Int. (msec) Visit 3 <NA> 398 2016-03-22 10:51:00 AVER…
#> 11 XYZ-1002 QTcF Int. (msec) Baseline 2016-02-2… 399 2016-02-22 07:58:00 <NA>
#> 12 XYZ-1002 QTcF Int. (msec) Baseline 2016-02-2… 200 2016-02-22 07:58:00 <NA>
#> 13 XYZ-1002 QTcF Int. (msec) Baseline 2016-02-2… 392 2016-02-22 08:01:00 <NA>
#> 14 XYZ-1002 QTcF Int. (msec) Baseline <NA> 330. 2016-02-22 08:01:00 AVER…
#> 15 XYZ-1002 QTcF Int. (msec) Visit 3 2016-03-2… 414 2016-03-24 10:53:00 <NA>
#> 16 XYZ-1002 QTcF Int. (msec) Visit 3 2016-03-2… 402 2016-03-24 10:56:00 <NA>
#> 17 XYZ-1002 QTcF Int. (msec) Visit 3 <NA> 408 2016-03-24 10:56:00 AVER…
Functions such as all()
and any()
are also often useful when creating
summary records. For instance, the above example can be extended to flag which derived
records were affected by outliers. Note that the outlier flag is created before AVAL
is set for the summary record. Otherwise, referencing AVAL
later on would pick up the
AVAL
from the summary record rather than the source records.
derive_summary_records(
adeg,
dataset_add = adeg,
by_vars = exprs(USUBJID, PARAM, AVISIT),
set_values_to = exprs(
OUTLIEFL = if_else(any(AVAL >= 500 | AVAL <= 300), "Y", "N"),
AVAL = mean(AVAL, na.rm = TRUE),
ADTM = max(ADTM),
DTYPE = "AVERAGE"
)
) %>%
arrange(USUBJID, AVISIT)
#> # A tibble: 17 × 8
#> USUBJID PARAM AVISIT EGDTC AVAL ADTM OUTLIEFL DTYPE
#> <chr> <chr> <chr> <chr> <dbl> <dttm> <chr> <chr>
#> 1 XYZ-1001 QTcF Int. (ms… Basel… 2016… 385 2016-02-24 07:50:00 <NA> <NA>
#> 2 XYZ-1001 QTcF Int. (ms… Basel… 2016… 399 2016-02-24 07:52:00 <NA> <NA>
#> 3 XYZ-1001 QTcF Int. (ms… Basel… 2016… 396 2016-02-24 07:56:00 <NA> <NA>
#> 4 XYZ-1001 QTcF Int. (ms… Basel… <NA> 393. 2016-02-24 07:56:00 N AVER…
#> 5 XYZ-1001 QTcF Int. (ms… Visit… 2016… 393 2016-03-08 09:48:00 <NA> <NA>
#> 6 XYZ-1001 QTcF Int. (ms… Visit… 2016… 388 2016-03-08 09:51:00 <NA> <NA>
#> 7 XYZ-1001 QTcF Int. (ms… Visit… <NA> 390. 2016-03-08 09:51:00 N AVER…
#> 8 XYZ-1001 QTcF Int. (ms… Visit… 2016… 394 2016-03-22 10:48:00 <NA> <NA>
#> 9 XYZ-1001 QTcF Int. (ms… Visit… 2016… 402 2016-03-22 10:51:00 <NA> <NA>
#> 10 XYZ-1001 QTcF Int. (ms… Visit… <NA> 398 2016-03-22 10:51:00 N AVER…
#> 11 XYZ-1002 QTcF Int. (ms… Basel… 2016… 399 2016-02-22 07:58:00 <NA> <NA>
#> 12 XYZ-1002 QTcF Int. (ms… Basel… 2016… 200 2016-02-22 07:58:00 <NA> <NA>
#> 13 XYZ-1002 QTcF Int. (ms… Basel… 2016… 392 2016-02-22 08:01:00 <NA> <NA>
#> 14 XYZ-1002 QTcF Int. (ms… Basel… <NA> 330. 2016-02-22 08:01:00 Y AVER…
#> 15 XYZ-1002 QTcF Int. (ms… Visit… 2016… 414 2016-03-24 10:53:00 <NA> <NA>
#> 16 XYZ-1002 QTcF Int. (ms… Visit… 2016… 402 2016-03-24 10:56:00 <NA> <NA>
#> 17 XYZ-1002 QTcF Int. (ms… Visit… <NA> 408 2016-03-24 10:56:00 N AVER…
filter_add
)
The filter_add
argument can be used to restrict the records that are being summarized.
For instance, the mean of the triplicates above can be computed only for the baseline records
by passing filter_add = AVISIT == "Baseline"
.
derive_summary_records(
adeg,
dataset_add = adeg,
by_vars = exprs(USUBJID, PARAM, AVISIT),
filter_add = AVISIT == "Baseline",
set_values_to = exprs(
AVAL = mean(AVAL, na.rm = TRUE),
DTYPE = "AVERAGE"
)
) %>%
arrange(USUBJID, AVISIT)
#> # A tibble: 14 × 7
#> USUBJID PARAM AVISIT EGDTC AVAL ADTM DTYPE
#> <chr> <chr> <chr> <chr> <dbl> <dttm> <chr>
#> 1 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 385 2016-02-24 07:50:00 <NA>
#> 2 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 399 2016-02-24 07:52:00 <NA>
#> 3 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 396 2016-02-24 07:56:00 <NA>
#> 4 XYZ-1001 QTcF Int. (msec) Baseline <NA> 393. NA AVER…
#> 5 XYZ-1001 QTcF Int. (msec) Visit 2 2016-03-0… 393 2016-03-08 09:48:00 <NA>
#> 6 XYZ-1001 QTcF Int. (msec) Visit 2 2016-03-0… 388 2016-03-08 09:51:00 <NA>
#> 7 XYZ-1001 QTcF Int. (msec) Visit 3 2016-03-2… 394 2016-03-22 10:48:00 <NA>
#> 8 XYZ-1001 QTcF Int. (msec) Visit 3 2016-03-2… 402 2016-03-22 10:51:00 <NA>
#> 9 XYZ-1002 QTcF Int. (msec) Baseline 2016-02-2… 399 2016-02-22 07:58:00 <NA>
#> 10 XYZ-1002 QTcF Int. (msec) Baseline 2016-02-2… 200 2016-02-22 07:58:00 <NA>
#> 11 XYZ-1002 QTcF Int. (msec) Baseline 2016-02-2… 392 2016-02-22 08:01:00 <NA>
#> 12 XYZ-1002 QTcF Int. (msec) Baseline <NA> 330. NA AVER…
#> 13 XYZ-1002 QTcF Int. (msec) Visit 3 2016-03-2… 414 2016-03-24 10:53:00 <NA>
#> 14 XYZ-1002 QTcF Int. (msec) Visit 3 2016-03-2… 402 2016-03-24 10:56:00 <NA>
Summary functions can also be used within filter_add
to filter based on conditions
applied to the whole of the by group specified in by_vars
. For instance, the mean of
the triplicates can be computed only for by groups which do indeed contain three records
by passing filter_add = n() > 2
.
derive_summary_records(
adeg,
dataset_add = adeg,
by_vars = exprs(USUBJID, PARAM, AVISIT),
filter_add = n() > 2,
set_values_to = exprs(
AVAL = mean(AVAL, na.rm = TRUE),
DTYPE = "AVERAGE"
)
) %>%
arrange(USUBJID, AVISIT)
#> # A tibble: 14 × 7
#> USUBJID PARAM AVISIT EGDTC AVAL ADTM DTYPE
#> <chr> <chr> <chr> <chr> <dbl> <dttm> <chr>
#> 1 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 385 2016-02-24 07:50:00 <NA>
#> 2 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 399 2016-02-24 07:52:00 <NA>
#> 3 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 396 2016-02-24 07:56:00 <NA>
#> 4 XYZ-1001 QTcF Int. (msec) Baseline <NA> 393. NA AVER…
#> 5 XYZ-1001 QTcF Int. (msec) Visit 2 2016-03-0… 393 2016-03-08 09:48:00 <NA>
#> 6 XYZ-1001 QTcF Int. (msec) Visit 2 2016-03-0… 388 2016-03-08 09:51:00 <NA>
#> 7 XYZ-1001 QTcF Int. (msec) Visit 3 2016-03-2… 394 2016-03-22 10:48:00 <NA>
#> 8 XYZ-1001 QTcF Int. (msec) Visit 3 2016-03-2… 402 2016-03-22 10:51:00 <NA>
#> 9 XYZ-1002 QTcF Int. (msec) Baseline 2016-02-2… 399 2016-02-22 07:58:00 <NA>
#> 10 XYZ-1002 QTcF Int. (msec) Baseline 2016-02-2… 200 2016-02-22 07:58:00 <NA>
#> 11 XYZ-1002 QTcF Int. (msec) Baseline 2016-02-2… 392 2016-02-22 08:01:00 <NA>
#> 12 XYZ-1002 QTcF Int. (msec) Baseline <NA> 330. NA AVER…
#> 13 XYZ-1002 QTcF Int. (msec) Visit 3 2016-03-2… 414 2016-03-24 10:53:00 <NA>
#> 14 XYZ-1002 QTcF Int. (msec) Visit 3 2016-03-2… 402 2016-03-24 10:56:00 <NA>
dataset_ref
and missing_values
)
Adding records for groups which are not in the source data can be achieved by
specifying a reference dataset in the dataset_ref
argument. For example, specifying
the input dataset adeg_allparamvis
(containing an extra "Visit 2"
for patient
1002
) ensures a summary record is derived for that visit as well. For these records,
the values of the analysis variables to be populated should be specified within the
missing_values
argument. Here, DTYPE = "PHANTOM"
was chosen as AVAL
is set to
missing.
adeg_allparamvis <- tribble(
~USUBJID, ~PARAM, ~AVISIT,
"XYZ-1001", "QTcF Int. (msec)", "Baseline",
"XYZ-1001", "QTcF Int. (msec)", "Visit 2",
"XYZ-1001", "QTcF Int. (msec)", "Visit 3",
"XYZ-1002", "QTcF Int. (msec)", "Baseline",
"XYZ-1002", "QTcF Int. (msec)", "Visit 2",
"XYZ-1002", "QTcF Int. (msec)", "Visit 3"
)derive_summary_records(
adeg,
dataset_add = adeg,
dataset_ref = adeg_allparamvis,
by_vars = exprs(USUBJID, PARAM, AVISIT),
set_values_to = exprs(
AVAL = mean(AVAL, na.rm = TRUE),
ADTM = max(ADTM),
DTYPE = "AVERAGE"
),
missing_values = exprs(
AVAL = NA,
ADTM = NA,
DTYPE = "PHANTOM"
)
) %>%
arrange(USUBJID, AVISIT)
#> # A tibble: 18 × 7
#> USUBJID PARAM AVISIT EGDTC AVAL ADTM DTYPE
#> <chr> <chr> <chr> <chr> <dbl> <dttm> <chr>
#> 1 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 385 2016-02-24 07:50:00 <NA>
#> 2 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 399 2016-02-24 07:52:00 <NA>
#> 3 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 396 2016-02-24 07:56:00 <NA>
#> 4 XYZ-1001 QTcF Int. (msec) Baseline <NA> 393. 2016-02-24 07:56:00 AVER…
#> 5 XYZ-1001 QTcF Int. (msec) Visit 2 2016-03-0… 393 2016-03-08 09:48:00 <NA>
#> 6 XYZ-1001 QTcF Int. (msec) Visit 2 2016-03-0… 388 2016-03-08 09:51:00 <NA>
#> 7 XYZ-1001 QTcF Int. (msec) Visit 2 <NA> 390. 2016-03-08 09:51:00 AVER…
#> 8 XYZ-1001 QTcF Int. (msec) Visit 3 2016-03-2… 394 2016-03-22 10:48:00 <NA>
#> 9 XYZ-1001 QTcF Int. (msec) Visit 3 2016-03-2… 402 2016-03-22 10:51:00 <NA>
#> 10 XYZ-1001 QTcF Int. (msec) Visit 3 <NA> 398 2016-03-22 10:51:00 AVER…
#> 11 XYZ-1002 QTcF Int. (msec) Baseline 2016-02-2… 399 2016-02-22 07:58:00 <NA>
#> 12 XYZ-1002 QTcF Int. (msec) Baseline 2016-02-2… 200 2016-02-22 07:58:00 <NA>
#> 13 XYZ-1002 QTcF Int. (msec) Baseline 2016-02-2… 392 2016-02-22 08:01:00 <NA>
#> 14 XYZ-1002 QTcF Int. (msec) Baseline <NA> 330. 2016-02-22 08:01:00 AVER…
#> 15 XYZ-1002 QTcF Int. (msec) Visit 2 <NA> NA NA PHAN…
#> 16 XYZ-1002 QTcF Int. (msec) Visit 3 2016-03-2… 414 2016-03-24 10:53:00 <NA>
#> 17 XYZ-1002 QTcF Int. (msec) Visit 3 2016-03-2… 402 2016-03-24 10:56:00 <NA>
#> 18 XYZ-1002 QTcF Int. (msec) Visit 3 <NA> 408 2016-03-24 10:56:00 AVER…
For the newly derived records, only variables specified within by_vars
or
set_values_to
will be populated. All other variables will be set to NA
.
derive_var_merged_summary()
BDS-Findings Functions for adding Parameters/Records:
default_qtc_paramcd()
,
derive_expected_records()
,
derive_extreme_event()
,
derive_extreme_records()
,
derive_locf_records()
,
derive_param_bmi()
,
derive_param_bsa()
,
derive_param_computed()
,
derive_param_doseint()
,
derive_param_exist_flag()
,
derive_param_exposure()
,
derive_param_framingham()
,
derive_param_map()
,
derive_param_qtc()
,
derive_param_rr()
,
derive_param_wbc_abs()