Last chance! 50% off unlimited learning
Sale ends in
Add new variable(s) to the input dataset based on variables from another
dataset. The observations to merge can be selected by a condition
(filter_add
argument) and/or selecting the first or last observation for
each by group (order
and mode
argument).
derive_vars_merged(
dataset,
dataset_add,
by_vars,
order = NULL,
new_vars = NULL,
filter_add = NULL,
mode = NULL,
exist_flag = NULL,
true_value = "Y",
false_value = NA_character_,
missing_values = NULL,
check_type = "warning",
duplicate_msg = NULL,
relationship = NULL
)
The output dataset contains all observations and variables of the
input dataset and additionally the variables specified for new_vars
from
the additional dataset (dataset_add
).
Input dataset
The variables specified by the by_vars
argument are expected to be in the dataset.
Additional dataset
The variables specified by the by_vars
, the new_vars
, and the order
argument are expected.
Grouping variables
The input dataset and the selected observations from the additional dataset are merged by the specified variables.
Variables can be renamed by naming the element, i.e.
by_vars = exprs(<name in input dataset> = <name in additional dataset>)
, similar to the dplyr
joins.
Permitted Values: list of variables created by exprs()
e.g. exprs(USUBJID, VISIT)
Sort order
If the argument is set to a non-null value, for each by group the first or last observation from the additional dataset is selected with respect to the specified order.
Variables defined by the new_vars
argument can be used in the sort order.
For handling of NA
s in sorting variables see Sort Order.
Permitted Values: list of expressions created by exprs()
, e.g.,
exprs(ADT, desc(AVAL))
or NULL
Variables to add
The specified variables from the additional dataset are added to the output
dataset. Variables can be renamed by naming the element, i.e., new_vars = exprs(<new name> = <old name>)
.
For example new_vars = exprs(var1, var2)
adds variables var1
and var2
from dataset_add
to the input dataset.
And new_vars = exprs(var1, new_var2 = old_var2)
takes var1
and
old_var2
from dataset_add
and adds them to the input dataset renaming
old_var2
to new_var2
.
Values of the added variables can be modified by specifying an expression.
For example, new_vars = LASTRSP = exprs(str_to_upper(AVALC))
adds the
variable LASTRSP
to the dataset and sets it to the upper case value of
AVALC
.
If the argument is not specified or set to NULL
, all variables from the
additional dataset (dataset_add
) are added.
Permitted Values: list of variables or named expressions created by exprs()
Filter for additional dataset (dataset_add
)
Only observations fulfilling the specified condition are taken into account for merging. If the argument is not specified, all observations are considered.
Variables defined by the new_vars
argument can be used in the filter
condition.
Permitted Values: a condition
Selection mode
Determines if the first or last observation is selected. If the order
argument is specified, mode
must be non-null.
If the order
argument is not specified, the mode
argument is ignored.
Permitted Values: "first"
, "last"
, NULL
Exist flag
If the argument is specified (e.g., exist_flag = FLAG
), the specified
variable (e.g., FLAG
) is added to the input dataset. This variable will
be the value provided in true_value
for all selected records from dataset_add
which are merged into the input dataset, and the value provided in false_value
otherwise.
Permitted Values: Variable name
True value
The value for the specified variable exist_flag
, applicable to
the first or last observation (depending on the mode) of each by group.
Permitted Values: An atomic scalar
False value
The value for the specified variable exist_flag
, NOT applicable to
the first or last observation (depending on the mode) of each by group.
Permitted Values: An atomic scalar
Values for non-matching observations
For observations of the input dataset (dataset
) which do not have a
matching observation in the additional dataset (dataset_add
) the values
of the specified variables are set to the specified value. Only variables
specified for new_vars
can be specified for missing_values
.
Permitted Values: named list of expressions, e.g.,
exprs(BASEC = "MISSING", BASE = -1)
Check uniqueness?
If "warning"
, "message"
, or "error"
is specified, the specified message is issued
if the observations of the (restricted) additional dataset are not unique
with respect to the by variables and the order.
If the order
argument is not specified, the check_type
argument is ignored:
if the observations of the (restricted) additional dataset are not unique with respect
to the by variables, an error is issued.
Permitted Values: "none"
, "message"
, "warning"
, "error"
Message of unique check
If the uniqueness check fails, the specified message is displayed.
Default:
paste(
"Dataset {.arg dataset_add} contains duplicate records with respect to",
"{.var {vars2chr(by_vars)}}."
)
Expected merge-relationship between the by_vars
variable(s) in dataset
(input dataset) and the dataset_add
(additional dataset)
containing the additional new_vars
.
This argument is passed to the dplyr::left_join()
function. See
https://dplyr.tidyverse.org/reference/mutate-joins.html#arguments for
more details.
Permitted Values: "one-to-one"
, "many-to-one"
, NULL
.
The new variables (new_vars
) are added to the additional dataset
(dataset_add
).
The records from the additional dataset (dataset_add
) are restricted
to those matching the filter_add
condition.
If order
is specified, for each by group the first or last observation
(depending on mode
) is selected.
The variables specified for new_vars
are merged to the input dataset
using left_join()
. I.e., the output dataset contains all observations
from the input dataset. For observations without a matching observation in
the additional dataset the new variables are set as specified by
missing_values
(or to NA
for variables not in missing_values
).
Observations in the additional dataset which have no matching observation
in the input dataset are ignored.
General Derivation Functions for all ADaMs that returns variable appended to dataset:
derive_var_extreme_flag()
,
derive_var_joined_exist_flag()
,
derive_var_merged_ef_msrc()
,
derive_var_merged_exist_flag()
,
derive_var_merged_summary()
,
derive_var_obs_number()
,
derive_var_relative_flag()
,
derive_vars_cat()
,
derive_vars_computed()
,
derive_vars_joined()
,
derive_vars_merged_lookup()
,
derive_vars_transposed()
library(dplyr, warn.conflicts = FALSE)
vs <- tribble(
~STUDYID, ~DOMAIN, ~USUBJID, ~VSTESTCD, ~VISIT, ~VSSTRESN, ~VSSTRESU, ~VSDTC,
"PILOT01", "VS", "01-1302", "HEIGHT", "SCREENING", 177.8, "cm", "2013-08-20",
"PILOT01", "VS", "01-1302", "WEIGHT", "SCREENING", 81.19, "kg", "2013-08-20",
"PILOT01", "VS", "01-1302", "WEIGHT", "BASELINE", 82.1, "kg", "2013-08-29",
"PILOT01", "VS", "01-1302", "WEIGHT", "WEEK 2", 81.19, "kg", "2013-09-15",
"PILOT01", "VS", "01-1302", "WEIGHT", "WEEK 4", 82.56, "kg", "2013-09-24",
"PILOT01", "VS", "01-1302", "WEIGHT", "WEEK 6", 80.74, "kg", "2013-10-08",
"PILOT01", "VS", "01-1302", "WEIGHT", "WEEK 8", 82.1, "kg", "2013-10-22",
"PILOT01", "VS", "01-1302", "WEIGHT", "WEEK 12", 82.1, "kg", "2013-11-05",
"PILOT01", "VS", "17-1344", "HEIGHT", "SCREENING", 163.5, "cm", "2014-01-01",
"PILOT01", "VS", "17-1344", "WEIGHT", "SCREENING", 58.06, "kg", "2014-01-01",
"PILOT01", "VS", "17-1344", "WEIGHT", "BASELINE", 58.06, "kg", "2014-01-11",
"PILOT01", "VS", "17-1344", "WEIGHT", "WEEK 2", 58.97, "kg", "2014-01-24",
"PILOT01", "VS", "17-1344", "WEIGHT", "WEEK 4", 57.97, "kg", "2014-02-07",
"PILOT01", "VS", "17-1344", "WEIGHT", "WEEK 6", 58.97, "kg", "2014-02-19",
"PILOT01", "VS", "17-1344", "WEIGHT", "WEEK 8", 57.79, "kg", "2014-03-14"
)
dm <- tribble(
~STUDYID, ~DOMAIN, ~USUBJID, ~AGE, ~AGEU,
"PILOT01", "DM", "01-1302", 61, "YEARS",
"PILOT01", "DM", "17-1344", 64, "YEARS"
)
# Merging all dm variables to vs
derive_vars_merged(
vs,
dataset_add = select(dm, -DOMAIN),
by_vars = exprs(STUDYID, USUBJID)
) %>%
select(STUDYID, USUBJID, VSTESTCD, VISIT, VSSTRESN, AGE, AGEU)
# Merge last weight to adsl
adsl <- tribble(
~STUDYID, ~USUBJID, ~AGE, ~AGEU,
"PILOT01", "01-1302", 61, "YEARS",
"PILOT01", "17-1344", 64, "YEARS"
)
derive_vars_merged(
adsl,
dataset_add = vs,
by_vars = exprs(STUDYID, USUBJID),
order = exprs(convert_dtc_to_dtm(VSDTC)),
mode = "last",
new_vars = exprs(LASTWGT = VSSTRESN, LASTWGTU = VSSTRESU),
filter_add = VSTESTCD == "WEIGHT",
exist_flag = vsdatafl
) %>%
select(STUDYID, USUBJID, AGE, AGEU, LASTWGT, LASTWGTU, vsdatafl)
# Derive treatment start datetime (TRTSDTM)
ex <- tribble(
~STUDYID, ~DOMAIN, ~USUBJID, ~EXSTDY, ~EXENDY, ~EXSTDTC, ~EXENDTC,
"PILOT01", "EX", "01-1302", 1, 18, "2013-08-29", "2013-09-15",
"PILOT01", "EX", "01-1302", 19, 69, "2013-09-16", "2013-11-05",
"PILOT01", "EX", "17-1344", 1, 14, "2014-01-11", "2014-01-24",
"PILOT01", "EX", "17-1344", 15, 63, "2014-01-25", "2014-03-14"
)
## Impute exposure start date to first date/time
ex_ext <- derive_vars_dtm(
ex,
dtc = EXSTDTC,
new_vars_prefix = "EXST",
highest_imputation = "M",
)
## Add first exposure datetime and imputation flags to adsl
derive_vars_merged(
select(dm, STUDYID, USUBJID),
dataset_add = ex_ext,
by_vars = exprs(STUDYID, USUBJID),
new_vars = exprs(TRTSDTM = EXSTDTM, TRTSDTF = EXSTDTF, TRTSTMF = EXSTTMF),
order = exprs(EXSTDTM),
mode = "first"
)
# Derive treatment end datetime (TRTEDTM)
## Impute exposure end datetime to last time, no date imputation
ex_ext <- derive_vars_dtm(
ex,
dtc = EXENDTC,
new_vars_prefix = "EXEN",
time_imputation = "last",
)
## Add last exposure datetime and imputation flag to adsl
derive_vars_merged(
select(adsl, STUDYID, USUBJID),
dataset_add = ex_ext,
filter_add = !is.na(EXENDTM),
by_vars = exprs(STUDYID, USUBJID),
new_vars = exprs(TRTEDTM = EXENDTM, TRTETMF = EXENTMF),
order = exprs(EXENDTM),
mode = "last"
)
# Modify merged values and set value for non matching observations
adsl <- tribble(
~USUBJID, ~SEX, ~COUNTRY,
"ST42-1", "F", "AUT",
"ST42-2", "M", "MWI",
"ST42-3", "M", "NOR",
"ST42-4", "F", "UGA"
)
advs <- tribble(
~USUBJID, ~PARAMCD, ~AVISIT, ~AVISITN, ~AVAL,
"ST42-1", "WEIGHT", "BASELINE", 0, 66,
"ST42-1", "WEIGHT", "WEEK 2", 1, 68,
"ST42-2", "WEIGHT", "BASELINE", 0, 88,
"ST42-3", "WEIGHT", "WEEK 2", 1, 55,
"ST42-3", "WEIGHT", "WEEK 4", 2, 50
)
derive_vars_merged(
adsl,
dataset_add = advs,
by_vars = exprs(USUBJID),
new_vars = exprs(
LSTVSCAT = if_else(AVISIT == "BASELINE", "BASELINE", "POST-BASELINE")
),
order = exprs(AVISITN),
mode = "last",
missing_values = exprs(LSTVSCAT = "MISSING")
)
Run the code above in your browser using DataLab