rEHR (version 1.0)

match_on_index: Match controls to cases, using the consultation files to generate a dummy index date for each control.

Description

Controls are matched on an arbitrary number of categorical variables (match_vars), and on continuous variables via the extra_conditions argument. In addition, the date in index_var is matched to the eventdate in the consultation files: each control is given a dummy index date taken from a consultation within +/- index_diff_limit days of the case's index date.

Note that the consultation files must be in flat-file format, i.e. stored as text files (or another file type, e.g. Stata .dta files) rather than as part of the database. Set the import_fn argument to read other file formats (e.g. foreign::read.dta or readstata13::read.dta13).

The extra_conditions argument adds constraints to the matching criteria on top of match_vars; for example, you could add "year > 1990". Expressions wrapped in dotted brackets, .(), are expanded automatically, which is useful when a condition must refer to a value of each individual case. Each case is denoted by CASE, e.g. "start_date < .(CASE$start_date)" ensures that each control's start date is earlier than the start date of the case it is matched to.
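The .() notation is the same one used by base R's bquote(). As a minimal sketch, assuming extra_conditions strings are expanded in roughly this way (rEHR's internals may differ), the hypothetical helper expand_condition() below substitutes one case's values into a condition string and evaluates the result against a toy control pool. The column names patid and start_date and the data are made up for illustration.

```r
# Hypothetical helper (not part of rEHR): expand .() terms in a condition
# string for one case, using base R's bquote().
expand_condition <- function(condition, CASE) {
  expr <- str2lang(condition)          # parse the string into an R call
  env <- list2env(list(CASE = CASE))   # .() terms are evaluated in here
  do.call(bquote, list(expr, where = env))
}

# Toy data (assumed columns: patid, start_date)
cases <- data.frame(
  patid = 1:2,
  start_date = as.Date(c("2001-01-01", "2005-06-01"))
)
controls <- data.frame(
  patid = 101:104,
  start_date = as.Date(c("2000-01-01", "2003-01-01", "2006-01-01", "1999-05-05"))
)

cond <- "start_date < .(CASE$start_date)"

# .(CASE$start_date) is replaced by the first case's start date
expanded <- expand_condition(cond, cases[1, ])

# Evaluate the expanded condition with the control pool's columns in scope
eligible <- controls[eval(expanded, controls), ]
```

For the first case, only the controls whose start date precedes 2001-01-01 (patids 101 and 104) remain eligible; conditions without .() terms, such as "year > 1990", pass through bquote() unchanged.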

Usage

match_on_index(cases, control_pool, index_var, match_vars, extra_conditions = "", index_diff_limit = 90, consult_path, n_controls = 5, cores = 1, import_fn = read.delim, ...)

Arguments

cases
A dataframe of cases to which to match controls
control_pool
A dataframe of possible controls to match to cases
index_var
character string of the name of the variable containing index dates
match_vars
character vector detailing the common variables in cases and control_pool to match on
extra_conditions
character string detailing other matching constraints (see Description)
index_diff_limit
integer number of days before or after the case index date that dummy index dates can be picked from the consultation files
consult_path
path to directory containing consultation files
n_controls
integer the number of controls to attempt to match to each case
cores
integer the number of processor cores to be used in processing
import_fn
the function used to read the consultation files
...
extra arguments to be passed to import_fn
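How match_vars, index_diff_limit and n_controls interact for a single case can be sketched in base R as follows. This is a minimal illustration, not rEHR's implementation: the function match_one_case(), the toy data and the column names patid, gender, practid and indexdate are assumptions, and the rule for choosing among several eligible consultations (the one nearest the case's index date) is invented for the example.

```r
# Hypothetical sketch (not rEHR's implementation): match controls to one case
# on exact match_vars values and give each control a dummy index date taken
# from a consultation within +/- index_diff_limit days of the case index date.
match_one_case <- function(case, control_pool, consults, index_var,
                           match_vars, index_diff_limit = 90, n_controls = 5) {
  # 1. Exact matching on the categorical variables (%in% treats NA == NA)
  same <- rep(TRUE, nrow(control_pool))
  for (v in match_vars) {
    same <- same & control_pool[[v]] %in% case[[v]]
  }
  candidates <- control_pool[same, , drop = FALSE]

  # 2. Consultations by candidate controls inside the index date window
  case_index <- case[[index_var]]
  cons <- consults[consults$patid %in% candidates$patid, , drop = FALSE]
  cons$diff <- abs(as.numeric(cons$eventdate - case_index))  # days apart
  cons <- cons[cons$diff <= index_diff_limit, , drop = FALSE]
  if (nrow(cons) == 0) return(NULL)

  # 3. One dummy index date per control: its nearest eligible consultation
  #    (an assumed selection rule)
  cons <- cons[order(cons$patid, cons$diff), ]
  cons <- cons[!duplicated(cons$patid), ]

  # 4. Keep at most n_controls controls, nearest dummy dates first
  cons <- head(cons[order(cons$diff, cons$patid), ], n_controls)
  out <- candidates[match(cons$patid, candidates$patid), , drop = FALSE]
  out[[index_var]] <- cons$eventdate
  out$case_id <- case$patid
  out
}

# Toy data
cases <- data.frame(patid = 1L, gender = "F", practid = 10L,
                    indexdate = as.Date("2010-03-01"))
control_pool <- data.frame(patid = 101:105,
                           gender = c("F", "F", "M", "F", "F"),
                           practid = c(10L, 10L, 10L, 20L, 10L))
consults <- data.frame(
  patid = c(101L, 101L, 102L, 103L, 104L, 105L),
  eventdate = as.Date(c("2010-02-20", "2010-05-30", "2010-04-15",
                        "2010-03-02", "2010-03-01", "2011-01-01"))
)

matched <- match_one_case(cases[1, ], control_pool, consults,
                          index_var = "indexdate",
                          match_vars = c("gender", "practid"),
                          index_diff_limit = 90, n_controls = 5)
```

Here controls 103 and 104 fail the exact match on gender and practid, and 105 has no consultation inside the 90-day window, so only 101 and 102 are returned. Looping over the rows of cases (e.g. with lapply() and do.call(rbind, ...)) would combine the per-case results into one data frame; note that this sketch does not prevent the same control from being matched to more than one case.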

Value

a dataframe of matched controls