rEHR (version 1.0)

import_CPRD_data: Imports all selected CPRD data into an sqlite database

Description

This function can import both cohorts downloaded via the CPRD online tool and CPRD GOLD builds.

Usage

import_CPRD_data(db, data_dir, filetypes = c("Additional", "Clinical", "Consultation", "Immunisation", "Patient", "Practice", "Referral", "Staff", "Test", "Therapy"), dateformat = "%d/%m/%Y", yob_origin = 1800, regex = "PET", recursive = TRUE, ...)

Arguments

db
a database connection
data_dir
the directory containing the CPRD cohort data
filetypes
character vector of filetypes to be imported
dateformat
the format in which dates are stored in the CPRD data. If this is wrong the import will not fail, but all dates are likely to be NA
yob_origin
value added to the stored yob values to get the actual year of birth (generally 1800)
regex
character regular expression identifying data files in the directory. In the file name, this is separated from the filetype by an underscore, e.g. 'p[0-9]{3}' in CPRD GOLD
recursive
logical; should files be searched for recursively under data_dir?
...
arguments to be passed to add_to_database
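
The effect of dateformat and yob_origin can be illustrated with base R alone. The following is a minimal sketch, not the package's own code: the raw date strings and yob values are hypothetical, and it assumes dates are parsed in the same way as as.Date() with a format string.

```r
# Hypothetical raw date strings, as they might appear in a CPRD text file
raw_dates <- c("01/02/2003", "15/11/1998")

# Matching format: the dates parse correctly
good <- as.Date(raw_dates, format = "%d/%m/%Y")

# Mismatched format: no error is raised, but every value becomes NA
bad <- as.Date(raw_dates, format = "%Y-%m-%d")

# yob is stored as an offset; adding yob_origin recovers the calendar year
yob_origin <- 1800
yob <- c(145L, 172L)            # hypothetical stored values
birth_year <- yob + yob_origin
```

Checking a handful of values like this before a full import may save time, since a wrong dateformat only shows up as NA dates after the import has finished.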

Details

Note that if you choose to import all the filetypes, you may end up with a very large database file. You may instead choose to import only the files you want to use; you can always import the rest of the files later. This function may take a long time to run because it unzips (potentially large) files and reads them into R, where it converts the date formats, before importing them into SQLite. However, this initial data preparation step will greatly accelerate downstream processing.
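
To see which files a given combination of regex, filetypes and recursive would pick up, the selection can be approximated in base R. This is only a sketch under stated assumptions: the file names are hypothetical placeholders, and the pattern (regex, an underscore, then the filetype) follows the description of the regex argument rather than the package source.

```r
# Create a throwaway directory with hypothetical, empty placeholder files
data_dir <- file.path(tempdir(), "cprd_demo")
dir.create(file.path(data_dir, "sub"), recursive = TRUE, showWarnings = FALSE)
demo_files <- c("PET_Clinical001.txt", "PET_Patient001.txt",
                "sub/PET_Therapy001.txt", "notes.txt")
invisible(file.create(file.path(data_dir, demo_files)))

regex <- "PET"
filetypes <- c("Clinical", "Patient")   # import only what is needed first

# recursive = TRUE also finds files in subdirectories of data_dir
all_files <- list.files(data_dir, recursive = TRUE)

# Assumed naming scheme: <regex>_<filetype>...
pattern <- paste0(regex, "_(", paste(filetypes, collapse = "|"), ")")
selected <- all_files[grepl(pattern, all_files)]

# Data files left over for a later import
remaining <- setdiff(all_files[grepl(paste0(regex, "_"), all_files)], selected)
```

Here selected holds the Clinical and Patient files, while remaining holds the Therapy file that could be imported in a later call with a different filetypes vector.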