
rEHR (version 1.0)

select_by_year: Runs a series of selects over a year range and collects the results in a list of dataframes

Description

This function applies a database select over a range of years and returns the results as a list or a dataframe. The function can be parallelised using the parallel package.
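The overall pattern can be sketched in base R. The sketch runs one query per year with parallel::mclapply, then either keeps the per-year list or collapses it with do.call(rbind, ...). It is not rEHR's implementation: query_one_year and run_by_year are hypothetical, and the per-year database select is replaced by a toy dataframe. Because mclapply forks, values of cores above 1 are not supported on Windows.

```r
library(parallel)

# Hypothetical stand-in for one year's database select: returns a tiny dataframe.
query_one_year <- function(year) {
  data.frame(year = year, n = year %% 7)
}

# Hypothetical sketch of the select-by-year pattern (not rEHR's code):
# one query per year, optionally forked across cores, then either the
# named list of per-year results or a single collapsed dataframe.
run_by_year <- function(year_range, fn, as_list = FALSE, cores = 1L) {
  results <- mclapply(year_range, fn, mc.cores = cores)
  names(results) <- year_range
  if (as_list) return(results)
  out <- do.call(rbind, results)
  rownames(out) <- NULL  # drop the "2000", "2001", ... row names added by rbind
  out
}

by_year <- run_by_year(2000:2003, query_one_year)
by_year_list <- run_by_year(2000:2003, query_one_year, as_list = TRUE)
```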

Usage

select_by_year(dbname = NULL, db = NULL, tables, columns = "*", where, year_range, year_fn = qof_years, as_list = FALSE, selector_fn = select_events, cores = 1L, ...)

Arguments

dbname
path to the database file
db
a database connection
tables
character vector of table names
columns
character vector of columns to be selected from the tables
where
character string representation of the selection criteria
year_range
integer vector of years to be queried
year_fn
function that determines how year start and end dates are calculated
as_list
logical: Should the results be returned as a list with one element per year? If not, the results are collapsed into a single dataframe
selector_fn
function to select from the database. See Details.
cores
integer: the number of processor cores to use.
...
extra arguments to be passed to the selector_fn

Details

Because the same database connection cannot be shared across forked processes, the function takes a path to the database (dbname) rather than a connection, and a new connection is opened in every fork.

columns accepts a character vector of arbitrary length, so it can also be used to insert SQL clauses, e.g. "DISTINCT patid".
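A clause such as "DISTINCT patid" works because the column entries end up comma-separated in the SELECT list. The sketch below assumes that the elements of columns are simply joined with ", "; build_select is a hypothetical helper, not part of rEHR.

```r
# Hypothetical helper (assumption: columns are joined with commas into the
# SELECT list). A leading SQL keyword such as DISTINCT rides along with the
# first column name.
build_select <- function(columns, table) {
  paste0("SELECT ", paste(columns, collapse = ", "), " FROM ", table)
}

build_select(c("DISTINCT patid", "yob"), "Patient")
# "SELECT DISTINCT patid, yob FROM Patient"
```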

Year start and year end criteria can be added to the where argument as 'STARTDATE' and 'ENDDATE'. For each year, these are translated to the start and end dates calculated by year_fn.
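The placeholder substitution can be illustrated with a plain string replacement. Everything below is an assumption: april_to_march_years is a hypothetical year function treating each year as the twelve months ending 31 March (the exact convention of rEHR's qof_years may differ), and the dates are assumed to be inserted as quoted ISO strings.

```r
# Hypothetical year function (not rEHR's qof_years): year N runs from
# 1 April of N - 1 to 31 March of N.
april_to_march_years <- function(year) {
  list(startdate = sprintf("%d-04-01", year - 1L),
       enddate   = sprintf("%d-03-31", year))
}

# Replace the STARTDATE / ENDDATE placeholders for a single year.
fill_dates <- function(where, year, year_fn = april_to_march_years) {
  d <- year_fn(year)
  where <- gsub("STARTDATE", sprintf("'%s'", d$startdate), where, fixed = TRUE)
  gsub("ENDDATE", sprintf("'%s'", d$enddate), where, fixed = TRUE)
}

fill_dates("eventdate >= STARTDATE & eventdate <= ENDDATE", 2001L)
# "eventdate >= '2000-04-01' & eventdate <= '2001-03-31'"
```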

Note that if you are working with temporary tables, you need to set cores to 1 and pass the open database connection as db. This is because mclapply starts a new database connection in each fork, and temporary tables are only visible within the connection that created them.

The selector_fn argument determines how the database select operates. The default is select_events; alternatives are first_events and last_events.
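The idea of a pluggable selector can be shown on an in-memory dataframe. The selectors below (all_rows, first_rows, last_rows) and run_selector are hypothetical and only mimic the intent suggested by the names select_events, first_events and last_events (all matching events versus the earliest or latest event per patient); the real functions query the database and may behave differently. Extra arguments are forwarded through ..., as with selector_fn.

```r
events <- data.frame(
  patid     = c(1, 1, 2, 2, 2),
  eventdate = as.Date(c("2001-05-01", "2000-06-01", "2002-01-10",
                        "2001-11-30", "2003-02-14"))
)

# Hypothetical selectors on a dataframe (illustration only).
all_rows <- function(df) df
first_rows <- function(df) {
  df <- df[order(df$patid, df$eventdate), ]          # earliest first within patient
  df[!duplicated(df$patid), ]                         # keep one row per patient
}
last_rows <- function(df) {
  df <- df[order(df$patid, df$eventdate, decreasing = TRUE), ]  # latest first
  df[!duplicated(df$patid), ]
}

# The selector is just a function argument; extra arguments pass via ...
run_selector <- function(df, selector_fn = all_rows, ...) selector_fn(df, ...)

first <- run_selector(events, first_rows)
last  <- run_selector(events, last_rows)
```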

Examples

## Not run: 
# # Output from a single table
# where_q <- "crd < STARTDATE & (is.null(tod) | tod > ENDDATE) & accept == 1"
# ayears <- select_by_year(db = db, tables = "Patient",
#                          columns = c("patid", "yob", "tod"),
#                          where = where_q, year_range = 2000:2003)
# # Output from multiple tables
# load("data/medical.RData")
# a <- read.csv("data/chronic-renal-disease.csv")
# a <- read_to_medcodes(a, medical, "code", lookup_readcodes = "readcode",
#                       lookup_medcodes = "medcode", description = TRUE)
# where_q <- "eventdate >= STARTDATE & eventdate <= ENDDATE & medcode %in% .(a$medcode)"
# byears <- select_by_year("~/rOpenHealth/CPRD_test/Coupland/Coupland",
#                          tables = c("Clinical", "Referral"),
#                          columns = c("patid", "eventdate", "medcode"),
#                          where = where_q, year_range = 2000:2003,
#                          as_list = FALSE, cores = 10)
## End(Not run)
