This material does not describe a function, but (rather) the SQRL script file format for SQL with embedded R.
For instructions on how to submit (run) these scripts from file, refer to
sqrlUsage
.
The following (very simple) example scripts won't necessarily work with your own version of SQL or your own data source.
-- My file select 1; select 2; select 3;
Multi-statement SQL files can be copied directly from “SQL Developer” (or similar application). Each of the (above) three queries will be submitted in turn. Only the final result will be returned to the R calling environment (in this case, a data frame containing a single value of 3).
-- This is SQL select <R> # This is R if (exists("x")) { x } else { 0 } </R> /* This is SQL */ from dual
SQRL supports the embedding of R within SQL, via
XML-style tags (<R> ... </R>
), as above.
This enables (explicitly and/or implicitly) parameterised SQL queries.
In this example (above), if the variable x was explicitly passed to the
query, then the supplied value will be used (see sqrlUsage
).
If x was not explicitly passed, then it can be inherited (implicitly
passed) from the calling environment.
In this case, if x was not explicitly supplied and also does not x
exist within the calling environment, then a default value of 0 will be used.
SQL comments are allowed within R (<R> ... </R>
) sections, so
that SQL syntax highlighting can be better applied to the file.
R comments are not allowed within SQL sections
Nested SQL queries can be made from within <R> ... </R>
tags.
However, <R> ... </R>
tags cannot be nested.
-- SQL statement, ending on a semi-colon. use mydatabase;/* SQL statement, ending on a <do> tag. */ create table mynewtable as select columnA, columnB from originaltable <do>
-- SQL query, ending on a <result> (assignment) tag. -- In this case, the result of the query (a data frame) -- is assigned to an R variable 'a' (within a temporary -- working environment communal to this script). select max(columnA) from mynewtable <result -> a>
# Manipulate the result in R, then # proceed to the next SQL statement. b <- runif(1) a <- as.numeric(a) + b <do>
-- Use the earlier result in a new query. select columnA, columnB from mynewtable where columnA > <R> a * b </R> and columnB < <R> z <- 5; z + 2 </R> <result -> x>
-- Only the result of the final operation -- is returned at the end of the script. return(list(minA = min(x$columnA), maxB = max(x$columnB))
SQL statements are terminated by any one of a semicolon, a <do>
tag, a <result>
tag, or the end of the file.
After a <result>
tag, statements are interpreted as R (as opposed to
SQL) until either a <do>
tag, or the end of the file, is
encountered.
Following a <do>
tag, statements are interpreted as SQL.
In the case of a <result -> var>
tag, the SQL query result is
assigned to an R variable var within the script's temporary working
environment.
This variable can be arbitrarily manipulated in R, and the result of that
manipulation used as part of a subsequent SQL query.
With the exception of the name of the target (assignment) variable within a
result
tag, tags are not case sensitive.
When the name of the target variable is either “null” or “NULL”,
then the query result is not assigned into the working environment (as with a
<do>
tag), but the subsequent script is still interpreted as R (unlike
with a <do>
tag).
After the end of the script, the working environment is lost, along with any variables within it. Only the final result is returned to the calling environment.
-- R sections can be used to set temporary variables. <R> columns <- list("columnA", "columnB") wordlist <- list("'red'", "'blue'") conditions <- c("and columnA < 2", "and columnD > 4") <do>-- Now the query. select <R> columns </R> from dbname.tablename where columnC in (<R> wordlist </R>) <R> conditions </R>
-- End the script here. -- The statements below are ignored. <stop> and columnA > 0
Whereas <R> ... </R>
sections are embedded within SQL (which
may be blank), <R> ... <do>
sections exist outside of any SQL.
While the result of an <R> ... </R>
section is substituted back into the
surrounding SQL (and will form part of the query), an
<R> ... <do>
section is simply evaluated within the script's working
environment.
Although a <result -> null> ... <do>
section may be preceded by
SQL (which will be submitted prior to evaluating the section), an
<R> ... <do>
section may not be (the two sections being otherwise
equivalent).
Lists are inserted comma collapsed. Vectors are inserted newline collapsed.
A <stop>
tag imitates the end of the file, and can be used to interrupt a
script for debugging.
# NOT RUN {
# Define a new data source.
sqrlSource("mire", "dsn=Mire")
# }
# NOT RUN {
# Submit a SQL script to the source.
mire("my/script.sql")
# Submit a SQRL script to the source.
mire("my/script.sqrl")
# Submit a SQRL script, with explicit parameter values.
mire("my/parameterised/script.sqrl",
day = Sys.Date(), colour = "'blue'")
# }
Run the code above in your browser using DataLab