This material does not describe a function, but (rather) the SQRL script file format for SQL with embedded R.
For instructions on how to submit (run) these scripts from file, refer to
sqrlUsage
.
The following (very simple) example scripts won't necessarily work with your own version of SQL or your own data source.
-- My file select 1; select 2; select 3;
Multi-statement SQL files can be copied directly from “SQL Developer” (or similar application). Each of the (above) three queries will be submitted in turn. Only the final result will be returned to the R calling environment (in this case, a data frame containing a single value of 3).
-- This is SQL select <R> # This is R if (exists("x")) { x } else { 0 } </R> /* This is SQL */ from dual
SQRL supports the embedding of R within SQL, via
XML-style tags (<R> ... </R>
), as above.
This enables (explicitly and/or implicitly) parameterised SQL queries.
In this example (above), if the variable x was explicitly passed to the
query, then the supplied value will be used (see sqrlUsage
).
If x was not explicitly passed, then it can be inherited (implicitly
passed) from the calling environment.
In this case, if x was not explicitly supplied and also does not x
exist within the calling environment, then a default value of 0 will be used.
SQL comments are allowed within R (<R> ... </R>
) sections, so
that SQL syntax highlighting can be better applied to the file.
R comments are not allowed within SQL sections
Nested SQL queries can be made from within <R> ... </R>
tags.
However, <R> ... </R>
tags cannot be nested.
-- SQL statement, ending on a semicolon. use mydatabase;/* SQL statement, ending on a <do> tag. */ create table mynewtable as select columnA, columnB from originaltable <do>
-- SQL query, ending on a <result> (assignment) tag. -- In this case, the result of the query (a data frame) -- is assigned to an R variable 'a' (within a temporary -- working environment communal to this script). select max(columnA) from mynewtable <result -> a>
# Manipulate the result in R, then # proceed to the next SQL statement. b <- runif(1) a <- as.numeric(a) + b <do>
-- Use the earlier result in a new query. select columnA, columnB from mynewtable where columnA > <R> a * b </R> and columnB < <R> z <- 5; z + 2 </R> <result -> x>
-- Only the result of the final operation -- is returned at the end of the script. return(list(minA = min(x$columnA), maxB = max(x$columnB))
SQL statements are terminated by any one of a semicolon, a <do>
tag, a <result>
tag, or the end of the file.
After a <result>
tag, statements are interpreted as R (as opposed to
SQL) until either a <do>
tag, or the end of the file, is
encountered.
Following a <do>
tag, statements are interpreted as SQL.
In the case of a <result -> var>
tag, the SQL query result is
assigned to an R variable var within the script's temporary working
environment.
This variable can be arbitrarily manipulated in R, and the result of that
manipulation used as part of a subsequent SQL query.
With the exception of the name of the target (assignment) variable within a
result
tag, tags are not case sensitive.
When the name of the target variable is either “null” or “NULL”,
then the query result is not assigned into the working environment (as with a
<do>
tag), but the subsequent script is still interpreted as R (unlike
with a <do>
tag).
After the end of the script, the working environment is lost, along with any variables within it. Only the final result is returned to the calling environment.
-- R sections can be used to set temporary variables. <R> columns <- list("columnA", "columnB") wordlist <- list("'red'", "'blue'") conditions <- c("and columnA < 2", "and columnD > 4") <do>-- Now the query. select <R> columns </R> from dbname.tablename where columnC in (<R> wordlist </R>) <R> conditions </R>
-- End the script here. -- The statements below are ignored. <stop> and columnA > 0
Whereas <R> ... </R>
sections are embedded within SQL (which
may be blank), <R> ... <do>
sections exist outside of any SQL.
While the result of an <R> ... </R>
section is substituted back into the
surrounding SQL (and will form part of the query), an
<R> ... <do>
section is simply evaluated within the script's working
environment.
Although a <result -> null> ... <do>
section may be preceded by
SQL (which will be submitted prior to evaluating the section), an
<R> ... <do>
section may not be (the two sections being otherwise
equivalent).
Lists are inserted comma collapsed. Vectors are inserted newline collapsed.
A <stop>
tag imitates the end of the file, and can be used to interrupt a
script for debugging.
/* Implements drop-if-exists in a SQL that doesn't. */-- Pull details of temporary tables. help volatile table <result -> a><do>
-- If those details are a data frame, then -- at least one temporary table exists. <if (class(a) == class(data.frame()))> <R> i <- 1 <do>
-- Drop each table in turn. <while (i <= nrow(a))> drop table <R> a[i, "Table Name"] </R> <do> <R> print(paste("dropped", a[i, "Table Name"])) i <- i + 1 <do> </while>
-- Exit from here. <return (paste(i - 1, "tables dropped"))> </if>
-- Otherwise, we had no temps to drop. <return ("no temps found")>
Within SQL sections, <if>
and <while>
tags can be used
to control the (repeated) submission of queries.
The parentheses enclosing the conditional R expression are mandatory, and that
expression must evaluate to a Boolean singleton.
Within SQL sections, <return>
tags can be used to exit early
from a script.
A return value (R expression) is mandatory, and it must be enclosed in
parentheses.
Calling the base return
function inside an R section does not exit the
script; it merely sends a value to the temporary working environment.
A difference between <return>
and <stop>
, is that the latter
ignores conditionals; <stop>
applies even when nested within an
<if>
that evaluated to FALSE
.
help volatile table <result -> a>; <if (i <- 0; class(a) == class(data.frame()))> <while (i <- i + 1; i <= nrow(a))> drop table <R> a[i, "Table Name"] </R>; <R> print(paste("dropped", a[i, "Table Name"]));; </while> <return (paste(i - 1, "tables dropped"))> </if> <R> "no temps found"
This is an alternative implementation of example #5.
Multiple R statements are allowed within conditional tag expressions.
In the case of while loops, all statements within the tag expression are
evaluated on each iteration.
The condition takes the value of the final statement, which must evaluate to
either TRUE
or FALSE
.
Semicolons can be used to terminate R sections (as well as SQL
statements).
Within an R section (following either an <R> or a <result> tag), if there is
nothing but whitespace (or nothing at all) between a semicolon and the start of
its line, the previous semicolon, or the start of the R section (whichever
comes first, heading left from the semicolon), then that semicolon marks the
end of the R section and is equivalent to a <do>
tag (the subsequent
script is read as SQL).
So located semicolons are not accepted by R's parser.
select <if (dow == "Monday")> ColumnA <else if (dow %in% c("Saturday", "Sunday"))> ColumnB, ColumnC <else> * </if> from some.table
The closing tags, </if>
and </while>
, do not imply <do>
.
This allows switching column names, or other SQL
fragments, in and out of
a larger query.
The same end can be achieved through embedded R (<R> ... </R>
tags), or
other mechanisms.
In the above, dow is an externally supplied parameter (see example #2).
The parser is simple, and does not enforce correct nesting structure. Unintuitive output may appear when nesting violations occur.
# NOT RUN {
# Define a new data source.
sqrlSource("mire", "dsn=Mire")
# }
# NOT RUN {
# Submit a SQL script to the source.
mire("my/script.sql")
# Submit a SQRL script, with explicit parameter values.
mire("my/parameterised/script.sqrl",
day = Sys.Date(), colour = "'blue'")
# }
Run the code above in your browser using DataLab