compileFunction: Compile an R function to native machine code via LLVM

Description

This is the main function of this package and takes an R function and attempts to compile the R code into a native routine. It does this by translating the R code to LLVM instructions and then compiling that LLVM description to machine code. This has the potential to greatly speed up the execution of simple R code that is not vectorized but works on elements of vectors separately.

To compile the function, we need information about the types of the parameters and the return value of the function. These can be specified via returnType and types. Alternatively, if the function has an attribute llvmTypes, we get the type information from that. This allows authors of the R function to provide information that can be used directly to compile the function. We plan to also support the TypeInfo package's annotation of functions with type information.

This function attempts to compile other R functions that this function calls. To do this, we need the type information. The signatures can be specified in the call via .functionInfo as a named list with an element for each function. The form of the element is a list with an element named returnType and another named params.

Some of the functions this R function calls are already available in C libraries and we might want to use those directly, e.g. sqrt. To do this, we need to specify the name of the routine and its signature. We do this via the .routineInfo argument. This is a named list with the names identifying the routines. Each element is of the form returnType and params.

Usage

compileFunction(fun, returnType, types = list(), module = Module(name), name = NULL,
         NAs = FALSE,
         asFunction = FALSE, asList = FALSE,
         optimize = TRUE, ...,
         .functionInfo = list(...),
         .routineInfo = list(),
         .compilerHandlers = getCompilerHandlers(),
         .globals = getGlobals(fun, names(.CallableRFunctions), .ignoreDefaultArgs, 
                                .assert = .assert, .debug = .debug),
         .insertReturn = !identical(returnType, VoidType),
         .builtInRoutines = getBuiltInRoutines(),
         .constants = getConstants(),
         .vectorize = character(), .execEngine = NULL,
         structInfo = list(), .ignoreDefaultArgs = TRUE, .useFloat = FALSE,
         .zeroBased = logical(), .localVarTypes = list(),
          .fixIfAssign = TRUE, .CallableRFunctions = list(), 
         .RGlobalVariables = character(),
         .debug = TRUE, .assert = TRUE, .addSymbolMetaData = TRUE,
         .readOnly = constInputs(fun), .integerLiterals = TRUE)

Arguments

fun

the R function to be compiled.

returnType

the LLVM type of the return value. This can be omitted if the function has an llvmType attribute giving the details of the signature.

types

a list giving the LLVM types for each of the parameters in the R function. As with returnType, this need not be specified if the function has an llvmType attribute. Also, this can be omitted if the function has no parameters.

module

the LLVM Module object.

name

the name to use for the function, typically obtained from deparsing fun, but which can be specified explicitly if not referencing the function directly in the call.

NAs

a logical controlling whether to add code to handle NAs in the computations. If we know that there are no NAs in the inputs, we can avoid adding extra code to handle them. Currently ignored.

asFunction

a logical value

…

name = value pairs of type information describing other R functions that fun may call. This allows those to be compiled at the same time. These elements are collected into .functionInfo

.functionInfo

a list of named elements that provide the return type and parameter types of R functions. See ….

.routineInfo

a named list of signatures describing native routines that are to be considered callable from our compiled function and which that code may call.

optimize

a logical value that controls whether the module is optimized before being returned. This has the potential to make the functions faster.

.compilerHandlers

a named list of functions. This is used to find a handler for generating code for different language constructs and for calls to particular R functions. Expressions are compiled using methods of the compile function. But not all expressions have a class that identifies their purpose, e.g. return which appears as a regular call. So in these cases, we use this named list to find an element for the particular call. This allows the caller to customize how we generate code for a call to a particular R function. The basic idea is that one makes a copy of the default handlers and replaces or adds entries to that and passes the modified list to compileFunction.

.globals

a character vector giving the names of functions which are to be compiled also in this module. These are typically the names of functions that are called within the body of this function.

.execEngine

the execution object which, if present and we are creating an R function to hide the compiled routine, is added as a default value of a parameter so that the caller doesn't have to specify it.

structInfo

a list with an element for each struct type we might reference in the code. Each element should be a named list giving the field names in the struct and their types.

.builtInRoutines

a list of available routines we know about along with their return type and parameter types, i.e. signature

.insertReturn

whether we need to add explicit calls to return in the R function before we compile it. If we know that the code already contains the return calls in the appropriate places, we can save time by not doing this, but it is very small and never hurts to do this.

.constants

the names of variables that are to be considered constants

.vectorize

a logical value controlling whether to make the code vectorized in its first argument, or if this is a parameter name then that parameter.

asList

a logical value that controls what is returned. If this is TRUE, we return a list with the module, the compiled routine and the compiler object containing all of its state

.ignoreDefaultArgs

a logical value which controls whether we look at the code for the default values of the parameters in the R function. Currently we use this to avoid working with global variables in R that we will not actually reference.

.useFloat

a logical value. This controls whether we use double or float data type for numeric variables.

.zeroBased

logical value. This controls whether we use 0-based counting. This is needed for the compilation of code for the GPU (?)

.localVarTypes

a named list giving the type objects for any of the local variables. These are "hints". The types should be LLVM types, i.e. objects in the Rllvm package representing explicit types.

.fixIfAssign

a logical value that controls if we process the R function code to rewrite assignments of the form x = if(cond) a else b, to if(cond) x = a else x = b via the fixIfAssign function.

.CallableRFunctions

a list with named elements. Each element identifies an R function that can be called from the compiled code. Each element is a list specifying the return type of the function and for the parameters of that function. These are specified as LLVM types. The type information can also be specified within the R code via .R(foo(a, b), list(returnType, arg1Type, arg2Type)).

.RGlobalVariables

a character vector specifying the names of variables that are in the R global environment. The types of these variables is computed at compile time if necessary.

.debug

either a logical value or a character vector. If TRUE, calls in the R code of the form .debug(...) are compiled. If FALSE, they are omitted. If a character vector is provided, this allows the caller to specify other names for functions which are considered debugging calls.

.assert

similar to .debug but for assertions. By default, these are of the form .assert(condition)

.addSymbolMetaData

a logical value that controls whether meta data about the external symbols invoked in the compiled code are added to the Module. This allows another session/application to resolve those symbols correctly when the module is deserialized.

.readOnly

a character vector giving the names of any parameters in the compiled code that are pointer types but read-only and not written to. The constInputs function in the CodeAnalysis package can determine this in common cases. However, one can avoid the cost of that analysis or provide the information that it cannot determine directly.

.integerLiterals

a logical value. If this is TRUE, assignments of the form x = 2 will treat the numeric value as an integer. This is different from R which treats 2 as a numeric object evethough its value - not its type - is an integer.

Value

If asFunction, this returns an R function that has the same signature as fun and which can be called identically but which will use the newly compiled function.

References

The Rllvm package on www.omegahat.org and LLVM itself at llvm.org.

Examples

Run this code


    # An example of being able to compile both of these in the
    # same module and calling foo from bar.
foo =
function(x, y)
{
  return( x + y )
}

bar =
function(x, y)
{
  return ( foo(x, y) + 10 )  
}

foobar =
function(x, y)
{
  return ( sqrt(foo(x, y)) )
}

 foo.c = compileFunction(foo, DoubleType, list(DoubleType, DoubleType),
                      .routineInfo = list( sqrt = list(DoubleType, DoubleType) ))


 fb = compileFunction(foobar, DoubleType, list(DoubleType, DoubleType),
                       module = as(foo.c, "Module"),
                      .routineInfo = list( sqrt = list(DoubleType, DoubleType) ),
                       foo = list(returnType = DoubleType,
                                  params = c(DoubleType, DoubleType)))

 run(fb, 4, 5) # gives 3 = sqrt(4 + 5)
 .llvm(fb, 4, 5)


  # Here we return an R function that is directly callable.

 # Create a new module and we will need foo in that. So recreate.
 foo.c = compileFunction(foo, DoubleType, list(DoubleType, DoubleType),
                      .routineInfo = list( sqrt = list(DoubleType, DoubleType) ))
 fb = compileFunction(foobar, DoubleType, list(DoubleType, DoubleType),
                       module = as(foo.c, "Module"),
                      .routineInfo = list( sqrt = list(DoubleType, DoubleType) ),
                       foo = list(returnType = DoubleType,
                                  params = c(DoubleType, DoubleType)),
                      asFunction = TRUE)

 fb(4, 5)



 #  Here we show how to override how an expression is called
 # by providing our own .compilerHandlers. Here we change
 # any call to Sys.Date() to a constant.
myDate = function() {
        return(Sys.Date())
     }

myOPS = getCompilerHandlers()
myOPS[["Sys.Date"]] =
   function(call, env, ir, ...) {
       cat("In Sys.Date handler\n")
       ir$createConstant(15015L)
   }


  # Note we specify .functionInfo so that we don't
  # try to create a Function object in the module for Sys.Date which
  # we won't end up calling.
f = compileFunction(myDate, Int32Type, optimize = FALSE,
                     .compilerHandlers = myOPS, .globals = NULL)

.llvm(f)


myPlus = function() {
          return(1L + x)
     }