oclRun: Run a kernel using OpenCL.

Description

oclRun is used to execute code that has been compiled for OpenCL.

oclResult collects results from an asynchronous oclRun call.

Usage

oclRun(kernel, size, ..., native.result = FALSE, wait = TRUE)
oclResult(context, wait = TRUE)

Arguments

kernel

kernel object as obtained from oclSimpleKernel

size

length of the output vector

...

additional arguments passed to the kernel

native.result

logical scalar; if TRUE, the result from a single-precision kernel is not converted to double precision but is returned as a clFloat object.

wait

logical scalar; if TRUE, oclRun waits for the operation to finish and returns the result. Otherwise the kernel is only enqueued, so it runs in parallel to R and its result has to be collected later with oclResult.

context

context object returned by a call to oclRun(..., wait = FALSE).

Value

oclRun: with wait = TRUE, the result of the operation, a numeric vector of length size. Otherwise oclRun returns a call context object that can be passed to oclResult to retrieve the result.
oclResult: the result of the previously started operation, or NULL if wait = FALSE and the operation has not completed yet.

Details

oclRun pushes the kernel arguments, executes the kernel, and retrieves the result. The kernel is expected to take a write-only __global double * or __global float * as its first argument, which is used for the result, and a const int as its second argument, which gives the result length. All other arguments are assumed to be read-only and are filled according to the ... values. Scalar values (vectors of length one) are passed as constants; longer vectors are passed as global objects. Only numeric (int*, double*), clFloat (float*) and logical (int*) vectors are supported as kernel arguments. Numeric (double-precision) vectors are converted to single precision automatically when a single-precision kernel is used. The caller is responsible for matching the argument types to the kernel, much as with .C and .Call.
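
The argument convention described above can be illustrated with a minimal sketch. The kernel name scale, the variable names, and the choice of the first device on the first platform are assumptions made for illustration only; the sketch also assumes a working OpenCL device is available. It uses the same unsigned count argument as the kernels in the Examples section.

```r
library(OpenCL)

# Use the first device of the first platform (an assumption; pick any device you like).
dev <- oclDevices(oclPlatforms()[[1]])[[1]]

# Hypothetical kernel "scale": output[i] = a * input[i].
# Argument 1 is the result buffer and argument 2 the result length;
# oclRun supplies both. Arguments 3 and 4 are filled from the ... values, in order.
code <- c(
  "__kernel void scale(\n",
  "  __global float* output,\n",
  "  const unsigned int count,\n",
  "  __global float* input,\n",
  "  const float a)\n",
  "{\n",
  "  int i = get_global_id(0);\n",
  "  if (i < count) output[i] = a * input[i];\n",
  "}\n")
k.scale <- oclSimpleKernel(dev, "scale", code, "single")

# as.numeric() matters here: 1:8 alone is an integer vector, which would be
# passed as int* and would not match the float* parameter of the kernel.
x <- as.numeric(1:8)

# x is a double vector, so it is converted to single precision and passed as a
# global object; the length-one value 3 is passed as the constant a.
y <- oclRun(k.scale, length(x), x, 3)
```

The order of the values after size must match the order of the kernel parameters after the first two, since oclRun does not check the types against the kernel source.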

oclResult retrieves the result of a previous operation that was enqueued using oclRun(..., wait = FALSE). If oclResult(..., wait = FALSE) is used then NULL is returned in case the result is not ready yet. Note that results can be collected only once and the context object becomes invalid after a successful call to oclResult since all associated OpenCL objects are released.
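
The asynchronous pattern described above might look like the following sketch. The kernel square and all variable names are hypothetical, and the sketch assumes that the first device of the first platform is usable.

```r
library(OpenCL)

dev <- oclDevices(oclPlatforms()[[1]])[[1]]

# Hypothetical kernel "square": output[i] = input[i]^2.
code <- c(
  "__kernel void square(\n",
  "  __global float* output,\n",
  "  const unsigned int count,\n",
  "  __global float* input)\n",
  "{\n",
  "  int i = get_global_id(0);\n",
  "  if (i < count) output[i] = input[i] * input[i];\n",
  "}\n")
k.square <- oclSimpleKernel(dev, "square", code, "single")

x <- as.numeric(1:16)

# Enqueue the kernel only; oclRun returns a call context instead of the result.
ctx <- oclRun(k.square, length(x), x, wait = FALSE)

# Unrelated work in R can proceed while the kernel runs.
host.sum <- sum(x)

# Non-blocking poll: NULL means the result is not ready and ctx is still valid.
res <- oclResult(ctx, wait = FALSE)

# If the poll came back empty, block until the kernel has finished.
if (is.null(res)) res <- oclResult(ctx)

# After a successful oclResult call, ctx is invalid and must not be reused.
```

Checking is.null before the second call keeps the result from being collected twice, which matters because a successful oclResult call releases the OpenCL objects behind the context.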

Examples

library(OpenCL)
p = oclPlatforms()
d = oclDevices(p[[1]])

code = c(
"__kernel void dnorm(\n",
"__global float* output,\n",
"const unsigned int count,\n",
"__global float* input,\n",
"const float mu, const float sigma)\n",
"{\n",
"int i = get_global_id(0);\n",
"if(i < count)\n",
"output[i] = exp(-0.5f * ((input[i] - mu) / sigma) * ((input[i] - mu) / sigma)) ",
"/ (sigma * sqrt( 2 * 3.14159265358979323846264338327950288 ) );\n",
"};")
k.dnorm <- oclSimpleKernel(d[[1]], "dnorm", code, "single")
f <- function(x, mu=0, sigma=1, ...)
  oclRun(k.dnorm, length(x), x, mu, sigma, ...)

## expect differences since the above uses single-precision but
## it should be close enough
f(1:10/2) - dnorm(1:10/2)

## this is optional - use floats instead of regular numeric vectors
x <- clFloat(1:10/2)
f(x, native.result=TRUE)

## does the device support double-precision?
if (any(grepl("cl_khr_fp64", oclInfo(d[[1]])$exts))) {
code = c(
"#pragma OPENCL EXTENSION cl_khr_fp64 : enable\n",
"__kernel void dnorm(\n",
"__global double* output,\n",
"const unsigned int count,\n",
"__global double* input,\n",
"const double mu, const double sigma)\n",
"{\n",
"int i = get_global_id(0);\n",
"if(i < count)\n",
"output[i] = exp(-0.5f * ((input[i] - mu) / sigma) * ((input[i] - mu) / sigma)) ",
"/ (sigma * sqrt( 2 * 3.14159265358979323846264338327950288 ) );\n",
"};")
k.dnorm <- oclSimpleKernel(d[[1]], "dnorm", code, "double")
f <- function(x, mu=0, sigma=1)
  oclRun(k.dnorm, length(x), x, mu, sigma)

## probably not identical, but close...
f(1:10/2) - dnorm(1:10/2)
} else cat("Sorry, your device doesn't support double-precision\n")