eachElem executes function fun multiple times in parallel with a varying
set of arguments, and returns the results in a list. It is functionally
similar to the standard R lapply function, but is more flexible in the
way that the function arguments can be specified.
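To make the comparison with lapply concrete, here is a minimal sketch. The lapply part runs in any R session. The eachElem call is left commented out because it assumes a running sleigh named s, which this snippet does not create.

```r
# Sequential baseline: apply mean() to each element of a list with lapply().
x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE, FALSE, FALSE, TRUE))
local_means <- lapply(x, mean)
print(local_means$a)      # 5.5
print(local_means$logic)  # 0.5

# Parallel counterpart; assumes a running sleigh `s`, so it is not executed here:
# parallel_means <- eachElem(s, mean, elementArgs = list(x))
```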
"eachElem"(.Object, fun, elementArgs=list(), fixedArgs=list(), eo=NULL, DEBUG=FALSE)
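.Object: a sleigh object, such as one created by sleigh().
fun: the function to execute in parallel. For operators such as +, %*%,
etc., the function name must be quoted.
elementArgs: a list whose elements supply the arguments of fun that vary
from task to task. Each element should correspond to an argument of fun.
fixedArgs: a list whose elements supply the arguments of fun that do not
vary from task to task. Each element should correspond to an argument of
fun.
DEBUG: should the browser function be called upon entry to eachElem? The
default is FALSE.

The eo argument is a list that can be used to specify various options.
The following options are recognized: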
The eo$elementFunc option can be used to specify a callback function
that provides the varying arguments for fun in place of elementArgs
(that is, you can't specify both eo$elementFunc and elementArgs).
eachElem calls the eo$elementFunc function to get a list of arguments
for one invocation of fun, and will keep calling it until eo$elementFunc
signals that there are no more tasks to execute by calling the stop
function with no arguments. eachElem appends any values specified by
fixedArgs to the list returned by eo$elementFunc just as if elementArgs
had been specified. eachElem passes the number of the desired task
(starting from 1) as the first argument to eo$elementFunc, and the value
of the eo$by option as the second argument. Note that the use of the
eo$elementFunc function is an advanced feature, but is very useful when
executing a large number of tasks, or when the arguments are coming from
a database query, for example. For that reason, the eo$loadFactor option
should usually be used in conjunction with eo$elementFunc
(see description below).

The eo$accumulator option can be used to specify a callback function
that will receive the results of the task execution as soon as they are
complete, rather than returning all of the task results as a list when
eachElem completes. In other words, eachElem will call the
eo$accumulator function with task results as soon as it receives them
from the sleigh workers, rather than saving them in memory until all the
tasks are complete. Note that if the tasks are chunked (using the
eo$chunkSize option described below), then the eo$accumulator function
will receive multiple task results, which is why the task results are
always passed to the eo$accumulator function in a list. The first
argument to the eo$accumulator function is a list of results, where the
length of the list is equal to eo$chunkSize. The second argument is a
vector of task numbers, starting from 1, where the length of the vector
is also equal to eo$chunkSize. The task numbers are very important,
because the results are not guaranteed to be returned in order.
eo$accumulator is another advanced feature and, like eo$elementFunc, is
very useful when executing a large number of tasks. It allows you to
process each result as it finishes, rather than forcing you to wait
until all of the tasks are complete. In conjunction with eo$elementFunc
and eo$loadFactor, you can set up a pipeline, allowing you to process an
unlimited number of tasks efficiently. Note that when eo$accumulator is
specified, eachElem returns NULL, not the list of results, since
eachElem doesn't save any of the results after passing them to the
eo$accumulator
function.

The eo$by option specifies the iteration scheme to use for matrix and
data frame elements in elementArgs. The default value is "row", but it
can also be set to "column" or "cell". Vectors and lists in elementArgs
are not affected by this option.

The eo$chunkSize option is a tuning parameter that specifies the number
of tasks that sleigh workers should allocate at a time. The default
value is 1, but if the tasks are small, performance can be improved by
specifying a larger value, which decreases the overhead per task. If the
fun function executes very quickly, you may not be able to keep your
workers busy, giving you poor performance. In that case, consider
setting the eo$chunkSize option to a large enough
number to increase the effective task execution time.

The eo$loadFactor option is a tuning parameter that specifies the
maximum number of tasks per worker that are submitted to the sleigh at
the same time. If set, no more than (loadFactor * workerCount) tasks
will be submitted at the same time. This helps to control the resource
demands that are made on the NetWorkSpaces server, which is especially
important if there are a large number of tasks. Note that this option is
ignored if blocking is set to FALSE, since the two options are
incompatible with each other. If in doubt, set the eo$loadFactor option
to 10. That will almost certainly avoid putting a strain on the
NetWorkSpaces server, and if that isn't enough to keep your workers
busy, then you should really be using the eo$chunkSize option to give
the workers
more to do.

The eo$blocking option is used to indicate whether to wait for the
results, or to return as soon as the tasks have been submitted. If set
to FALSE, eachElem will return a sleighPending object that is used to
monitor the status of the tasks, and to eventually retrieve the results.
You must wait for the results to be complete before executing any
further tasks on the sleigh, or an exception will be raised. The default
value is TRUE.

The eo$argPermute option is used to reorder the arguments passed to fun.
It is generally only useful if the fixedArgs argument has been
specified, and some of those arguments need to precede the arguments
specified via elementArgs. Note that by using recycling of elements in
elementArgs, the use of fixedArgs and argPermute
can often be avoided entirely.

The eachElem function forms argument sets from objects passed in via
elementArgs and fixedArgs. The elements of elementArgs are used to
specify the arguments that are changing, or varying, from task to task,
while the elements of fixedArgs are used to specify the arguments that
do not vary from task to task. The number of tasks that are executed by
a call to eachElem is basically equal to the length of the longest
vector (or list, etc.) in elementArgs. If any elements of elementArgs
are shorter, then their values are recycled, using the standard R rules.
The elements of elementArgs may be vectors, lists, matrices, or data
frames. The vectors and lists are always iterated over by element, or
"cell", but matrices and data frames can also be iterated over by row or
column. This is controlled by the by option, specified via the eo
argument. See the description of the eo$by option for more information.
For example:
eachElem(s, '+', elementArgs=list(1:4), fixedArgs=list(100))
This will submit four tasks, since the length of 1:4 is four. The four tasks will be to add the arguments 1 and 100, 2 and 100, 3 and 100, and 4 and 100. The result is a list containing the four values 101, 102, 103, and 104.
Another way to do the same thing is with:
eachElem(s, '+', elementArgs=list(1:4, 100))
Since the second element of elementArgs has length one, its value is
recycled four times, thus specifying the same set of tasks as in the
previous example. This method also has the advantage of making it easy
to put fixed values before varying values, without the need for the
eo$argPermute option. For example:
eachElem(s, '-', elementArgs=list(100, 1:4))
is similar to the R statement:
100 - 1:4
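The argument sets described above can be reproduced locally without a sleigh. The sketch below uses base R's mapply purely as a sequential stand-in to show how varying, fixed, and recycled values line up. It is not how eachElem is implemented, and the variable names are illustrative.

```r
# Local, sequential stand-ins for the three eachElem calls above.
# Varying values 1:4 plus a fixed value 100 (like fixedArgs=list(100)).
with_fixed <- mapply(`+`, 1:4, MoreArgs = list(100), SIMPLIFY = FALSE)

# Same tasks, with the length-one value recycled (like elementArgs=list(1:4, 100)).
recycled <- mapply(`+`, 1:4, 100, SIMPLIFY = FALSE)

# Fixed value placed first (like elementArgs=list(100, 1:4)).
fixed_first <- mapply(`-`, 100, 1:4, SIMPLIFY = FALSE)

str(with_fixed)                         # a list of four numbers
print(identical(with_fixed, recycled))  # TRUE
print(unlist(fixed_first))              # 99 98 97 96
```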
Note that in simple examples like these, where the results are numeric
values, the standard R unlist function can be very useful for converting
the resulting list into a vector.
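As a small illustration of that note, the sketch below applies unlist to a list shaped like the result of the first example. It also shows, using plain lapply as a local stand-in, what happens when each task returns several values. The variable names are illustrative only.

```r
# A list shaped like the result of the first example above.
res <- list(101, 102, 103, 104)
print(unlist(res))   # 101 102 103 104

# When each task returns more than one value, unlist() flattens everything
# into one long vector; binding the results row-wise keeps one row per task.
quartiles <- lapply(list(a = 1:10, b = 11:20), quantile, probs = 1:3/4)
print(length(unlist(quartiles)))        # 6
print(dim(do.call(rbind, quartiles)))   # 2 3
```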
See also: eachWorker, sleighPending
## Not run:
# # create a sleigh
# s <- sleigh()
#
# # compute the list mean for each list element
# x <- list(a=1:10, beta=exp(-3:3), logic=c(TRUE,FALSE,FALSE,TRUE))
# eachElem(s, mean, list(x))
#
# # median and quartiles for each list element
# eachElem(s, quantile, elementArgs=list(x), fixedArgs=list(probs=1:3/4))
#
# # use eo$elementFunc to supply 100 random values and eo$accumulator to
# # receive the results
# elementFunc <- function(i, by) {
#   if (i <= 100) list(i=i, x=runif(1)) else stop()
# }
# accumulator <- function(resultList, taskVector) {
#   if (resultList[[1]][[1]] != taskVector[1]) stop('assertion failure')
#   cat(paste(resultList[[1]], collapse=' '), '\n')
# }
# eo <- list(elementFunc=elementFunc, accumulator=accumulator)
# eachElem(s, function(i, x) list(i=i, x=x, xsq=x*x), eo=eo)
# ## End(Not run)
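The examples above do not exercise the eo$by, eo$chunkSize, eo$loadFactor, or eo$blocking options, so here is a hedged sketch of how they might be passed. The helper name eoSketch is hypothetical and not part of nws. It is wrapped in a function so that nothing is submitted until you call it with a running sleigh. The use of waitSleigh to collect results from a sleighPending object is an assumption about the nws API that this page does not confirm.

```r
# Hypothetical helper, not part of nws: gathers a few eo settings in one place.
# `s` is assumed to be a running sleigh created with sleigh().
eoSketch <- function(s) {
  m <- matrix(1:6, nrow = 2)

  # eo$by: one task per column of m instead of the default one task per row.
  byColumn <- eachElem(s, sum, elementArgs = list(m), eo = list(by = "column"))

  # eo$chunkSize and eo$loadFactor: many quick tasks, allocated 50 at a time,
  # with the number of tasks submitted at once capped per worker.
  roots <- eachElem(s, sqrt, elementArgs = list(1:1000),
                    eo = list(chunkSize = 50, loadFactor = 10))

  # eo$blocking = FALSE: eachElem returns a sleighPending object right away.
  # Assumption: waitSleigh() blocks until the tasks finish and returns the results.
  pending <- eachElem(s, sqrt, elementArgs = list(1:10),
                      eo = list(blocking = FALSE))
  later <- waitSleigh(pending)

  list(byColumn = byColumn, roots = roots, later = later)
}
```

Call eoSketch(s) only after s <- sleigh() has succeeded. Note that the results of the non-blocking call are retrieved before any other tasks are run on the sleigh, as the description of eo$blocking requires.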