tm (version 0.7-2)

hpc: Parallelized ‘lapply’

Description

Parallelize applying a function over a list or vector according to the registered parallelization engine.

Usage

tm_parLapply(X, FUN, ...)
tm_parLapply_engine(new)

Arguments

X

A vector (atomic or list), or other objects suitable for the engine in use.

FUN

the function to be applied to each element of X.

...

optional arguments to FUN.

new

an object inheriting from class cluster as created by makeCluster() from package parallel, or a function with formals X, FUN and ..., or NULL corresponding to the default of using no parallelization engine.

Value

A list of the same length as X, whose elements are the results of applying FUN, together with the ... arguments, to each element of X.

Details

Parallelization can be employed to speed up some of the embarrassingly parallel computations performed in package tm, specifically tm_index(), tm_map() on a non-lazy-mapped VCorpus, and TermDocumentMatrix() on a VCorpus or PCorpus. Functions tm_parLapply() and tm_parLapply_engine() can be used to customize parallelization according to the available resources.

tm_parLapply_engine() is used for getting (with no arguments) or setting (with argument new) the parallelization engine employed (see below for examples).
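Because the engine is a single session-wide setting, it can be convenient to change it only temporarily. The following is a minimal sketch of that get/set pattern, assuming package tm is installed; the helper name with_tm_engine() is hypothetical and not part of tm.

```r
# Hypothetical helper (not part of tm): evaluate `expr` with a temporary
# parallelization engine, then restore whatever engine was set before.
with_tm_engine <- function(engine, expr) {
  old <- tm::tm_parLapply_engine()                   # get the current engine
  on.exit(tm::tm_parLapply_engine(old), add = TRUE)  # restore on exit, even on error
  tm::tm_parLapply_engine(engine)                    # set the temporary engine
  force(expr)                                        # lazily evaluated only now
}

# Usage, guarded so the sketch is harmless where tm is not installed.
if (requireNamespace("tm", quietly = TRUE)) {
  before <- tm::tm_parLapply_engine()
  out <- with_tm_engine(
    function(X, FUN, ...) lapply(X, FUN, ...),  # trivial function engine
    tm::tm_parLapply(1:3, sqrt)
  )
  after <- tm::tm_parLapply_engine()
}
```

The on.exit() call is what makes the restore reliable: the previous engine is put back even if the evaluated expression fails.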

If the engine is set to an object inheriting from class cluster, tm_parLapply() calls parLapply() with this cluster and the given arguments. If it is set to a function, tm_parLapply() calls that function with the given arguments. Otherwise (in particular with the default, NULL), it simply calls lapply().
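The dispatch rule just described can be illustrated with a small stand-alone sketch in base R. The function dispatch_lapply() and the engine counting_engine() are hypothetical names introduced for illustration only; this is not the source code of tm_parLapply().

```r
# Hypothetical stand-in for the dispatch rule described above.
dispatch_lapply <- function(engine, X, FUN, ...) {
  if (inherits(engine, "cluster")) {
    # Cluster engine: delegate to parLapply() on that cluster.
    parallel::parLapply(engine, X, FUN, ...)
  } else if (is.function(engine)) {
    # Function engine: must accept X, FUN and ...
    engine(X, FUN, ...)
  } else {
    # No engine (NULL): plain sequential lapply().
    lapply(X, FUN, ...)
  }
}

square_or_power <- function(x, p) x^p

# No engine: falls through to lapply().
res_seq <- dispatch_lapply(NULL, 1:3, square_or_power, p = 2)

# Function engine: a wrapper around lapply() that counts how often it is used.
calls <- 0L
counting_engine <- function(X, FUN, ...) {
  calls <<- calls + 1L
  lapply(X, FUN, ...)
}
res_fun <- dispatch_lapply(counting_engine, 1:3, square_or_power, p = 2)
```

Both calls give the same list; only the function engine increments the counter, which shows that any function with formals X, FUN and ... can take over the work.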

Hence, to achieve parallelization via parLapply() and a default cluster registered via setDefaultCluster(), one can use

  tm_parLapply_engine(function(X, FUN, ...)
      parallel::parLapply(NULL, X, FUN, ...))

or re-register the cluster, say cl, using

  tm_parLapply_engine(cl)

(note that there is no mechanism for programmatically getting the registered default cluster). Using

  tm_parLapply_engine(function(X, FUN, ...)
      parallel::parLapplyLB(NULL, X, FUN, ...))

or

  tm_parLapply_engine(function(X, FUN, ...)
      parallel::parLapplyLB(cl, X, FUN, ...))

gives load-balancing parallelization with the registered default or given cluster, respectively. To achieve parallelization via forking (on Unix-alike platforms), one can use the above with clusters created by makeForkCluster(), or use

  tm_parLapply_engine(parallel::mclapply)

or

  tm_parLapply_engine(function(X, FUN, ...)
      parallel::mclapply(X, FUN, ..., mc.cores = n))

to use mclapply() with the default or given number n of cores.
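Putting the pieces together, a complete session might look like the following sketch. It assumes package tm is installed and that two local worker processes can be started; the worker count, the input vector x and the variable names are illustrative assumptions.

```r
if (requireNamespace("tm", quietly = TRUE)) {
  x <- c(1.234, 5.678, 9.1011)

  cl <- parallel::makeCluster(2)        # assumption: two local workers
  old <- tm::tm_parLapply_engine()      # remember the current engine

  res <- tryCatch({
    tm::tm_parLapply_engine(cl)         # register the cluster as engine
    # Extra arguments (here digits) are passed on to FUN via '...'.
    tm::tm_parLapply(x, round, digits = 1)
  }, finally = {
    tm::tm_parLapply_engine(old)        # restore the previous engine
    parallel::stopCluster(cl)           # always shut the workers down
  })

  # The parallel result should match the sequential one.
  same <- identical(res, lapply(x, round, digits = 1))
}
```

Restoring the previous engine before stopping the cluster avoids leaving a stopped cluster registered, which would make later tm_parLapply() calls fail.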

See Also

makeCluster(), parLapply(), parLapplyLB(), and mclapply().