parBatchByIndex

When doing a select where the condition is a large number of ids, it is not always possible to include them all in a single SQL statement. This function breaks the list of ids into chunks and allows the indexProcessor function to deal with just a small number of ids at a time. See batchByIndex for the non-parallel version.

Usage
parBatchByIndex(allIndices, indexProcessor, reduce, cl, batchSize = 1e+05)
Arguments

allIndices
The full set of indices to process; it is broken into batches of at most batchSize elements.

indexProcessor
The function run on each batch of indices, for example a function that runs a query for the ids in that batch.

reduce
A function run on the list of batch results after all jobs have finished.

cl
A SNOW cluster on which to run the batch jobs.

batchSize
The maximum number of indices passed to the indexProcessor function in a single batch.
Value

The results of the individual indexProcessor function runs are collected into a list, with the order of batches maintained, and the reduce function is run on that list; the idea is that the reduce function merges all the results together into one result. The value returned by the reduce function is then returned.
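To make the batching and reduce semantics concrete, here is a minimal sequential sketch of the behaviour described above, written in plain base R. It only illustrates the idea; it is not the package's implementation and it leaves out the parallel execution on the cluster.

# sequential illustration of the batch/reduce idea (not the actual implementation)
allIndices <- 1:10   # the full set of indices
batchSize  <- 3      # tiny batch size so the chunking is visible

# split the indices into consecutive batches of at most batchSize elements, keeping their order
batches <- split(allIndices, ceiling(seq_along(allIndices) / batchSize))

# stand-in indexProcessor: just sums each batch (in practice this might run a query)
indexProcessor <- function(indexBatch) sum(indexBatch)

# apply the processor to every batch; the result is a list with one element per batch
results <- lapply(batches, indexProcessor)

# reduce merges the per-batch results into a single value
reduce <- function(results) sum(unlist(results))
reduce(results)   # 55, the same as sum(1:10)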
See Also

batchByIndex, the non-parallel version of this function.
Examples

## Not run:
# cl = makeCluster(2)  # create a SNOW cluster
#
# # function to run a query for each batch of indices
# job = function(indexBatch) {
#   dbGetQuery(dbConnection,
#              paste("SELECT weight FROM table WHERE id IN (", paste(indexBatch, collapse = ","), ")"))
# }
#
# # function to combine all the results, in this case by summing them up
# reduce = function(results) sum(unlist(results))
#
# indices = 1:10000
#
# # run the queries in parallel and then sum the results
# totalWeight = parBatchByIndex(indices, job, reduce, cl, 1000)
#
# # shut the cluster down when finished
# stopCluster(cl)
## End(Not run)
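When the indexProcessor queries a database, as in the example above, the connection object generally cannot be serialised to the SNOW workers, so the dbConnection used inside job must already exist on each worker. One common pattern, shown here only as a sketch (the parallel package, the RSQLite backend and the "weights.db" file name are illustrative assumptions, not something this help page prescribes), is to open a connection on every worker before calling parBatchByIndex and close it afterwards:

library(parallel)    # assumption: the SNOW cluster is created with parallel::makeCluster
cl <- makeCluster(2)

# open one DBI connection per worker; the job function will then find `dbConnection`
# in each worker's global environment
clusterEvalQ(cl, {
  library(DBI)
  dbConnection <- dbConnect(RSQLite::SQLite(), "weights.db")  # hypothetical database file
  NULL  # do not ship the connection object back to the master
})

# ... call parBatchByIndex(indices, job, reduce, cl, 1000) as in the example above ...

# close the connections and shut the cluster down when finished
clusterEvalQ(cl, DBI::dbDisconnect(dbConnection))
stopCluster(cl)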