multidplyr (version 0.0.0.9000)

partition: Partition data across workers in a cluster

Description

Partitioning ensures that all observations in a group end up on the same worker. To keep the number of rows on each worker balanced, `partition()` uses a greedy algorithm: it iterates over the groups and assigns each one to the worker that currently holds the fewest rows.
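The greedy strategy described above can be sketched in a few lines of base R. This is an illustrative sketch, not multidplyr's internal code: the helper `greedy_assign()` is hypothetical, and it assumes groups are visited in the order given and that ties go to the lowest-numbered worker.

```r
# Hypothetical helper (not part of multidplyr): greedy assignment of groups
# to workers, always choosing the worker that currently holds the fewest rows.
# sizes:     number of rows in each group, in the order groups are visited
# n_workers: number of workers in the cluster
# Returns the worker index (1..n_workers) chosen for each group.
greedy_assign <- function(sizes, n_workers) {
  load <- integer(n_workers)        # rows currently held by each worker
  worker <- integer(length(sizes))  # worker chosen for each group
  for (i in seq_along(sizes)) {
    w <- which.min(load)            # first worker with the smallest load
    worker[i] <- w
    load[w] <- load[w] + sizes[i]
  }
  worker
}

# Group sizes when mtcars is grouped by cyl: 11 (4 cyl), 7 (6 cyl), 14 (8 cyl)
sizes <- as.vector(table(mtcars$cyl))
assignment <- greedy_assign(sizes, n_workers = 2)
assignment                     # 1 2 2
tapply(sizes, assignment, sum) # rows per worker: 11 and 21
```

With these sizes the greedy pass leaves the workers at 11 and 21 rows, whereas splitting {11, 7} and {14} would give 18 and 14. The heuristic is cheap but not guaranteed to be optimal; visiting the largest groups first is a common variant that often balances better (here it yields 14 and 18). Whether multidplyr orders groups before assigning them is not stated on this page.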

Usage

partition(data, cluster)

Arguments

data

Dataset to partition, typically grouped. When grouped, all observations in a group are assigned to the same worker.

cluster

Cluster to use.

Value

A `party_df`.

Examples

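library(dplyr)
library(multidplyr)

# Start a cluster and load dplyr on every worker
cl <- default_cluster()
cluster_library(cl, "dplyr")

# Split mtcars across the workers; each dplyr verb is applied to every worker's partition
mtcars2 <- partition(mtcars, cl)
mtcars2 %>% mutate(cyl2 = 2 * cyl)
mtcars2 %>% filter(vs == 1)
mtcars2 %>% group_by(cyl) %>% summarise(n())
mtcars2 %>% select(-cyl)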
