repartition

Repartition

The following options for repartition are possible:

  1. Return a new SparkDataFrame that has exactly numPartitions partitions.

  2. Return a new SparkDataFrame hash partitioned by the given columns into numPartitions partitions.

  3. Return a new SparkDataFrame hash partitioned by the given column(s), using spark.sql.shuffle.partitions as the number of partitions.
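
The three options can be told apart by checking the partition count of each result. The following is a minimal sketch, not part of the package documentation: it assumes a local Spark installation is available to SparkR, uses the built-in mtcars dataset purely as illustrative data, and the variable names (byCount, byColN, byCol) are hypothetical.

```r
library(SparkR)

# Assumption: Spark is installed locally and reachable from SparkR.
sparkR.session(master = "local[2]")

# mtcars is a built-in R dataset, used here only as example data.
df <- createDataFrame(mtcars)

# Option 1: request an exact number of partitions, no partitioning columns.
byCount <- repartition(df, numPartitions = 4L)
nByCount <- getNumPartitions(byCount)

# Option 2: hash partition by the column cyl into 3 partitions.
byColN <- repartition(df, numPartitions = 3L, col = df$cyl)
nByColN <- getNumPartitions(byColN)

# Option 3: hash partition by cyl and gear; numPartitions is omitted, so the
# count is taken from spark.sql.shuffle.partitions.
byCol <- repartition(df, col = df$cyl, df$gear)
nByCol <- getNumPartitions(byCol)

# Look up the configured shuffle partition count for comparison
# ("200" is only the fallback returned if the key is unset).
shufflePartitions <- sparkR.conf("spark.sql.shuffle.partitions", "200")

sparkR.session.stop()
```

Comparing nByCol with shufflePartitions shows where the count for option 3 comes from. On newer Spark versions with adaptive query execution enabled, the observed count for option 3 may differ from the configured value.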

Usage
repartition(x, ...)

# S4 method for SparkDataFrame
repartition(x, numPartitions = NULL, col = NULL, ...)

Arguments
x

a SparkDataFrame.

...

additional column(s) to be used in the partitioning.

numPartitions

the number of partitions to use.

col

the column by which the partitioning will be performed.

Note

repartition since 1.4.0

See Also

coalesce, repartitionByRange

Other SparkDataFrame functions: SparkDataFrame-class, agg(), alias(), arrange(), as.data.frame(), attach,SparkDataFrame-method, broadcast(), cache(), checkpoint(), coalesce(), collect(), colnames(), coltypes(), createOrReplaceTempView(), crossJoin(), cube(), dapplyCollect(), dapply(), describe(), dim(), distinct(), dropDuplicates(), dropna(), drop(), dtypes(), exceptAll(), except(), explain(), filter(), first(), gapplyCollect(), gapply(), getNumPartitions(), group_by(), head(), hint(), histogram(), insertInto(), intersectAll(), intersect(), isLocal(), isStreaming(), join(), limit(), localCheckpoint(), merge(), mutate(), ncol(), nrow(), persist(), printSchema(), randomSplit(), rbind(), rename(), repartitionByRange(), rollup(), sample(), saveAsTable(), schema(), selectExpr(), select(), showDF(), show(), storageLevel(), str(), subset(), summary(), take(), toJSON(), unionByName(), union(), unpersist(), withColumn(), withWatermark(), with(), write.df(), write.jdbc(), write.json(), write.orc(), write.parquet(), write.stream(), write.text()

Aliases
  • repartition
  • repartition,SparkDataFrame-method
Examples
# NOT RUN {
sparkR.session()
path <- "path/to/file.json"
df <- read.json(path)
newDF <- repartition(df, 2L)
newDF <- repartition(df, numPartitions = 2L)
newDF <- repartition(df, col = df$"col1", df$"col2")
newDF <- repartition(df, 3L, col = df$"col1", df$"col2")
# }
Documentation reproduced from package SparkR, version 2.4.6, License: Apache License (== 2.0)
