repartitionByRange

Repartition by range

repartitionByRange can be called in two ways:

  1. With numPartitions: returns a new SparkDataFrame range partitioned by the given column(s) into numPartitions partitions.

  2. Without numPartitions: returns a new SparkDataFrame range partitioned by the given column(s), using the value of spark.sql.shuffle.partitions as the number of partitions.
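
The two call forms can be compared with a minimal sketch like the one below. The local master, the shuffle setting of 4, and the toy data frame are assumptions made for illustration only; they are not part of the package documentation. The partition counts are stored rather than stated, because the count in the second form depends on the session configuration.

```r
library(SparkR)

# Assumption: a local session with spark.sql.shuffle.partitions set to 4.
sparkR.session(master = "local[2]",
               sparkConfig = list(spark.sql.shuffle.partitions = "4"))

# Hypothetical toy data: 100 rows with distinct integer ids.
df <- createDataFrame(data.frame(id = 1:100, value = rnorm(100)))

# Form 1: explicit number of partitions.
byRange3 <- repartitionByRange(df, 3L, col = df$id)
n3 <- getNumPartitions(byRange3)

# Form 2: numPartitions omitted, so spark.sql.shuffle.partitions applies.
byRangeDefault <- repartitionByRange(df, col = df$id)
nDefault <- getNumPartitions(byRangeDefault)

print(c(explicit = n3, default = nDefault))

sparkR.session.stop()
```

Both forms take at least one partitioning column through col; further columns go in the trailing arguments.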

Usage
repartitionByRange(x, ...)

# S4 method for SparkDataFrame
repartitionByRange(x, numPartitions = NULL, col = NULL, ...)

Arguments
x

a SparkDataFrame.

...

additional column(s) to be used in the range partitioning.

numPartitions

the number of partitions to use.

col

the column by which the range partitioning will be performed.

Note

repartitionByRange since 2.4.0

See Also

repartition, coalesce

Other SparkDataFrame functions: SparkDataFrame-class, agg(), alias(), arrange(), as.data.frame(), attach,SparkDataFrame-method, broadcast(), cache(), checkpoint(), coalesce(), collect(), colnames(), coltypes(), createOrReplaceTempView(), crossJoin(), cube(), dapplyCollect(), dapply(), describe(), dim(), distinct(), dropDuplicates(), dropna(), drop(), dtypes(), exceptAll(), except(), explain(), filter(), first(), gapplyCollect(), gapply(), getNumPartitions(), group_by(), head(), hint(), histogram(), insertInto(), intersectAll(), intersect(), isLocal(), isStreaming(), join(), limit(), localCheckpoint(), merge(), mutate(), ncol(), nrow(), persist(), printSchema(), randomSplit(), rbind(), rename(), repartition(), rollup(), sample(), saveAsTable(), schema(), selectExpr(), select(), showDF(), show(), storageLevel(), str(), subset(), summary(), take(), toJSON(), unionByName(), union(), unpersist(), withColumn(), withWatermark(), with(), write.df(), write.jdbc(), write.json(), write.orc(), write.parquet(), write.stream(), write.text()

Aliases
  • repartitionByRange
  • repartitionByRange,SparkDataFrame-method
Examples
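# NOT RUN {
sparkR.session()
path <- "path/to/file.json"
df <- read.json(path)
# Range partition by col1 and col2; the partition count comes from spark.sql.shuffle.partitions
newDF <- repartitionByRange(df, col = df$col1, df$col2)
# Range partition by col1 and col2 into 3 partitions
newDF <- repartitionByRange(df, 3L, col = df$col1, df$col2)
# }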
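
One way to check the effect of range partitioning is to tag each row with its partition id and summarise the key range per partition. The sketch below assumes a local session and a hypothetical single-column data frame; spark_partition_id(), groupBy(), agg(), and arrange() are existing SparkR functions, while the names ranged, tagged, and bounds are illustrative.

```r
library(SparkR)

# Assumption: a local session is acceptable for this illustration.
sparkR.session(master = "local[2]")

# Hypothetical toy data: ids 1..100.
df <- createDataFrame(data.frame(id = 1:100))
ranged <- repartitionByRange(df, 3L, col = df$id)

# Tag each row with the id of the partition it landed in.
tagged <- withColumn(ranged, "pid", spark_partition_id())

# Smallest id, largest id, and row count per partition, ordered by partition id.
bounds <- collect(arrange(
  agg(groupBy(tagged, "pid"),
      lo = min(tagged$id),
      hi = max(tagged$id),
      n  = count(tagged$id)),
  "pid"))
print(bounds)

sparkR.session.stop()
```

If the partitioning behaves as described, the lo..hi intervals of different partitions should not overlap, although the exact boundaries can vary between runs because Spark estimates them by sampling.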
Documentation reproduced from package SparkR, version 2.4.6, License: Apache License (== 2.0)
