SparkR (version 2.1.2)

sampleBy: Returns a stratified sample without replacement

Description

Returns a stratified sample without replacement, based on the sampling fraction given for each stratum.

Usage

sampleBy(x, col, fractions, seed)

# S4 method for SparkDataFrame,character,list,numeric
sampleBy(x, col, fractions, seed)

Arguments

x

A SparkDataFrame

col

Name of the column that defines the strata

fractions

A named list giving the sampling fraction for each stratum, keyed by stratum value. A stratum that is not listed is treated as having fraction zero, so none of its rows are sampled.

seed

Random seed, used to make the sample reproducible

Value

A new SparkDataFrame that represents the stratified sample

See Also

Other stat functions: approxQuantile, corr, cov, crosstab, freqItems

Examples

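# NOT RUN {
df <- read.json("/path/to/file.json")
# Illustrative fractions; assumes the "key" column holds the values "0" and "1".
# Any stratum not listed here is sampled with fraction 0.
fractions <- list("0" = 0.1, "1" = 0.2)
sampled <- sampleBy(df, "key", fractions, 36)
# }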
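
To make the role of the fractions argument concrete, the following is a minimal base R sketch of the same idea on a local data.frame. It is not SparkR code and not SparkR's implementation. The helper name sample_by_local and the toy data are hypothetical. The sketch assumes each row is kept independently with probability equal to its stratum's fraction, so the number of rows per stratum is only approximately fraction times stratum size.

```r
# Hypothetical helper (not part of SparkR): stratified sampling without
# replacement on a local data.frame, keeping each row independently with
# probability equal to the fraction of its stratum.
sample_by_local <- function(df, col, fractions, seed) {
  set.seed(seed)
  strata <- as.character(df[[col]])
  # Look up the fraction for each row; strata missing from `fractions` get 0
  p <- vapply(strata, function(s) {
    f <- fractions[[s]]
    if (is.null(f)) 0 else as.numeric(f)
  }, numeric(1), USE.NAMES = FALSE)
  # One uniform draw per row; a row is kept when its draw falls below p
  df[runif(nrow(df)) < p, , drop = FALSE]
}

# Toy data: three strata of 100 rows each (assumed for illustration)
local_df <- data.frame(key = rep(c("a", "b", "c"), each = 100), value = 1:300)
fractions <- list(a = 0.1, b = 0.5)   # stratum "c" is not listed

sampled <- sample_by_local(local_df, "key", fractions, 36)
table(sampled$key)
```

Because stratum "c" has no entry in fractions, none of its rows can appear in the result, which mirrors the rule stated for the fractions argument.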
