Returns a stratified sample without replacement, based on the sampling fraction given for each stratum.

sampleBy(x, col, fractions, seed)

# S4 method for SparkDataFrame,character,list,numeric
sampleBy(x, col, fractions, seed)

x: A SparkDataFrame.

col: Name of the column that defines the strata.

fractions: A named list giving the sampling fraction for each stratum. If a stratum is not specified, its fraction is treated as zero.

seed: Random seed.

A new SparkDataFrame that represents the stratified sample
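
The semantics described above can be sketched locally in base R. This is only an illustration, not SparkR's implementation: `sample_by_local` is a hypothetical helper, the toy data is made up, and the sketch assumes each row is kept independently with probability equal to its stratum's fraction, so per-stratum counts are approximate rather than exact.

```r
# Hypothetical helper, NOT part of SparkR: mimics the documented behaviour
# of sampleBy() on an ordinary local data.frame.
sample_by_local <- function(df, col, fractions, seed) {
  set.seed(seed)
  # Look up each row's fraction by stratum name
  strata <- as.character(df[[col]])
  frac <- unname(unlist(fractions)[strata])
  # Strata missing from `fractions` are treated as fraction zero
  frac[is.na(frac)] <- 0
  # Assumption: keep each row independently with probability `frac`;
  # a row is never kept more than once (sampling without replacement)
  keep <- runif(nrow(df)) < frac
  df[keep, , drop = FALSE]
}

# Made-up toy data with three strata of 100 rows each
local_df <- data.frame(key = rep(c("a", "b", "c"), each = 100), value = 1:300)

# Keep all of "a", roughly 10% of "b", and none of "c" (not listed)
out <- sample_by_local(local_df, "key", list(a = 1, b = 0.1), seed = 36)
table(out$key)
```

With these fractions every "a" row is kept and no "c" row appears, while the number of "b" rows varies with the seed.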

Other stat functions: approxQuantile(), corr(), cov(), crosstab(), freqItems()

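# NOT RUN {
df <- read.json("/path/to/file.json")
# Illustrative fractions (not in the original example); assumes the "key"
# column holds the stratum labels "a" and "b"
fractions <- list(a = 0.1, b = 0.2)
sample <- sampleBy(df, "key", fractions, 36)
# }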

