Finds frequent items for columns, possibly with false positives, using the frequent element count algorithm described in http://dx.doi.org/10.1145/762471.762473, proposed by Karp, Schenker, and Papadimitriou.
# S4 method for SparkDataFrame,character
freqItems(x, cols, support = 0.01)
A SparkDataFrame.
A vector of column names to search for frequent items in.
(Optional) The minimum frequency for an item to be considered frequent. Should be greater than 1e-4. Default support = 0.01.
A local R data.frame with the frequent items in each column.
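To illustrate why the result can contain false positives, the following is a minimal base R sketch of a counter-based frequent-items pass in the spirit of the cited algorithm. It is not SparkR's implementation: the function name `freq_items_sketch`, the choice of `ceiling(1 / support)` counters, and the handling of values as character strings are assumptions made for illustration, and missing or empty values are not handled.

```r
# Hypothetical helper, not part of SparkR: single-pass frequent-items sketch.
# Keeps at most ceiling(1 / support) counters; any value that occurs in more
# than a `support` fraction of the input is guaranteed to survive, but
# infrequent values may survive too (false positives).
freq_items_sketch <- function(values, support = 0.01) {
  stopifnot(support > 0, support <= 1)
  max_counters <- ceiling(1 / support)
  counts <- integer(0)  # named integer vector: value -> counter

  for (v in as.character(values)) {
    if (v %in% names(counts)) {
      # Value already tracked: increment its counter.
      counts[v] <- counts[v] + 1L
    } else if (length(counts) < max_counters) {
      # Room for a new counter: start tracking this value.
      counts[v] <- 1L
    } else {
      # No room: decrement every counter and drop those that reach zero.
      counts <- counts - 1L
      counts <- counts[counts > 0L]
    }
  }

  names(counts)
}

# Example input: "a" and "b" dominate, followed by ten values seen once each.
values <- c(rep("a", 60), rep("b", 30), letters[3:12])
result <- freq_items_sketch(values, support = 0.25)
print(result)
```

With `support = 0.25`, both `"a"` and `"b"` are always returned because each makes up more than 25% of the input. The result may also include a value that appeared only once, which is the kind of false positive the description refers to.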
Other stat functions: approxQuantile, corr, cov, crosstab, sampleBy
# NOT RUN {
df <- read.json("/path/to/file.json")
fi <- freqItems(df, c("title", "gender"))
# }