SparkR v2.1.2


R Frontend for Apache Spark

Provides an R Frontend for Apache Spark.
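A minimal sketch of getting started, using only functions documented below (the `local[*]` master and app name are illustrative assumptions, and running it requires a local Spark installation, e.g. via install.spark):

```r
library(SparkR)

# Start (or reuse) a SparkSession
sparkR.session(master = "local[*]", appName = "SparkR-example")

# Convert a local R data.frame into a distributed SparkDataFrame
df <- createDataFrame(faithful)

# Inspect the schema and the first rows
printSchema(df)
head(df)

sparkR.session.stop()
```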

Functions in SparkR

Name Description
AFTSurvivalRegressionModel-class S4 class that represents an AFTSurvivalRegressionModel
ALSModel-class S4 class that represents an ALSModel
GBTClassificationModel-class S4 class that represents a GBTClassificationModel
GBTRegressionModel-class S4 class that represents a GBTRegressionModel
KMeansModel-class S4 class that represents a KMeansModel
KSTest-class S4 class that represents a KSTest
GroupedData-class S4 class that represents a GroupedData
IsotonicRegressionModel-class S4 class that represents an IsotonicRegressionModel
GaussianMixtureModel-class S4 class that represents a GaussianMixtureModel
GeneralizedLinearRegressionModel-class S4 class that represents a generalized linear model
MultilayerPerceptronClassificationModel-class S4 class that represents a MultilayerPerceptronClassificationModel
NaiveBayesModel-class S4 class that represents a NaiveBayesModel
LDAModel-class S4 class that represents an LDAModel
LogisticRegressionModel-class S4 class that represents a LogisticRegressionModel
asin asin
abs abs
acos acos
as.data.frame Download data from a SparkDataFrame into an R data.frame
atan atan
between between
bin bin
cancelJobGroup Cancel active jobs for the specified group
arrange Arrange Rows by Variables
array_contains array_contains
avg avg
ascii ascii
bitwiseNOT bitwiseNOT
bround bround
coltypes coltypes
RandomForestClassificationModel-class S4 class that represents a RandomForestClassificationModel
RandomForestRegressionModel-class S4 class that represents a RandomForestRegressionModel
add_months add_months
alias alias
base64 base64
coalesce Coalesce
collect Collects all the elements of a SparkDataFrame and coerces them into an R data.frame.
cast Casts the column to a different data type.
conv conv
corr corr
crossJoin CrossJoin
column S4 class that represents a SparkDataFrame column
asc A set of operations working with SparkDataFrame columns
colnames Column Names of SparkDataFrame
cume_dist cume_dist
crosstab Computes a pair-wise frequency table of the given columns
dayofyear dayofyear
decode decode
dropDuplicates dropDuplicates
dropTempTable (Deprecated) Drop Temporary Table
factorial factorial
filter Filter
generateAliasesForIntersectedCols Creates a list of columns by replacing the intersected ones with aliases
getNumPartitions getNumPartitions
dapply dapply
dapplyCollect dapplyCollect
date_add date_add
distinct Distinct
drop drop
cache Cache
cacheTable Cache Table
clearCache Clear Cache
clearJobGroup Clear current job group ID and its description
SparkDataFrame-class S4 class that represents a SparkDataFrame
WindowSpec-class S4 class that represents a WindowSpec
approxCountDistinct Returns the approximate number of distinct items in a group
approxQuantile Calculates the approximate quantiles of a numerical column of a SparkDataFrame
atan2 atan2
attach Attach SparkDataFrame to R search path
cbrt cbrt
ceil Computes the ceiling of the given value
count Count
hour hour
hypot hypot
insertInto insertInto
install.spark Download and Install Apache Spark to a Local Directory
floor floor
format_number format_number
glm Generalized Linear Models (R-compliant)
greatest greatest
group_by GroupBy
cos cos
cosh cosh
createExternalTable Create an external table
createOrReplaceTempView Creates a temporary view using the given name.
countDistinct Count Distinct Values
cov cov
covar_pop covar_pop
dense_rank dense_rank
hash hash
instr instr
least least
length length
mean mean
merge Merges two data frames
dropna A set of SparkDataFrame functions working with NA values
nanvl nanvl
concat concat
concat_ws concat_ws
crc32 crc32
intersect Intersect
otherwise otherwise
over over
print.structField Print a Spark StructField.
print.structType Print a Spark StructType.
randomSplit randomSplit
datediff datediff
dayofmonth dayofmonth
except except
exp exp
expm1 expm1
createDataFrame Create a SparkDataFrame
date_format date_format
date_sub date_sub
dropTempView Drops the temporary view with the given view name in the catalog.
dtypes DataTypes
rangeBetween rangeBetween
rename rename
repartition Repartition
second second
select Select
selectExpr SelectExpr
setJobGroup Assigns a group ID to all the jobs started by this thread until the group ID is set to a different value or cleared.
dim Returns the dimensions of SparkDataFrame
encode encode
endsWith endsWith
first Return the first row of a SparkDataFrame
fitted Get fitted result from a k-means model
soundex soundex
spark.addFile Add a file or directory to be downloaded with this Spark job on every node.
spark.gbt Gradient Boosted Tree Model for Regression and Classification
spark.getSparkFiles Get the absolute path of a file added through spark.addFile.
spark.mlp Multilayer Perceptron Classification Model
expr expr
gapply gapply
gapplyCollect gapplyCollect
hashCode Compute the hashCode of an object
head Head
last_day last_day
lead lead
ltrim ltrim
%in% Match a column with given values.
format_string format_string
freqItems Finding frequent items for columns, possibly with false positives
ifelse ifelse
initcap initcap
isnan is.nan
isLocal isLocal
log log
spark.naiveBayes Naive Bayes Models
sparkRSQL.init (Deprecated) Initialize a new SQLContext
spark_partition_id Return the partition ID as a column
stddev_samp stddev_samp
storageLevel StorageLevel
log10 log10
log1p log1p
log2 log2
months_between months_between
min min
minute minute
persist Persist
explain Explain
explode explode
from_unixtime from_unixtime
from_utc_timestamp from_utc_timestamp
hex hex
histogram Compute histogram statistics for given column
join Join
kurtosis kurtosis
levenshtein levenshtein
mutate Mutate
next_day next_day
nrow Returns the number of rows in a SparkDataFrame
limit Limit
lower lower
lpad lpad
monotonically_increasing_id monotonically_increasing_id
pivot Pivot a column of the GroupedData and perform the specified aggregation.
rand rand
randn randn
printSchema Print Schema of a SparkDataFrame
quarter quarter
rank rank
rbind Union two or more SparkDataFrames
reverse reverse
read.orc Create a SparkDataFrame from an ORC file.
read.parquet Create a SparkDataFrame from a Parquet file.
round round
row_number row_number
rowsBetween rowsBetween
rint rint
sample Sample
show show
showDF showDF
skewness skewness
sort_array sort_array
spark.isoreg Isotonic Regression Model
spark.kmeans K-Means Clustering Model
spark.randomForest Random Forest Model for Regression and Classification
rtrim rtrim
rpad rpad
sha2 sha2
shiftLeft shiftLeft
sinh sinh
lag lag
last last
lit lit
locate locate
spark.survreg Accelerated Failure Time (AFT) Survival Regression Model
sparkR.session.stop Stop the Spark Session and Spark Context
sparkR.uiWebUrl Get the URL of the SparkUI instance for the current active SparkSession
str Compactly display the structure of a dataset
struct struct
size size
spark.lda Latent Dirichlet Allocation
spark.logit Logistic Regression Model
sparkR.callJMethod Call Java Methods
sparkR.callJStatic Call Static Java Methods
startsWith startsWith
stddev_pop stddev_pop
substring_index substring_index
sum sum
tableToDF Create a SparkDataFrame from a SparkSQL Table
tables Tables
toRadians toRadians
describe summary
tableNames Table Names
to_utc_timestamp to_utc_timestamp
translate translate
month month
ntile ntile
orderBy Ordering Columns in a WindowSpec
pmod pmod
posexplode posexplode
read.df Load a SparkDataFrame
take Take the first NUM rows of a SparkDataFrame and return the results as an R data.frame
tan tan
trim trim
unbase64 unbase64
var var
read.jdbc Create a SparkDataFrame representing the database table accessible via JDBC URL
regexp_replace regexp_replace
registerTempTable (Deprecated) Register Temporary Table
sampleBy Returns a stratified sample without replacement
max max
md5 md5
ncol Returns the number of columns in a SparkDataFrame
negate negate
partitionBy partitionBy
saveAsTable Save the contents of the SparkDataFrame to a data source as a table
shiftRight shiftRight
shiftRightUnsigned shiftRightUnsigned
var_pop var_pop
write.orc Save the contents of SparkDataFrame as an ORC file, preserving the schema.
write.parquet Save the contents of SparkDataFrame as a Parquet file, preserving the schema.
percent_rank percent_rank
predict Makes predictions from a MLlib model
print.jobj Print a JVM object reference.
spark.als Alternating Least Squares (ALS) for Collaborative Filtering
spark.gaussianMixture Multivariate Gaussian Mixture Model (GMM)
spark.getSparkFilesRootDirectory Get the root directory that contains files added through spark.addFile.
spark.glm Generalized Linear Models
sparkR.conf Get Runtime Config from the current active SparkSession
sparkR.init (Deprecated) Initialize a new Spark Context
sparkR.version Get version of Spark on which this application is running
sparkRHive.init (Deprecated) Initialize a new HiveContext
subset Subset
substr substr
sumDistinct sumDistinct
var_samp var_samp
weekofyear weekofyear
write.df Save the contents of SparkDataFrame to a data source.
write.jdbc Save the content of SparkDataFrame to an external database table via JDBC.
agg summarize
uncacheTable Uncache Table
unhex unhex
windowOrderBy windowOrderBy
read.json Create a SparkDataFrame from a JSON file.
read.ml Load a fitted MLlib model from the input path.
read.text Create a SparkDataFrame from a text file.
regexp_extract regexp_extract
schema Get schema object
windowPartitionBy windowPartitionBy
write.text Save the content of SparkDataFrame in a text file at the specified path.
year year
sd sd
setLogLevel Set new log level
sha1 sha1
signum signum
spark.kstest (One-Sample) Kolmogorov-Smirnov Test
spark.lapply Run a function over a list of elements, distributing the computations with Spark
sparkR.newJObject Create Java Objects
sparkR.session Get the existing SparkSession or initialize a new SparkSession.
sql SQL Query
sin sin
sqrt sqrt
structField structField
structType structType
tanh tanh
toDegrees toDegrees
to_date to_date
when when
window window
with Evaluate a R expression in an environment constructed from a SparkDataFrame
withColumn WithColumn
union Return a new SparkDataFrame containing the union of rows
unix_timestamp unix_timestamp
unpersist Unpersist
upper upper
write.json Save the contents of SparkDataFrame as a JSON file
write.ml Saves the MLlib model to the input path
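Typical workflows chain the transformation and aggregation functions listed above. A minimal sketch, assuming an already-running local SparkSession (the `local[*]` master and the `mtcars` dataset are illustrative choices, not part of the package documentation):

```r
library(SparkR)
sparkR.session(master = "local[*]")

# Distribute a local data.frame
df <- createDataFrame(mtcars)

# filter, groupBy, summarize and avg as documented above
heavy <- filter(df, df$wt > 3)
byCyl <- summarize(groupBy(heavy, heavy$cyl), avg_mpg = avg(heavy$mpg))

# collect brings the result back as a local R data.frame
collect(byCyl)

sparkR.session.stop()
```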

Type Package
License Apache License (== 2.0)
Collate 'schema.R' 'generics.R' 'jobj.R' 'column.R' 'group.R' 'RDD.R' 'pairRDD.R' 'DataFrame.R' 'SQLContext.R' 'WindowSpec.R' 'backend.R' 'broadcast.R' 'client.R' 'context.R' 'deserialize.R' 'functions.R' 'install.R' 'jvm.R' 'mllib.R' 'serialize.R' 'sparkR.R' 'stats.R' 'types.R' 'utils.R' 'window.R'
RoxygenNote 6.0.1
VignetteBuilder knitr
NeedsCompilation no
Packaged 2017-10-03 00:42:30 UTC; holden
Repository CRAN
Date/Publication 2017-10-12 14:13:03 UTC
