SparkR v2.4.6


R Front End for 'Apache Spark'

Provides an R front end for 'Apache Spark'.

Functions in SparkR

Name Description
ALSModel-class S4 class that represents an ALSModel
DecisionTreeClassificationModel-class S4 class that represents a DecisionTreeClassificationModel
AFTSurvivalRegressionModel-class S4 class that represents an AFTSurvivalRegressionModel
GaussianMixtureModel-class S4 class that represents a GaussianMixtureModel
BisectingKMeansModel-class S4 class that represents a BisectingKMeansModel
GBTClassificationModel-class S4 class that represents a GBTClassificationModel
DecisionTreeRegressionModel-class S4 class that represents a DecisionTreeRegressionModel
alias alias
IsotonicRegressionModel-class S4 class that represents an IsotonicRegressionModel
GroupedData-class S4 class that represents a GroupedData
LogisticRegressionModel-class S4 class that represents a LogisticRegressionModel
NaiveBayesModel-class S4 class that represents a NaiveBayesModel
LinearSVCModel-class S4 class that represents a LinearSVCModel
broadcast broadcast
GeneralizedLinearRegressionModel-class S4 class that represents a generalized linear model
as.data.frame Download data from a SparkDataFrame into an R data.frame
LDAModel-class S4 class that represents an LDAModel
arrange Arrange Rows by Variables
MultilayerPerceptronClassificationModel-class S4 class that represents a MultilayerPerceptronClassificationModel
KMeansModel-class S4 class that represents a KMeansModel
cache Cache
KSTest-class S4 class that represents a KSTest
cast Casts the column to a different data type.
checkpoint checkpoint
approxQuantile Calculates the approximate quantiles of numerical columns of a SparkDataFrame
avg avg
awaitTermination awaitTermination
attach,SparkDataFrame-method Attach SparkDataFrame to R search path
RandomForestRegressionModel-class S4 class that represents a RandomForestRegressionModel
RandomForestClassificationModel-class S4 class that represents a RandomForestClassificationModel
clearCache Clear Cache
StreamingQuery-class S4 class that represents a StreamingQuery
SparkDataFrame-class S4 class that represents a SparkDataFrame
column_datetime_diff_functions Date time arithmetic functions for Column operations
cancelJobGroup Cancel active jobs for the specified group
collect Collects all the elements of a SparkDataFrame and coerces them into an R data.frame.
cacheTable Cache Table
coalesce Coalesce
clearJobGroup Clear current job group ID and its description
column_aggregate_functions Aggregate functions for Column operations
between between
coltypes coltypes
WindowSpec-class S4 class that represents a WindowSpec
crosstab Computes a pair-wise frequency table of the given columns
createDataFrame Create a SparkDataFrame
column_datetime_functions Date time functions for Column operations
createExternalTable (Deprecated) Create an external table
column_collection_functions Collection functions for Column operations
crossJoin CrossJoin
column_math_functions Math functions for Column operations
column S4 class that represents a SparkDataFrame column
colnames Column Names of SparkDataFrame
column_nonaggregate_functions Non-aggregate functions for Column operations
column_string_functions String functions for Column operations
corr corr
exceptAll exceptAll
freqItems Finding frequent items for columns, possibly with false positives
gapply gapply
except except
column_window_functions Window functions for Column operations
describe describe
dim Returns the dimensions of SparkDataFrame
asc A set of operations working with SparkDataFrame columns
group_by GroupBy
dapply dapply
dapplyCollect dapplyCollect
dropTempView Drops the temporary view with the given view name in the catalog.
dtypes DataTypes
cube cube
createOrReplaceTempView Creates a temporary view using the given name.
createTable Creates a table based on the dataset in a data source
currentDatabase Returns the current default database
column_misc_functions Miscellaneous functions for Column operations
hashCode Compute the hashCode of an object
isLocal isLocal
isStreaming isStreaming
cov cov
count Count
nrow Returns the number of rows in a SparkDataFrame
not !
%<=>% %<=>%
endsWith endsWith
gapplyCollect gapplyCollect
dropTempTable (Deprecated) Drop Temporary Table
dropDuplicates dropDuplicates
getLocalProperty Get a local property set in this thread, or NULL if it is missing. See setLocalProperty.
getNumPartitions getNumPartitions
glm,formula,ANY,SparkDataFrame-method Generalized Linear Models (R-compliant)
insertInto insertInto
histogram Compute histogram statistics for given column
distinct Distinct
drop drop
head Head
hint hint
explain Explain
persist Persist
filter Filter
listColumns Returns a list of columns for the given table/view in the specified database
fitted Get fitted result from a k-means model
first Return the first row of a SparkDataFrame
merge Merges two data frames
mutate Mutate
read.orc Create a SparkDataFrame from an ORC file.
pivot Pivot a column of the GroupedData and perform the specified aggregation.
read.ml Load a fitted MLlib model from the input path.
join Join
listDatabases Returns a list of databases available
listFunctions Returns a list of functions registered in the specified database
install.spark Download and Install Apache Spark to a Local Directory
intersect Intersect
listTables Returns a list of tables or views in the specified database
repartition Repartition
repartitionByRange Repartition by range
isActive isActive
intersectAll intersectAll
read.text Create a SparkDataFrame from a text file.
recoverPartitions Recovers all the partitions in the directory of a table and updates the catalog
lastProgress lastProgress
last last
limit Limit
setCheckpointDir Set checkpoint directory
rbind Union two or more SparkDataFrames
over over
read.df Load a SparkDataFrame
partitionBy partitionBy
refreshByPath Invalidates and refreshes all the cached data and metadata for SparkDataFrame containing path
setCurrentDatabase Sets the current default database
localCheckpoint localCheckpoint
predict Makes predictions from a MLlib model
print.jobj Print a JVM object reference.
%in% Match a column with given values.
read.jdbc Create a SparkDataFrame representing the database table accessible via JDBC URL
read.json Create a SparkDataFrame from a JSON file.
spark.als Alternating Least Squares (ALS) for Collaborative Filtering
spark.addFile Add a file or directory to be downloaded with this Spark job on every node.
dropna A set of SparkDataFrame functions working with NA values
rollup rollup
ncol Returns the number of columns in a SparkDataFrame
rowsBetween rowsBetween
spark.bisectingKmeans Bisecting K-Means Clustering Model
refreshTable Invalidates and refreshes all the cached data and metadata of the given table
print.structType Print a Spark StructType.
randomSplit randomSplit
print.structField Print a Spark StructField.
orderBy Ordering Columns in a WindowSpec
sample Sample
rangeBetween rangeBetween
spark.kstest (One-Sample) Kolmogorov-Smirnov Test
setJobGroup Assigns a group ID to all the jobs started by this thread until the group ID is set to a different value or cleared.
setJobDescription Set a human readable description of the current job.
spark.lapply Run a function over a list of elements, distributing the computations with Spark
show show
sampleBy Returns a stratified sample without replacement
otherwise otherwise
saveAsTable Save the contents of the SparkDataFrame to a data source as a table
spark.gbt Gradient Boosted Tree Model for Regression and Classification
sparkR.init (Deprecated) Initialize a new Spark Context
spark.getSparkFiles Get the absolute path of a file added through spark.addFile.
spark.decisionTree Decision Tree Model for Regression and Classification
spark.svmLinear Linear SVM Model
sparkR.callJMethod Call Java Methods
sparkR.newJObject Create Java Objects
printSchema Print Schema of a SparkDataFrame
schema Get schema object
showDF showDF
spark.lda Latent Dirichlet Allocation
spark.logit Logistic Regression Model
sparkRHive.init (Deprecated) Initialize a new HiveContext
sparkRSQL.init (Deprecated) Initialize a new SQLContext
setLocalProperty Set a local property that affects jobs submitted from this thread, such as the Spark fair scheduler pool.
queryName queryName
read.parquet Create a SparkDataFrame from a Parquet file.
read.stream Load a streaming SparkDataFrame
setLogLevel Set new log level
status status
registerTempTable (Deprecated) Register Temporary Table
spark.naiveBayes Naive Bayes Models
spark.mlp Multilayer Perceptron Classification Model
stopQuery stopQuery
substr substr
subset Subset
sql SQL Query
structField structField
startsWith startsWith
structType structType
sparkR.version Get version of Spark on which this application is running
spark.kmeans K-Means Clustering Model
sparkR.uiWebUrl Get the URL of the SparkUI instance for the current active SparkSession
spark.isoreg Isotonic Regression Model
spark.randomForest Random Forest Model for Regression and Classification
write.stream Write the streaming SparkDataFrame to a data source.
write.text Save the content of SparkDataFrame in a text file at the specified path.
tables Tables
rename rename
spark.survreg Accelerated Failure Time (AFT) Survival Regression Model
union Return a new SparkDataFrame containing the union of rows
storageLevel StorageLevel
unionByName Return a new SparkDataFrame containing the union of rows, matched by column names
str Compactly display the structure of a dataset
take Take the first NUM rows of a SparkDataFrame and return the results as an R data.frame
write.orc Save the contents of SparkDataFrame as an ORC file, preserving the schema.
select Select
selectExpr SelectExpr
write.parquet Save the contents of SparkDataFrame as a Parquet file, preserving the schema.
windowPartitionBy windowPartitionBy
spark.fpGrowth FP-growth
with Evaluate an R expression in an environment constructed from a SparkDataFrame
spark.gaussianMixture Multivariate Gaussian Mixture Model (GMM)
write.df Save the contents of SparkDataFrame to a data source.
write.jdbc Save the content of SparkDataFrame to an external database table via JDBC.
spark.getSparkFilesRootDirectory Get the root directory that contains files added through spark.addFile.
spark.glm Generalized Linear Models
withWatermark withWatermark
withColumn WithColumn
sparkR.callJStatic Call Static Java Methods
tableNames Table Names
sparkR.conf Get Runtime Config from the current active SparkSession
summary summary
agg summarize
sparkR.session Get the existing SparkSession or initialize a new SparkSession.
uncacheTable Uncache Table
sparkR.session.stop Stop the Spark Session and Spark Context
tableToDF Create a SparkDataFrame from a SparkSQL table or view
unpersist Unpersist
windowOrderBy windowOrderBy
write.json Save the contents of SparkDataFrame as a JSON file
write.ml Saves the MLlib model to the input path
GBTRegressionModel-class S4 class that represents a GBTRegressionModel
FPGrowthModel-class S4 class that represents an FPGrowthModel

Type Package
License Apache License (== 2.0)
SystemRequirements Java (== 8)
Collate 'schema.R' 'generics.R' 'jobj.R' 'column.R' 'group.R' 'RDD.R' 'pairRDD.R' 'DataFrame.R' 'SQLContext.R' 'WindowSpec.R' 'backend.R' 'broadcast.R' 'catalog.R' 'client.R' 'context.R' 'deserialize.R' 'functions.R' 'install.R' 'jvm.R' 'mllib_classification.R' 'mllib_clustering.R' 'mllib_fpm.R' 'mllib_recommendation.R' 'mllib_regression.R' 'mllib_stat.R' 'mllib_tree.R' 'mllib_utils.R' 'serialize.R' 'sparkR.R' 'stats.R' 'streaming.R' 'types.R' 'utils.R' 'window.R'
RoxygenNote 7.1.0
VignetteBuilder knitr
NeedsCompilation no
Encoding UTF-8
Packaged 2020-05-30 00:18:43 UTC; spark-rm
Repository CRAN
Date/Publication 2020-06-06 04:20:06 UTC
