
SparkR (version 2.4.5)

R Front End for 'Apache Spark'

Description

Provides an R front end for 'Apache Spark'.

Install

install.packages('SparkR')
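Once installed, a session can be started and stopped with the functions listed below; a minimal sketch (the local master URL and app name are illustrative, not required):

```r
# Load SparkR and start (or reuse) a SparkSession on the local machine.
library(SparkR)
sparkR.session(master = "local[*]", appName = "SparkRExample")

# Convert a local R data.frame into a distributed SparkDataFrame and back.
df <- createDataFrame(faithful)
head(collect(df))

# Shut down the session when finished.
sparkR.session.stop()
```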

Monthly Downloads

104

Version

2.4.5

License

Apache License (== 2.0)

Maintainer

Shivaram Venkataraman

Last Published

February 7th, 2020

Functions in SparkR (2.4.5)

DecisionTreeClassificationModel-class

S4 class that represents a DecisionTreeClassificationModel
GBTRegressionModel-class

S4 class that represents a GBTRegressionModel
FPGrowthModel-class

S4 class that represents an FPGrowthModel
BisectingKMeansModel-class

S4 class that represents a BisectingKMeansModel
ALSModel-class

S4 class that represents an ALSModel
GBTClassificationModel-class

S4 class that represents a GBTClassificationModel
GaussianMixtureModel-class

S4 class that represents a GaussianMixtureModel
AFTSurvivalRegressionModel-class

S4 class that represents an AFTSurvivalRegressionModel
DecisionTreeRegressionModel-class

S4 class that represents a DecisionTreeRegressionModel
KMeansModel-class

S4 class that represents a KMeansModel
KSTest-class

S4 class that represents a KSTest
RandomForestRegressionModel-class

S4 class that represents a RandomForestRegressionModel
RandomForestClassificationModel-class

S4 class that represents a RandomForestClassificationModel
NaiveBayesModel-class

S4 class that represents a NaiveBayesModel
clearCache

Clear Cache
clearJobGroup

Clear current job group ID and its description
WindowSpec-class

S4 class that represents a WindowSpec
StreamingQuery-class

S4 class that represents a StreamingQuery
LDAModel-class

S4 class that represents an LDAModel
GeneralizedLinearRegressionModel-class

S4 class that represents a generalized linear model
LinearSVCModel-class

S4 class that represents a LinearSVCModel
LogisticRegressionModel-class

S4 class that represents a LogisticRegressionModel
arrange

Arrange Rows by Variables
as.data.frame

Download data from a SparkDataFrame into an R data.frame
MultilayerPerceptronClassificationModel-class

S4 class that represents a MultilayerPerceptronClassificationModel
GroupedData-class

S4 class that represents a GroupedData
awaitTermination

awaitTermination
between

between
coltypes

coltypes
collect

Collects all the elements of a SparkDataFrame and coerces them into an R data.frame.
column_nonaggregate_functions

Non-aggregate functions for Column operations
coalesce

Coalesce
column_datetime_diff_functions

Date time arithmetic functions for Column operations
cast

Casts the column to a different data type.
SparkDataFrame-class

S4 class that represents a SparkDataFrame
column_datetime_functions

Date time functions for Column operations
checkpoint

checkpoint
column

S4 class that represents a SparkDataFrame column
column_string_functions

String functions for Column operations
colnames

Column Names of SparkDataFrame
column_window_functions

Window functions for Column operations
corr

corr
asc

A set of operations working with SparkDataFrame columns
glm,formula,ANY,SparkDataFrame-method

Generalized Linear Models (R-compliant)
crosstab

Computes a pair-wise frequency table of the given columns
drop

drop
crossJoin

CrossJoin
distinct

Distinct
getNumPartitions

getNumPartitions
avg

avg
IsotonicRegressionModel-class

S4 class that represents an IsotonicRegressionModel
attach,SparkDataFrame-method

Attach SparkDataFrame to R search path
dapply

dapply
intersectAll

intersectAll
isActive

isActive
mutate

Mutate
join

Join
print.structType

Print a Spark StructType.
merge

Merges two data frames
last

last
print.structField

Print a Spark StructField.
cacheTable

Cache Table
currentDatabase

Returns the current default database
cube

cube
randomSplit

randomSplit
alias

alias
explain

Explain
dropDuplicates

dropDuplicates
install.spark

Download and Install Apache Spark to a Local Directory
dropTempTable

(Deprecated) Drop Temporary Table
dapplyCollect

dapplyCollect
intersect

Intersect
filter

Filter
cancelJobGroup

Cancel active jobs for the specified group
lastProgress

lastProgress
column_aggregate_functions

Aggregate functions for Column operations
rangeBetween

rangeBetween
broadcast

broadcast
approxQuantile

Calculates the approximate quantiles of numerical columns of a SparkDataFrame
dtypes

DataTypes
dropTempView

Drops the temporary view with the given view name in the catalog.
gapplyCollect

gapplyCollect
group_by

GroupBy
getLocalProperty

Get a local property set in this thread, or NULL if it is missing. See setLocalProperty.
cache

Cache
limit

Limit
not

!
nrow

Returns the number of rows in a SparkDataFrame
repartition

Repartition
column_math_functions

Math functions for Column operations
count

Count
column_misc_functions

Miscellaneous functions for Column operations
createOrReplaceTempView

Creates a temporary view using the given name.
cov

cov
column_collection_functions

Collection functions for Column operations
partitionBy

partitionBy
over

over
createDataFrame

Create a SparkDataFrame
setCheckpointDir

Set checkpoint directory
repartitionByRange

Repartition by range
createExternalTable

(Deprecated) Create an external table
dim

Returns the dimensions of a SparkDataFrame
describe

describe
endsWith

endsWith
createTable

Creates a table based on the dataset in a data source
first

Return the first row of a SparkDataFrame
%<=>%

%<=>%
rollup

rollup
read.orc

Create a SparkDataFrame from an ORC file.
read.ml

Load a fitted MLlib model from the input path.
except

except
fitted

Get fitted result from a k-means model
saveAsTable

Save the contents of the SparkDataFrame to a data source as a table
schema

Get schema object
rowsBetween

rowsBetween
histogram

Compute histogram statistics for given column
insertInto

insertInto
exceptAll

exceptAll
freqItems

Finding frequent items for columns, possibly with false positives
gapply

gapply
setLocalProperty

Set a local property that affects jobs submitted from this thread, such as the Spark fair scheduler pool.
setCurrentDatabase

Sets the current default database
head

Head
listColumns

Returns a list of columns for the given table/view in the specified database
hint

hint
listDatabases

Returns a list of databases available
hashCode

Compute the hashCode of an object
setLogLevel

Set new log level
spark.kstest

(One-Sample) Kolmogorov-Smirnov Test
listFunctions

Returns a list of functions registered in the specified database
read.jdbc

Create a SparkDataFrame representing the database table accessible via JDBC URL
%in%

Match a column with given values.
refreshByPath

Invalidates and refreshes all the cached data and metadata for any SparkDataFrame that contains the given path
localCheckpoint

localCheckpoint
print.jobj

Print a JVM object reference.
read.json

Create a SparkDataFrame from a JSON file.
predict

Makes predictions from an MLlib model
listTables

Returns a list of tables or views in the specified database
dropna

A set of SparkDataFrame functions working with NA values
refreshTable

Invalidates and refreshes all the cached data and metadata of the given table
setJobGroup

Assigns a group ID to all the jobs started by this thread until the group ID is set to a different value or cleared.
showDF

showDF
spark.isoreg

Isotonic Regression Model
setJobDescription

Set a human readable description of the current job.
show

show
isLocal

isLocal
ncol

Returns the number of columns in a SparkDataFrame
spark.kmeans

K-Means Clustering Model
persist

Persist
orderBy

Ordering Columns in a WindowSpec
isStreaming

isStreaming
rbind

Union two or more SparkDataFrames
pivot

Pivot a column of the GroupedData and perform the specified aggregation.
read.df

Load a SparkDataFrame
otherwise

otherwise
spark.lapply

Run a function over a list of elements, distributing the computations with Spark
spark.svmLinear

Linear SVM Model
queryName

queryName
printSchema

Print Schema of a SparkDataFrame
registerTempTable

(Deprecated) Register Temporary Table
read.parquet

Create a SparkDataFrame from a Parquet file.
read.stream

Load a streaming SparkDataFrame
sparkR.init

(Deprecated) Initialize a new Spark Context
sparkR.callJMethod

Call Java Methods
sparkR.uiWebUrl

Get the URL of the SparkUI instance for the current active SparkSession
select

Select
rename

rename
read.text

Create a SparkDataFrame from a text file.
selectExpr

SelectExpr
sampleBy

Returns a stratified sample without replacement
spark.bisectingKmeans

Bisecting K-Means Clustering Model
recoverPartitions

Recovers all the partitions in the directory of a table and updates the catalog
sample

Sample
sparkR.version

Get version of Spark on which this application is running
spark.decisionTree

Decision Tree Model for Regression and Classification
spark.addFile

Add a file or directory to be downloaded with this Spark job on every node.
spark.lda

Latent Dirichlet Allocation
spark.als

Alternating Least Squares (ALS) for Collaborative Filtering
spark.logit

Logistic Regression Model
substr

substr
with

Evaluate an R expression in an environment constructed from a SparkDataFrame
subset

Subset
windowPartitionBy

windowPartitionBy
sparkR.newJObject

Create Java Objects
withColumn

WithColumn
sparkRHive.init

(Deprecated) Initialize a new HiveContext
spark.glm

Generalized Linear Models
spark.getSparkFilesRootDirectory

Get the root directory that contains files added through spark.addFile.
spark.mlp

Multilayer Perceptron Classification Model
spark.naiveBayes

Naive Bayes Models
spark.randomForest

Random Forest Model for Regression and Classification
spark.survreg

Accelerated Failure Time (AFT) Survival Regression Model
stopQuery

stopQuery
status

status
sparkR.session

Get the existing SparkSession or initialize a new SparkSession.
sparkRSQL.init

(Deprecated) Initialize a new SQLContext
spark.fpGrowth

FP-growth
sparkR.session.stop

Stop the Spark Session and Spark Context
tableNames

Table Names
storageLevel

StorageLevel
str

Compactly display the structure of a dataset
unionByName

Return a new SparkDataFrame containing the union of rows, matched by column names
write.ml

Saves the MLlib model to the input path
write.json

Save the contents of SparkDataFrame as a JSON file
union

Return a new SparkDataFrame containing the union of rows
withWatermark

withWatermark
structField

structField
write.orc

Save the contents of SparkDataFrame as an ORC file, preserving the schema.
structType

structType
take

Take the first NUM rows of a SparkDataFrame and return the results as an R data.frame
tables

Tables
write.parquet

Save the contents of SparkDataFrame as a Parquet file, preserving the schema.
tableToDF

Create a SparkDataFrame from a SparkSQL table or view
spark.gaussianMixture

Multivariate Gaussian Mixture Model (GMM)
windowOrderBy

windowOrderBy
unpersist

Unpersist
spark.gbt

Gradient Boosted Tree Model for Regression and Classification
write.jdbc

Save the content of SparkDataFrame to an external database table via JDBC.
write.df

Save the contents of SparkDataFrame to a data source.
sparkR.callJStatic

Call Static Java Methods
spark.getSparkFiles

Get the absolute path of a file added through spark.addFile.
sql

SQL Query
sparkR.conf

Get Runtime Config from the current active SparkSession
startsWith

startsWith
agg

summarize
summary

summary
toJSON

toJSON
uncacheTable

Uncache Table
write.text

Save the content of SparkDataFrame in a text file at the specified path.
write.stream

Write the streaming SparkDataFrame to a data source.
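As a quick orientation to how the pieces in this index fit together, a hedged end-to-end sketch combining session management, DataFrame creation, and grouped aggregation (the use of the built-in mtcars dataset is illustrative):

```r
library(SparkR)
sparkR.session()

# Build a SparkDataFrame from a local R data.frame.
df <- createDataFrame(mtcars)

# Filter rows, group by a column, aggregate, and collect the
# result back into a local R data.frame.
result <- collect(agg(group_by(filter(df, df$mpg > 20), "cyl"),
                      avg(df$hp)))
print(result)

sparkR.session.stop()
```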