SparkR (version 2.1.2)
R Frontend for Apache Spark
Description
Provides an R Frontend for Apache Spark.
Install: install.packages('SparkR')
Monthly Downloads: 155
Version: 2.1.2
License: Apache License (== 2.0)
Maintainer: Shivaram Venkataraman
Last Published: October 12th, 2017
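As a quick-start sketch of the package (hedged: this assumes Java and a local Spark distribution are available; the app name is illustrative):

```r
# Minimal SparkR session sketch -- requires Java and a Spark distribution.
library(SparkR)

# install.spark() can download Spark to a local directory if none is installed.

# Start (or reuse) a SparkSession.
sparkR.session(appName = "quickstart")

# Create a SparkDataFrame from a built-in R data.frame and inspect it.
df <- createDataFrame(faithful)
head(df)
printSchema(df)

# Stop the session when done.
sparkR.session.stop()
```

All functions used here (sparkR.session, createDataFrame, head, printSchema, sparkR.session.stop, install.spark) appear in the index below.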
Functions in SparkR (2.1.2)
AFTSurvivalRegressionModel-class
S4 class that represents an AFTSurvivalRegressionModel
ALSModel-class
S4 class that represents an ALSModel
GBTClassificationModel-class
S4 class that represents a GBTClassificationModel
GBTRegressionModel-class
S4 class that represents a GBTRegressionModel
KMeansModel-class
S4 class that represents a KMeansModel
KSTest-class
S4 class that represents a KSTest
GroupedData-class
S4 class that represents a GroupedData
IsotonicRegressionModel-class
S4 class that represents an IsotonicRegressionModel
GaussianMixtureModel-class
S4 class that represents a GaussianMixtureModel
GeneralizedLinearRegressionModel-class
S4 class that represents a generalized linear model
MultilayerPerceptronClassificationModel-class
S4 class that represents a MultilayerPerceptronClassificationModel
NaiveBayesModel-class
S4 class that represents a NaiveBayesModel
LDAModel-class
S4 class that represents an LDAModel
LogisticRegressionModel-class
S4 class that represents a LogisticRegressionModel
asin
asin
abs
abs
acos
acos
as.data.frame
Download data from a SparkDataFrame into an R data.frame
atan
atan
between
between
bin
bin
cancelJobGroup
Cancel active jobs for the specified group
arrange
Arrange Rows by Variables
array_contains
array_contains
avg
avg
ascii
ascii
bitwiseNOT
bitwiseNOT
bround
bround
coltypes
coltypes
RandomForestClassificationModel-class
S4 class that represents a RandomForestClassificationModel
RandomForestRegressionModel-class
S4 class that represents a RandomForestRegressionModel
add_months
add_months
alias
alias
base64
base64
coalesce
Coalesce
collect
Collects all the elements of a SparkDataFrame and coerces them into an R data.frame.
cast
Casts the column to a different data type.
conv
conv
corr
corr
crossJoin
CrossJoin
column
S4 class that represents a SparkDataFrame column
asc
A set of operations working with SparkDataFrame columns
colnames
Column Names of SparkDataFrame
cume_dist
cume_dist
crosstab
Computes a pair-wise frequency table of the given columns
dayofyear
dayofyear
decode
decode
dropDuplicates
dropDuplicates
dropTempTable
(Deprecated) Drop Temporary Table
factorial
factorial
filter
Filter
generateAliasesForIntersectedCols
Creates a list of columns by replacing the intersected ones with aliases
getNumPartitions
getNumPartitions
dapply
dapply
dapplyCollect
dapplyCollect
date_add
date_add
distinct
Distinct
drop
drop
cache
Cache
cacheTable
Cache Table
clearCache
Clear Cache
clearJobGroup
Clear current job group ID and its description
SparkDataFrame-class
S4 class that represents a SparkDataFrame
WindowSpec-class
S4 class that represents a WindowSpec
approxCountDistinct
Returns the approximate number of distinct items in a group
approxQuantile
Calculates the approximate quantiles of a numerical column of a SparkDataFrame
atan2
atan2
attach
Attach SparkDataFrame to R search path
cbrt
cbrt
ceil
Computes the ceiling of the given value
count
Count
hour
hour
hypot
hypot
insertInto
insertInto
install.spark
Download and Install Apache Spark to a Local Directory
floor
floor
format_number
format_number
glm
Generalized Linear Models (R-compliant)
greatest
greatest
group_by
GroupBy
cos
cos
cosh
cosh
createExternalTable
Create an external table
createOrReplaceTempView
Creates a temporary view using the given name.
countDistinct
Count Distinct Values
cov
cov
covar_pop
covar_pop
dense_rank
dense_rank
hash
hash
instr
instr
least
least
length
length
mean
mean
merge
Merges two data frames
dropna
A set of SparkDataFrame functions working with NA values
nanvl
nanvl
concat
concat
concat_ws
concat_ws
crc32
crc32
intersect
Intersect
otherwise
otherwise
over
over
print.structField
Print a Spark StructField.
print.structType
Print a Spark StructType.
randomSplit
randomSplit
datediff
datediff
dayofmonth
dayofmonth
except
except
exp
exp
expm1
expm1
createDataFrame
Create a SparkDataFrame
date_format
date_format
date_sub
date_sub
dropTempView
Drops the temporary view with the given view name in the catalog.
dtypes
DataTypes
rangeBetween
rangeBetween
rename
rename
repartition
Repartition
second
second
select
Select
selectExpr
SelectExpr
setJobGroup
Assigns a group ID to all the jobs started by this thread until the group ID is set to a different value or cleared.
dim
Returns the dimensions of SparkDataFrame
encode
encode
endsWith
endsWith
first
Return the first row of a SparkDataFrame
fitted
Get fitted result from a k-means model
soundex
soundex
spark.addFile
Add a file or directory to be downloaded with this Spark job on every node.
spark.gbt
Gradient Boosted Tree Model for Regression and Classification
spark.getSparkFiles
Get the absolute path of a file added through spark.addFile.
spark.mlp
Multilayer Perceptron Classification Model
expr
expr
gapply
gapply
gapplyCollect
gapplyCollect
hashCode
Compute the hashCode of an object
head
Head
last_day
last_day
lead
lead
ltrim
ltrim
%in%
Match a column with given values.
format_string
format_string
freqItems
Finding frequent items for columns, possibly with false positives
ifelse
ifelse
initcap
initcap
isnan
is.nan
isLocal
isLocal
log
log
spark.naiveBayes
Naive Bayes Models
sparkRSQL.init
(Deprecated) Initialize a new SQLContext
spark_partition_id
Return the partition ID as a column
stddev_samp
stddev_samp
storageLevel
StorageLevel
log10
log10
log1p
log1p
log2
log2
months_between
months_between
min
min
minute
minute
persist
Persist
explain
Explain
explode
explode
from_unixtime
from_unixtime
from_utc_timestamp
from_utc_timestamp
hex
hex
histogram
Compute histogram statistics for given column
join
Join
kurtosis
kurtosis
levenshtein
levenshtein
mutate
Mutate
next_day
next_day
nrow
Returns the number of rows in a SparkDataFrame
limit
Limit
lower
lower
lpad
lpad
monotonically_increasing_id
monotonically_increasing_id
pivot
Pivot a column of the GroupedData and perform the specified aggregation.
rand
rand
randn
randn
printSchema
Print Schema of a SparkDataFrame
quarter
quarter
rank
rank
rbind
Union two or more SparkDataFrames
reverse
reverse
read.orc
Create a SparkDataFrame from an ORC file.
read.parquet
Create a SparkDataFrame from a Parquet file.
round
round
row_number
row_number
rowsBetween
rowsBetween
rint
rint
sample
Sample
show
show
showDF
showDF
skewness
skewness
sort_array
sort_array
spark.isoreg
Isotonic Regression Model
spark.kmeans
K-Means Clustering Model
spark.randomForest
Random Forest Model for Regression and Classification
rtrim
rtrim
rpad
rpad
sha2
sha2
shiftLeft
shiftLeft
sinh
sinh
lag
lag
last
last
lit
lit
locate
locate
spark.survreg
Accelerated Failure Time (AFT) Survival Regression Model
sparkR.session.stop
Stop the Spark Session and Spark Context
sparkR.uiWebUrl
Get the URL of the SparkUI instance for the current active SparkSession
str
Compactly display the structure of a dataset
struct
struct
size
size
spark.lda
Latent Dirichlet Allocation
spark.logit
Logistic Regression Model
sparkR.callJMethod
Call Java Methods
sparkR.callJStatic
Call Static Java Methods
startsWith
startsWith
stddev_pop
stddev_pop
substring_index
substring_index
sum
sum
tableToDF
Create a SparkDataFrame from a SparkSQL Table
tables
Tables
toRadians
toRadians
describe
summary
tableNames
Table Names
to_utc_timestamp
to_utc_timestamp
translate
translate
month
month
ntile
ntile
orderBy
Ordering Columns in a WindowSpec
pmod
pmod
posexplode
posexplode
read.df
Load a SparkDataFrame
take
Take the first NUM rows of a SparkDataFrame and return the results as an R data.frame
tan
tan
trim
trim
unbase64
unbase64
var
var
read.jdbc
Create a SparkDataFrame representing the database table accessible via JDBC URL
regexp_replace
regexp_replace
registerTempTable
(Deprecated) Register Temporary Table
sampleBy
Returns a stratified sample without replacement
max
max
md5
md5
ncol
Returns the number of columns in a SparkDataFrame
negate
negate
partitionBy
partitionBy
saveAsTable
Save the contents of the SparkDataFrame to a data source as a table
shiftRight
shiftRight
shiftRightUnsigned
shiftRightUnsigned
var_pop
var_pop
write.orc
Save the contents of SparkDataFrame as an ORC file, preserving the schema.
write.parquet
Save the contents of SparkDataFrame as a Parquet file, preserving the schema.
percent_rank
percent_rank
predict
Makes predictions from an MLlib model
print.jobj
Print a JVM object reference.
spark.als
Alternating Least Squares (ALS) for Collaborative Filtering
spark.gaussianMixture
Multivariate Gaussian Mixture Model (GMM)
spark.getSparkFilesRootDirectory
Get the root directory that contains files added through spark.addFile.
spark.glm
Generalized Linear Models
sparkR.conf
Get Runtime Config from the current active SparkSession
sparkR.init
(Deprecated) Initialize a new Spark Context
sparkR.version
Get version of Spark on which this application is running
sparkRHive.init
(Deprecated) Initialize a new HiveContext
subset
Subset
substr
substr
sumDistinct
sumDistinct
var_samp
var_samp
weekofyear
weekofyear
write.df
Save the contents of SparkDataFrame to a data source.
write.jdbc
Save the content of SparkDataFrame to an external database table via JDBC.
agg
summarize
uncacheTable
Uncache Table
unhex
unhex
windowOrderBy
windowOrderBy
read.json
Create a SparkDataFrame from a JSON file.
read.ml
Load a fitted MLlib model from the input path.
read.text
Create a SparkDataFrame from a text file.
regexp_extract
regexp_extract
schema
Get schema object
windowPartitionBy
windowPartitionBy
write.text
Save the content of SparkDataFrame in a text file at the specified path.
year
year
sd
sd
setLogLevel
Set new log level
sha1
sha1
signum
signum
spark.kstest
(One-Sample) Kolmogorov-Smirnov Test
spark.lapply
Run a function over a list of elements, distributing the computations with Spark
sparkR.newJObject
Create Java Objects
sparkR.session
Get the existing SparkSession or initialize a new SparkSession.
sql
SQL Query
sin
sin
sqrt
sqrt
structField
structField
structType
structType
tanh
tanh
toDegrees
toDegrees
to_date
to_date
when
when
window
window
with
Evaluate an R expression in an environment constructed from a SparkDataFrame
withColumn
WithColumn
union
Return a new SparkDataFrame containing the union of rows
unix_timestamp
unix_timestamp
unpersist
Unpersist
upper
upper
write.json
Save the contents of SparkDataFrame as a JSON file
write.ml
Saves the MLlib model to the input path
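To illustrate how several of the entries above fit together, here is a hedged sketch of a model-fitting workflow (a running SparkSession is assumed, and the save path is hypothetical):

```r
library(SparkR)
sparkR.session()

# Fit a Gaussian GLM on a SparkDataFrame (see spark.glm).
df <- createDataFrame(mtcars)
model <- spark.glm(df, mpg ~ wt + hp, family = "gaussian")

# Score the training data (see predict).
preds <- predict(model, df)
head(select(preds, "mpg", "prediction"))

# Persist and reload the fitted model (see write.ml / read.ml).
# "/tmp/mpg-glm" is a hypothetical path.
write.ml(model, "/tmp/mpg-glm")
reloaded <- read.ml("/tmp/mpg-glm")

sparkR.session.stop()
```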