
SparkR (version 2.1.2)

R Frontend for Apache Spark

Description

Provides an R Frontend for Apache Spark.

Install: install.packages('SparkR')
Monthly Downloads: 155
Version: 2.1.2
License: Apache License (== 2.0)
Last Published: October 12th, 2017
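
After installation, a typical workflow starts a SparkSession, moves data between R and Spark, and stops the session when done. A minimal sketch using only functions documented in the index below; the local master, app name, and filter threshold are illustrative choices, not requirements:

library(SparkR)

# Start (or reuse) a SparkSession; "local[*]" runs Spark locally.
sparkR.session(master = "local[*]", appName = "SparkR-quickstart")

# Create a SparkDataFrame from a local R data.frame and inspect it.
df <- createDataFrame(faithful)
head(df)

# Transform on Spark, then collect the result back as an R data.frame.
long_waits <- collect(filter(df, df$waiting > 70))

# Stop the Spark Session and Spark Context.
sparkR.session.stop()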

Functions in SparkR (2.1.2)

AFTSurvivalRegressionModel-class

S4 class that represents an AFTSurvivalRegressionModel
ALSModel-class

S4 class that represents an ALSModel
GBTClassificationModel-class

S4 class that represents a GBTClassificationModel
GBTRegressionModel-class

S4 class that represents a GBTRegressionModel
KMeansModel-class

S4 class that represents a KMeansModel
KSTest-class

S4 class that represents a KSTest
GroupedData-class

S4 class that represents a GroupedData
IsotonicRegressionModel-class

S4 class that represents an IsotonicRegressionModel
GaussianMixtureModel-class

S4 class that represents a GaussianMixtureModel
GeneralizedLinearRegressionModel-class

S4 class that represents a generalized linear model
MultilayerPerceptronClassificationModel-class

S4 class that represents a MultilayerPerceptronClassificationModel
NaiveBayesModel-class

S4 class that represents a NaiveBayesModel
LDAModel-class

S4 class that represents an LDAModel
LogisticRegressionModel-class

S4 class that represents a LogisticRegressionModel
asin

asin
abs

abs
acos

acos
as.data.frame

Download data from a SparkDataFrame into an R data.frame
atan

atan
between

between
bin

bin
cancelJobGroup

Cancel active jobs for the specified group
arrange

Arrange Rows by Variables
array_contains

array_contains
avg

avg
ascii

ascii
bitwiseNOT

bitwiseNOT
bround

bround
coltypes

coltypes
RandomForestClassificationModel-class

S4 class that represents a RandomForestClassificationModel
RandomForestRegressionModel-class

S4 class that represents a RandomForestRegressionModel
add_months

add_months
alias

alias
base64

base64
coalesce

Coalesce
collect

Collects all the elements of a SparkDataFrame and coerces them into an R data.frame.
cast

Casts the column to a different data type.
conv

conv
corr

corr
crossJoin

CrossJoin
column

S4 class that represents a SparkDataFrame column
asc

A set of operations working with SparkDataFrame columns
colnames

Column Names of SparkDataFrame
cume_dist

cume_dist
crosstab

Computes a pair-wise frequency table of the given columns
dayofyear

dayofyear
decode

decode
dropDuplicates

dropDuplicates
dropTempTable

(Deprecated) Drop Temporary Table
factorial

factorial
filter

Filter
generateAliasesForIntersectedCols

Creates a list of columns by replacing the intersected ones with aliases
getNumPartitions

getNumPartitions
dapply

dapply
dapplyCollect

dapplyCollect
date_add

date_add
distinct

Distinct
drop

drop
cache

Cache
cacheTable

Cache Table
clearCache

Clear Cache
clearJobGroup

Clear current job group ID and its description
SparkDataFrame-class

S4 class that represents a SparkDataFrame
WindowSpec-class

S4 class that represents a WindowSpec
approxCountDistinct

Returns the approximate number of distinct items in a group
approxQuantile

Calculates the approximate quantiles of a numerical column of a SparkDataFrame
atan2

atan2
attach

Attach SparkDataFrame to R search path
cbrt

cbrt
ceil

Computes the ceiling of the given value
count

Count
hour

hour
hypot

hypot
insertInto

insertInto
install.spark

Download and Install Apache Spark to a Local Directory
floor

floor
format_number

format_number
glm

Generalized Linear Models (R-compliant)
greatest

greatest
group_by

GroupBy
cos

cos
cosh

cosh
createExternalTable

Create an external table
createOrReplaceTempView

Creates a temporary view using the given name.
countDistinct

Count Distinct Values
cov

cov
covar_pop

covar_pop
dense_rank

dense_rank
hash

hash
instr

instr
least

least
length

length
mean

mean
merge

Merges two data frames
dropna

A set of SparkDataFrame functions working with NA values
nanvl

nanvl
concat

concat
concat_ws

concat_ws
crc32

crc32
intersect

Intersect
otherwise

otherwise
over

over
print.structField

Print a Spark StructField.
print.structType

Print a Spark StructType.
randomSplit

randomSplit
datediff

datediff
dayofmonth

dayofmonth
except

except
exp

exp
expm1

expm1
createDataFrame

Create a SparkDataFrame
date_format

date_format
date_sub

date_sub
dropTempView

Drops the temporary view with the given view name in the catalog.
dtypes

DataTypes
rangeBetween

rangeBetween
rename

rename
repartition

Repartition
second

second
select

Select
selectExpr

SelectExpr
setJobGroup

Assigns a group ID to all the jobs started by this thread until the group ID is set to a different value or cleared.
dim

Returns the dimensions of SparkDataFrame
encode

encode
endsWith

endsWith
first

Return the first row of a SparkDataFrame
fitted

Get fitted result from a k-means model
soundex

soundex
spark.addFile

Add a file or directory to be downloaded with this Spark job on every node.
spark.gbt

Gradient Boosted Tree Model for Regression and Classification
spark.getSparkFiles

Get the absolute path of a file added through spark.addFile.
spark.mlp

Multilayer Perceptron Classification Model
expr

expr
gapply

gapply
gapplyCollect

gapplyCollect
hashCode

Compute the hashCode of an object
head

Head
last_day

last_day
lead

lead
ltrim

ltrim
%in%

Match a column with given values.
format_string

format_string
freqItems

Finding frequent items for columns, possibly with false positives
ifelse

ifelse
initcap

initcap
isnan

is.nan
isLocal

isLocal
log

log
spark.naiveBayes

Naive Bayes Models
sparkRSQL.init

(Deprecated) Initialize a new SQLContext
spark_partition_id

Return the partition ID as a column
stddev_samp

stddev_samp
storageLevel

StorageLevel
log10

log10
log1p

log1p
log2

log2
months_between

months_between
min

min
minute

minute
persist

Persist
explain

Explain
explode

explode
from_unixtime

from_unixtime
from_utc_timestamp

from_utc_timestamp
hex

hex
histogram

Compute histogram statistics for given column
join

Join
kurtosis

kurtosis
levenshtein

levenshtein
mutate

Mutate
next_day

next_day
nrow

Returns the number of rows in a SparkDataFrame
limit

Limit
lower

lower
lpad

lpad
monotonically_increasing_id

monotonically_increasing_id
pivot

Pivot a column of the GroupedData and perform the specified aggregation.
rand

rand
randn

randn
printSchema

Print Schema of a SparkDataFrame
quarter

quarter
rank

rank
rbind

Union two or more SparkDataFrames
reverse

reverse
read.orc

Create a SparkDataFrame from an ORC file.
read.parquet

Create a SparkDataFrame from a Parquet file.
round

round
row_number

row_number
rowsBetween

rowsBetween
rint

rint
sample

Sample
show

show
showDF

showDF
skewness

skewness
sort_array

sort_array
spark.isoreg

Isotonic Regression Model
spark.kmeans

K-Means Clustering Model
spark.randomForest

Random Forest Model for Regression and Classification
rtrim

rtrim
rpad

rpad
sha2

sha2
shiftLeft

shiftLeft
sinh

sinh
lag

lag
last

last
lit

lit
locate

locate
spark.survreg

Accelerated Failure Time (AFT) Survival Regression Model
sparkR.session.stop

Stop the Spark Session and Spark Context
sparkR.uiWebUrl

Get the URL of the SparkUI instance for the current active SparkSession
str

Compactly display the structure of a dataset
struct

struct
size

size
spark.lda

Latent Dirichlet Allocation
spark.logit

Logistic Regression Model
sparkR.callJMethod

Call Java Methods
sparkR.callJStatic

Call Static Java Methods
startsWith

startsWith
stddev_pop

stddev_pop
substring_index

substring_index
sum

sum
tableToDF

Create a SparkDataFrame from a SparkSQL Table
tables

Tables
toRadians

toRadians
describe

summary
tableNames

Table Names
to_utc_timestamp

to_utc_timestamp
translate

translate
month

month
ntile

ntile
orderBy

Ordering Columns in a WindowSpec
pmod

pmod
posexplode

posexplode
read.df

Load a SparkDataFrame
take

Take the first NUM rows of a SparkDataFrame and return the results as an R data.frame
tan

tan
trim

trim
unbase64

unbase64
var

var
read.jdbc

Create a SparkDataFrame representing the database table accessible via JDBC URL
regexp_replace

regexp_replace
registerTempTable

(Deprecated) Register Temporary Table
sampleBy

Returns a stratified sample without replacement
max

max
md5

md5
ncol

Returns the number of columns in a SparkDataFrame
negate

negate
partitionBy

partitionBy
saveAsTable

Save the contents of the SparkDataFrame to a data source as a table
shiftRight

shiftRight
shiftRightUnsigned

shiftRightUnsigned
var_pop

var_pop
write.orc

Save the contents of SparkDataFrame as an ORC file, preserving the schema.
write.parquet

Save the contents of SparkDataFrame as a Parquet file, preserving the schema.
percent_rank

percent_rank
predict

Makes predictions from a MLlib model
print.jobj

Print a JVM object reference.
spark.als

Alternating Least Squares (ALS) for Collaborative Filtering
spark.gaussianMixture

Multivariate Gaussian Mixture Model (GMM)
spark.getSparkFilesRootDirectory

Get the root directory that contains files added through spark.addFile.
spark.glm

Generalized Linear Models
sparkR.conf

Get Runtime Config from the current active SparkSession
sparkR.init

(Deprecated) Initialize a new Spark Context
sparkR.version

Get version of Spark on which this application is running
sparkRHive.init

(Deprecated) Initialize a new HiveContext
subset

Subset
substr

substr
sumDistinct

sumDistinct
var_samp

var_samp
weekofyear

weekofyear
write.df

Save the contents of SparkDataFrame to a data source.
write.jdbc

Save the content of SparkDataFrame to an external database table via JDBC.
agg

summarize
uncacheTable

Uncache Table
unhex

unhex
windowOrderBy

windowOrderBy
read.json

Create a SparkDataFrame from a JSON file.
read.ml

Load a fitted MLlib model from the input path.
read.text

Create a SparkDataFrame from a text file.
regexp_extract

regexp_extract
schema

Get schema object
windowPartitionBy

windowPartitionBy
write.text

Save the content of SparkDataFrame in a text file at the specified path.
year

year
sd

sd
setLogLevel

Set new log level
sha1

sha1
signum

signum
spark.kstest

(One-Sample) Kolmogorov-Smirnov Test
spark.lapply

Run a function over a list of elements, distributing the computations with Spark
sparkR.newJObject

Create Java Objects
sparkR.session

Get the existing SparkSession or initialize a new SparkSession.
sql

SQL Query
sin

sin
sqrt

sqrt
structField

structField
structType

structType
tanh

tanh
toDegrees

toDegrees
to_date

to_date
when

when
window

window
with

Evaluate an R expression in an environment constructed from a SparkDataFrame
withColumn

WithColumn
union

Return a new SparkDataFrame containing the union of rows
unix_timestamp

unix_timestamp
unpersist

Unpersist
upper

upper
write.json

Save the contents of SparkDataFrame as a JSON file
write.ml

Saves the MLlib model to the input path
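
The MLlib wrappers listed above (spark.glm, spark.kmeans, spark.naiveBayes, and the rest) share a fit/predict/save pattern modeled on base R, with predict, write.ml, and read.ml working uniformly across model types. A minimal sketch using spark.glm; the dataset and save path are illustrative (SparkR replaces the dots in iris's column names with underscores and warns about it):

library(SparkR)
sparkR.session()

# Build a SparkDataFrame from a local dataset.
training <- createDataFrame(iris)

# Fit a Gaussian GLM; summary() prints coefficients as for stats::glm.
model <- spark.glm(training, Sepal_Width ~ Sepal_Length + Species,
                   family = "gaussian")
summary(model)

# Score data and inspect the predictions.
preds <- predict(model, training)
head(select(preds, "Sepal_Width", "prediction"))

# Persist the fitted model and load it back (path is illustrative).
write.ml(model, "/tmp/sparkr-glm-model")
model2 <- read.ml("/tmp/sparkr-glm-model")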