SparkR (version 2.1.2)

corr: corr

Description

Computes the Pearson correlation coefficient of two Columns.

Calculates the correlation of two columns of a SparkDataFrame. Currently only the Pearson correlation coefficient is supported. For Spearman correlation, consider the RDD-based methods in MLlib's Statistics.
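For reference, the standard definition of the Pearson correlation coefficient for paired observations (x_i, y_i), i = 1, ..., n, with means x̄ and ȳ, is given below. The symbols are introduced here for illustration only and are not part of the SparkR API. The 1/(n-1) normalization factors of the sample covariance and standard deviations cancel, so sample and population normalization yield the same value.

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The result lies in [-1, 1]: 1 indicates a perfect increasing linear relationship, -1 a perfect decreasing one.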

Usage

corr(x, ...)

# S4 method for Column
corr(x, col2)

# S4 method for SparkDataFrame
corr(x, colName1, colName2, method = "pearson")

Arguments

x

a Column or a SparkDataFrame.

...

additional argument(s). If x is a Column, a Column should be provided. If x is a SparkDataFrame, two column names should be provided.

col2

a (second) Column.

colName1

the name of the first column.

colName2

the name of the second column.

method

Optional. A character string specifying the method used to calculate the correlation. Currently only "pearson" is supported.

Value

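For the SparkDataFrame method, the Pearson correlation coefficient as a double. The Column method instead returns a Column, which is evaluated with functions such as select() or agg().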
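As a local illustration only (it does not call SparkR or need a Spark session), the following base R sketch computes the coefficient directly from its definition. The helper `pearson_r` and the sample vectors are hypothetical; the point is that, on the same data, the numeric value returned by the SparkDataFrame method should agree with base R's `cor()` up to floating-point error.

```r
# Hypothetical helper: Pearson correlation coefficient from its definition.
# Illustration only; this is not part of SparkR.
pearson_r <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) > 1)
  dx <- x - mean(x)  # deviations of x from its mean
  dy <- y - mean(y)  # deviations of y from its mean
  sum(dx * dy) / (sqrt(sum(dx^2)) * sqrt(sum(dy^2)))
}

x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
r <- pearson_r(x, y)
print(r)

# Base R's cor() with method = "pearson" should agree up to floating-point error.
print(all.equal(r, cor(x, y, method = "pearson")))
```

Here the deviation products sum to 6 and the squared deviations sum to 10 and 6, so r = 6 / sqrt(60), about 0.775.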

See Also

Other math_funcs: acos, asin, atan2, atan, bin, bround, cbrt, ceil, conv, cosh, cos, covar_pop, cov, expm1, exp, factorial, floor, hex, hypot, log10, log1p, log2, log, pmod, rint, round, shiftLeft, shiftRightUnsigned, shiftRight, signum, sinh, sin, sqrt, tanh, tan, toDegrees, toRadians, unhex

Other stat functions: approxQuantile, cov, crosstab, freqItems, sampleBy

Examples

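# NOT RUN {
# Column method: corr() builds a Column expression, evaluated with select() or agg();
# df is assumed to be a SparkDataFrame with columns c and d
head(select(df, corr(df$c, df$d)))
# }
# NOT RUN {
df <- read.json("/path/to/file.json")
# SparkDataFrame method: returns the coefficient as a numeric value
r <- corr(df, "title", "gender")
r <- corr(df, "title", "gender", method = "pearson")
# }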
