
lessR (version 4.5)

Correlation: Correlation Analysis

Description

Abbreviation: cr, cr_brief

For two variables, yields the correlation coefficient with hypothesis test and confidence interval. For a data frame or subset of variables from a data frame, yields the correlation matrix. The default computed coefficient(s) are the standard Pearson's product-moment correlation, with Spearman and Kendall coefficients available. For the default missing data technique of pairwise deletion, an analysis of missing data for each computed correlation coefficient is provided. For a correlation matrix, a statistical summary of the missing data across all cells is provided.

Usage

Correlation(x, y, data=d,
         miss=c("pairwise", "listwise", "everything"),
         show=c("cor", "missing"),
         fill_low=NULL, fill_hi=NULL,
         brief=FALSE, digits_d=NULL, heat_map=TRUE,
         main=NULL, bottom=3, right=3, quiet=getOption("quiet"),
         pdf_file=NULL, width=5, height=5, ...)

cr_brief(..., brief=TRUE)

cr(...)

Value

In lessR 3.3 and earlier, a computed correlation matrix was returned directly. More values are now returned, so the matrix is embedded in a list of returned elements.

READABLE OUTPUT

single coefficient

r: Estimated correlation coefficient

tvalue: t statistic for testing \(H_0:r=0\)

df: Degrees of freedom for the t test

pvalue: P-value for the t test

lb: Lower bound of the confidence interval for r

ub: Upper bound of the confidence interval for r

n: Number of non-missing paired observations

cov: Sample covariance

matrix

out_cor: Correlations, for show="cor"

out_missing: Missing values analysis, for show="missing"

STATISTICS

single coefficient

r: Estimated correlation coefficient

tvalue: t statistic for the test of the null hypothesis of no relationship

df: Degrees of freedom of the hypothesis test

pvalue: P-value of the hypothesis test

lb: Lower bound of confidence interval

ub: Upper bound of confidence interval

matrix

R: Correlations

Arguments

x

First variable, or list of variables for a correlation matrix.

y

Second variable. Leave unspecified if the first argument is a list of variables.

data

Optional data frame that contains the variables of interest, default is d.

miss

Basis for deleting missing data values.

show

The default, "cor", computes and shows the correlations. Set to "missing" to compute and show the missing data analysis instead.

fill_low

Starting color for a custom sequential palette.

fill_hi

Ending color for a custom sequential palette.

brief

Pertains to a single correlation coefficient analysis. If FALSE, then the sample covariance and number of non-missing and missing observations are displayed.

digits_d

Specifies the number of decimal digits to display in the output.

heat_map

If TRUE, generate a heat map.

main

Graph title of heat map. Set to main="" to turn off.

bottom

Number of lines in the bottom margin of heat map.

right

Number of lines in the right margin of heat map.

quiet

If TRUE, no text output is displayed. The system default can be changed with the style function.

pdf_file

Name of the pdf file to which the graphics are directed, in place of the standard graphics device.

width

Width of the pdf file in inches.

height

Height of the pdf file in inches.

...

Additional arguments passed to cor and cor.test, e.g., method="spearman" or method="kendall", and alternative="less" or alternative="greater".
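The options named for ... belong to the base R functions cor and cor.test. As a minimal sketch of what those options do, the following calls cor.test directly on the built-in attitude data. The object names sp, kd, gt and two are illustrative only, and exact=FALSE is an added choice to avoid warnings about tied ranks.

```r
# Base R sketch of the method and alternative options of cor.test.
data(attitude)

# Spearman rank correlation (exact = FALSE: asymptotic p-value, no tie warning)
sp <- cor.test(attitude$rating, attitude$learning,
               method = "spearman", exact = FALSE)

# Kendall's tau
kd <- cor.test(attitude$rating, attitude$learning,
               method = "kendall", exact = FALSE)

# One-sided Pearson test: alternative is a correlation greater than 0
gt <- cor.test(attitude$rating, attitude$learning,
               alternative = "greater")

# Two-sided Pearson test, for comparison
two <- cor.test(attitude$rating, attitude$learning)

names(sp$estimate)   # label of the Spearman estimate
names(kd$estimate)   # label of the Kendall estimate
c(one_sided = gt$p.value, two_sided = two$p.value)
```

With a one-sided alternative, the confidence interval reported by cor.test is also one-sided.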

Author

David W. Gerbing (Portland State University; gerbing@pdx.edu)

Details

When two variables are specified, both x and y, the output is the correlation coefficient with hypothesis test, for a null hypothesis of 0, and confidence interval. Also displays the sample covariance. Based on R functions cor, cor.test, cov.
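Because the single-coefficient analysis is based on cor, cor.test and cov, the reported quantities can be reproduced in base R. The following is a sketch only, not the internal code of Correlation. The variable names mirror the output names listed under Value, and cv and t_check are illustrative names.

```r
# Base R sketch of the quantities reported for a single coefficient.
data(attitude)
x <- attitude$rating
y <- attitude$learning

ct <- cor.test(x, y)                 # Pearson by default, null hypothesis of 0

r      <- unname(ct$estimate)        # estimated correlation coefficient
tvalue <- unname(ct$statistic)       # t statistic
df     <- unname(ct$parameter)       # degrees of freedom, n - 2
pvalue <- ct$p.value                 # two-sided p-value
lb     <- ct$conf.int[1]             # lower bound, 95% confidence interval
ub     <- ct$conf.int[2]             # upper bound, 95% confidence interval
n      <- sum(complete.cases(x, y))  # non-missing paired observations
cv     <- cov(x, y)                  # sample covariance

# For the Pearson coefficient, the t statistic follows from r and n
t_check <- r * sqrt(n - 2) / sqrt(1 - r^2)
all.equal(tvalue, t_check)
```

The last line checks the relation t = r * sqrt(n - 2) / sqrt(1 - r^2), which holds for the Pearson coefficient with n - 2 degrees of freedom.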

In place of two variables x and y, x can be a complete data frame, either specified with the name of a data frame, or blank to rely upon the default data frame d. Or, x can be a list of variables from the input data frame. In these situations y is missing. Any non-numeric variables in the data frame or specified variable list are automatically deleted from the analysis.

When heat_map=TRUE, a heat map is generated on the standard graphics device. Set pdf_file to direct the graphics to the named pdf file instead.

For treating missing data, the default is pairwise deletion, which means that an observation is deleted only from the computation of a specific correlation coefficient, and only if it is missing the value of one or both of the two variables involved. For listwise deletion, the entire observation is deleted from the analysis if any of its data values are missing. For the more extreme everything option, any missing data values for a variable result in all correlations for that variable reported as missing.
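The three treatments of missing data can be illustrated with the use argument of the base R function cor. This is a sketch under the assumption that pairwise, listwise and everything correspond to use="pairwise.complete.obs", "complete.obs" and "everything". The data frame dm and the other object names are made up for illustration.

```r
# Small data frame with missing values (illustrative data only)
dm <- data.frame(
  a = c(1, 2, 3, 4, 5, 6),
  b = c(2, 1, 4, 3, NA, 5),
  c = c(6, 5, NA, 3, 2, 1)
)

# pairwise: each coefficient uses all rows complete for that pair
r_pair <- cor(dm, use = "pairwise.complete.obs")

# listwise: only rows complete on every variable are used
r_list <- cor(dm, use = "complete.obs")

# everything: a missing value in a variable makes its correlations NA
r_all <- cor(dm, use = "everything")

# Number of rows available for each pair under pairwise deletion
ok <- !is.na(dm)          # logical matrix, TRUE where a value is present
n_pair <- crossprod(1 * ok)
n_pair
```

Here the pair a and b keeps more rows than the pair b and c, so under pairwise deletion different coefficients in the same matrix can rest on different sample sizes.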

References

Gerbing, D. W. (2023). R Data Analysis without Programming: Explanation and Interpretation, 2nd edition, Chapter 10, NY: Routledge.

Examples

# data
n <- 12
f <- sample(c("Group1","Group2"), size=n, replace=TRUE)
x1 <- round(rnorm(n=n, mean=50, sd=10), 2)
x2 <- round(rnorm(n=n, mean=50, sd=10), 2)
x3 <- round(rnorm(n=n, mean=50, sd=10), 2)
x4 <- round(rnorm(n=n, mean=50, sd=10), 2)
d <- data.frame(f,x1, x2, x3, x4)
rm(f); rm(x1); rm(x2); rm(x3); rm(x4)

# correlation and covariance
Correlation(x1, x2)
# short name
cr(x1, x2)
# brief form of output
cr_brief(x1, x2)

# Spearman rank correlation, one-sided test
Correlation(x1, x2, method="spearman", alternative="less")

# correlation matrix of the numerical variables in d assigned to R
R <- Correlation()

# correlation matrix of Kendall's tau coefficients
R <- cr(method="kendall")

# analysis of two variables from a data frame other than the default d
data(attitude)
R <- Correlation(rating, learning, data=attitude)

# analysis of an entire data frame other than the default d
data(attitude)
R <- Correlation(attitude)
