Learn R Programming

kvr2 (version 0.2.0)

r2: Calculate Multiple Definitions of Coefficient of Determination (R-squared)

Description

Calculates nine types of coefficients of determination (\(R^2\)) based on the classification by Kvalseth (1985). This function is designed to demonstrate how \(R^2\) values can vary depending on their mathematical definition, particularly in models without an intercept or in power regression models

Usage

r2(model, type = c("auto", "linear", "power"), adjusted = FALSE)

r2_1(model, type = c("auto", "linear", "power"))

r2_2(model, type = c("auto", "linear", "power"))

r2_3(model, type = c("auto", "linear", "power"))

r2_4(model, type = c("auto", "linear", "power"))

r2_5(model, type = c("auto", "linear", "power"))

r2_6(model, type = c("auto", "linear", "power"))

r2_7(model, type = c("auto", "linear", "power"))

r2_8(model, type = c("auto", "linear", "power"))

r2_9(model, type = c("auto", "linear", "power"))

Value

  • For r2(): An object of class r2_kvr2, which is a list containing calculated values for all \(R^2\) formulas.

  • For individual functions (r2_1() to r2_9()): A named numeric value of the specific \(R^2\) definition. calculated values for each \(R^2\) formula.

Arguments

model

A linear model or power regression model of the lm.

type

Character string. Selects the model type: "linear", "power", or "auto" (default). In "auto", the function detects if the dependent variable is log-transformed.

adjusted

Logical. If TRUE, calculates the adjusted coefficient of determination for each formula.

Details

The nine coefficient equations from \(R^2_1\) to \(R^2_9\) are based on Kvalseth (1985) and are as follows:

  • \(R^2_1\): Proportion of total variance explained. $$R_1^2 = 1 - \frac{\sum(y - \hat{y})^2}{\sum(y - \bar{y})^2}$$

  • \(R^2_2\): Based on the variation of predicted values. $$R^2_2 = \frac{\sum(\hat{y} - \bar{y})^2}{\sum(y - \bar{y})^2}$$

  • \(R^2_3\): Ratio of variation using the mean of predicted values. $$R_3^2 = \frac{\sum(\hat{y} - \bar{\hat{y}})^2}{\sum(y - \bar{y})^2}$$

  • \(R^2_4\): Incorporates the mean residual. $$R_4^2 = 1 - \frac{\sum(e - \bar{e})^2}{\sum(y - \bar{y})^2}, \quad e = y - \hat{y}$$

  • \(R^2_5\): The square of the multiple correlation coefficient between the dependent variable and the independent variable (a comprehensive indicator in linearized models). $$R_5^2 = \text{squared multiple correlation coefficient between the regressand and the regressors}$$

  • \(R^2_6\): Square of Pearson's correlation coefficient between observed \(y\) and predicted \(\hat{y}\). $$R_6^2 = \frac{\left( \sum(y - \bar{y})(\hat{y} - \bar{\hat{y}}) \right)^2}{\sum(y - \bar{y})^2 \sum(\hat{y} - \bar{\hat{y}})^2}$$

  • \(R^2_7\): Recommended for models without an intercept. $$R_7^2 = 1 - \frac{\sum(y - \hat{y})^2}{\sum y^2}$$

  • \(R^2_8\): Alternative form for models without an intercept. $$R_8^2 = \frac{\sum \hat{y}^2}{\sum y^2}$$

  • \(R^2_9\): Robust version using medians to resist outliers. $$R_9^2 = 1 - \left( \frac{M\{|y_i - \hat{y}_i|\}}{M\{|y_i - \bar{y}|\}} \right)^2$$

where \(M\) represents the median of the sample.

For degree of freedom adjustment adjusted = TRUE, refer to r2_adjusted.

References

Tarald O. Kvalseth (1985) Cautionary Note about R 2 , The American Statistician, 39:4, 279-285, tools:::Rd_expr_doi("10.1080/00031305.1985.10479448")

Box, George E. P., Hunter, William G., Hunter, J. Stuart. (1978) Statistics for experimenters: an introduction to design, data analysis, and model building. New York, United States, J. Wiley, p. 462-473, ISBN:9780471093152.

See Also

print.r2_kvr2() r2_adjusted()

Examples

Run this code
# Example data set 1. Kvalseth (1985).
df1 <- data.frame(x = c(1:6),
                 y = c(15,37,52,59,83,92))
# Linear regression model with intercept
model_intercept1 <- lm(y ~ x, df1)
# Linear regression model without intercept
model_without1 <- lm(y ~ x - 1, df1)
# Power regression model
model_power1 <- lm(log(y) ~ log(x), df1)
r2(model_intercept1)
r2(model_without1)
r2(model_power1)
# Example data set 2. Kvalseth (1985).
df2 <- data.frame(x = 6:13,
                  y = c(3882, 1266, 733, 450, 410, 305, 185, 112))
power_model2 <- lm(log((y/7343)) ~ log(x), data = df2)
r2(power_model2)
# Example of a Multiple Regression Analysis Model.
# The data for two independent variables given by Box et al. (1978, p. 462)
# as used in Kvalseth (1985).
df3 <- data.frame(x1 = c(0.34, 0.34, 0.58, 1.26, 1.26, 1.82),
                  x2 = c(0.73, 0.73, 0.69, 0.97, 0.97, 0.46),
                  y = c(5.75, 4.79, 5.44, 9.09, 8.59, 5.09))
# Multiple regression analysis model with intercept
model_intercept3 <- lm(y ~ x1 + x2, df3)
# Multiple regression analysis model without intercept
model_without3 <- lm(y ~ x1 + x2 - 1, df3)
# Multiple power regression analysis model
model_power3 <- lm(log(y) ~ log(x1) + log(x2), df3)
r2(model_intercept3)
r2(model_without3)
r2(model_power3)

Run the code above in your browser using DataLab