OPCreg (version 2.0.0)

IPCR: Incremental Principal Component Regression for Online Datasets

Description

The IPCR function implements an incremental Principal Component Regression (PCR) method designed to handle online datasets. It updates the principal components recursively as new data arrives, making it suitable for real-time data processing.

Usage

IPCR(data, eta, m, alpha)

Value

A list containing the following elements:

Bhat

The estimated regression coefficients, including the intercept.

RMSE

The Root Mean Square Error of the regression model.

summary

The summary of the linear regression model.

yhat

The predicted values of the response variable.

Arguments

data

A data frame where the first column is the response variable and the remaining columns are predictor variables.

eta

The proportion of the sample used to initialize the principal components (0 < eta < 1); the first round(eta * n) observations form the initial sample. Default is 0.0035.

m

The number of principal components to retain. Default is 3.

alpha

The significance level used for calculating critical values. Default is 0.05.
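As a quick check on how eta translates into an initial sample size, the snippet below evaluates n0 = round(eta * n) for a few values. The sample size n = 2000 and the extra eta values are illustrative assumptions, not package defaults. Note that with the default eta and n = 2000 the initial sample has only 7 observations, fewer than the 10 predictors used in the Examples section, so the initial covariance estimate cannot be full rank.

```r
# Illustrative values only: n matches the Examples section, the
# additional eta values are made up for comparison.
n <- 2000
eta <- c(0.0035, 0.01, 0.05)

# Initial sample size used to initialize the principal components.
n0 <- round(eta * n)
print(data.frame(eta = eta, n0 = n0))
```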

Details

The IPCR function performs the following steps:

1. Standardizes the predictor variables.

2. Initializes the principal components using the first n0 = round(eta * n) samples.

3. Recursively updates the principal components as each new sample arrives.

4. Fits a linear regression model using the principal component scores.

5. Back-transforms the regression coefficients to the original scale.

This method is particularly useful for datasets where new observations are continuously added, and the model needs to be updated incrementally.
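The five steps can be illustrated with a minimal base-R sketch. This is not the OPCreg implementation: the function name ipcr_sketch, the rank-one mean and covariance update, the use of full-sample means and standard deviations for standardization, and the guard that forces n0 to be at least 2 are all assumptions made for illustration. For brevity the eigendecomposition is computed once after the loop, whereas a genuinely online version would refresh it as samples arrive.

```r
# Hypothetical helper illustrating the steps; NOT the package's code.
ipcr_sketch <- function(data, eta = 0.0035, m = 3) {
  y <- data[[1]]                            # first column: response
  X <- as.matrix(data[, -1, drop = FALSE])  # remaining columns: predictors
  n <- nrow(X)

  # Step 1: standardize the predictors (assumption: full-sample statistics).
  ctr <- colMeans(X)
  scl <- apply(X, 2, sd)
  Z <- scale(X, center = ctr, scale = scl)

  # Step 2: initialize mean and covariance from the first n0 samples.
  n0 <- max(round(eta * n), 2)              # guard is an assumption
  Z0 <- Z[1:n0, , drop = FALSE]
  mu <- colMeans(Z0)
  S <- crossprod(sweep(Z0, 2, mu)) / n0

  # Step 3: rank-one update of the mean and covariance per new sample.
  for (i in seq_len(n - n0) + n0) {
    d <- Z[i, ] - mu
    mu <- mu + d / i
    S <- ((i - 1) / i) * S + ((i - 1) / i^2) * tcrossprod(d)
  }
  V <- eigen(S, symmetric = TRUE)$vectors[, 1:m, drop = FALSE]

  # Step 4: regress the response on the first m component scores.
  scores <- Z %*% V
  fit <- lm(y ~ scores)

  # Step 5: map the coefficients back to the original predictor scale.
  g <- coef(fit)
  b <- as.vector(V %*% g[-1]) / scl
  b0 <- unname(g[1]) - sum(b * ctr)
  yhat <- fitted(fit)

  list(Bhat = c(Intercept = b0, b),
       RMSE = sqrt(mean((y - yhat)^2)),
       yhat = yhat)
}

# Toy data (made up for this sketch): response first, then 4 predictors.
set.seed(1)
toy_x <- matrix(rnorm(200 * 4), ncol = 4,
                dimnames = list(NULL, paste0("x", 1:4)))
toy <- data.frame(y = drop(1 + toy_x %*% c(2, -1, 0.5, 3)) + rnorm(200),
                  toy_x)
res <- ipcr_sketch(toy, eta = 0.05, m = 2)
print(res$Bhat)
print(res$RMSE)
```

Because the back-transformation in step 5 is exact, applying Bhat to the original predictors reproduces yhat; only the reduction to m components introduces approximation relative to ordinary least squares.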

See Also

lm: For fitting linear models.

eigen: For computing eigenvalues and eigenvectors.

Examples

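if (FALSE) {
# Simulate n observations of p strongly correlated predictors.
set.seed(1234)
library(MASS)   # provides mvrnorm()
library(OPCreg) # provides IPCR()
n <- 2000
p <- 10
mu0 <- as.matrix(runif(p, 0))
sigma0 <- as.matrix(runif(p, 0, 10))
# Correlation matrix built from values close to -1 and +1.
ro <- as.matrix(c(runif(round(p / 2), -1, -0.8), runif(p - round(p / 2), 0.8, 1)))
R0 <- ro %*% t(ro)
diag(R0) <- 1
# Covariance matrix: outer product of standard deviations times correlations.
Sigma0 <- sigma0 %*% t(sigma0) * R0
x <- mvrnorm(n, mu0, Sigma0)
colnames(x) <- paste("x", 1:p, sep = "")
# Response: intercept plus linear combination of predictors plus noise.
e <- rnorm(n, 0, 1)
B <- sample(1:3, (p + 1), replace = TRUE)
en <- matrix(rep(1, n * 1), ncol = 1)
y <- cbind(en, x) %*% B + e
colnames(y) <- "y"
# IPCR expects the response in the first column.
data <- data.frame(cbind(y, x))

result <- IPCR(data = data, m = 3, eta = 0.0035, alpha = 0.05)
print(result$Bhat)
print(result$yhat)
print(result$RMSE)
print(result$summary)
}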