
gustave (version 0.3.0)

rescal: Linear Regression Residuals Calculation

Description

rescal calculates linear regression residuals efficiently: it handles several dependent variables at a time, uses Matrix::TsparseMatrix capabilities and allows the matrix inverse to be pre-calculated.

Usage

rescal(y = NULL, x, w = NULL, by = NULL, collinearity.check = NULL,
  precalc = NULL)

Arguments

y

A numerical matrix of dependent variable(s). May be a Matrix::TsparseMatrix.

x

A numerical matrix of independent variable(s). May be a Matrix::TsparseMatrix.

w

An optional numerical vector of row weights.

by

An optional categorical vector (factor or character) when residuals calculation is to be conducted within by-groups (see Details).

collinearity.check

A boolean (TRUE or FALSE) or NULL indicating whether to perform a check for collinearity or not (see Details).

precalc

A list of pre-calculated results (see Details).

Value

  • if y is not NULL (calculation step): a numerical matrix with the same structure (regular base::matrix or Matrix::TsparseMatrix) and dimensions as y.

  • if y is NULL (pre-calculation step): a list containing pre-calculated data:

    • x: the numerical matrix of independent variables.

    • w: the numerical vector of row weights (vector of 1 by default).

    • inv: the inverse of t(x) %*% Matrix::Diagonal(x = w) %*% x

Details

In the context of the gustave package, linear regression residual calculation is solely used to take into account the effect of calibration on variance estimation. Independent variables are therefore most likely to be the same from one variance estimation to another, hence the inversion of the matrix t(x) %*% Diagonal(x = w) %*% x can be done once and for all at a pre-calculation step.
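The idea of reusing the inverse can be illustrated in base R. This is only a sketch of the underlying weighted least-squares algebra, not gustave's implementation: the variable names (inv, res, ref) and the simulated data are assumptions made for the illustration.

```r
# Sketch (base R only): weighted least-squares residuals with a reusable inverse.
set.seed(1)
n <- 100
x <- matrix(rnorm(3 * n), nrow = n)   # independent variables
y <- matrix(rnorm(2 * n), nrow = n)   # two dependent variables at once
w <- runif(n, 1, 2)                   # row weights

# Pre-calculation step: depends only on x and w.
# crossprod(x, w * x) equals t(x) %*% diag(w) %*% x.
inv <- solve(crossprod(x, w * x))

# Calculation step: the same inv can be reused for any y sharing x and w.
res <- y - x %*% (inv %*% crossprod(x, w * y))

# Cross-check against lm() fitted without an intercept.
ref <- residuals(lm(y ~ x - 1, weights = w))
print(all.equal(unname(res), unname(ref)))
```

Only the last two steps depend on y, which is why the inversion is worth storing when x and w stay the same across variance estimations.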

The parameters y and precalc determine whether a list of pre-calculated data should be used in order to speed up the regression residuals computation at execution time:

  • if y is not NULL and precalc is NULL: on-the-fly calculation of the matrix inverse and the regression residuals (no pre-calculation).

  • if y is NULL and precalc is NULL: pre-calculation of the matrix inverse, which is stored in a list of pre-calculated data.

  • if y is not NULL and precalc is not NULL: calculation of the regression residuals using the list of pre-calculated data.
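The three cases above can be mimicked with a small base R function. The helper rescal_sketch is hypothetical and written only for this illustration; it ignores by-groups, sparse matrices and collinearity checks, and it should not be read as the actual body of rescal.

```r
# Hypothetical helper reproducing the three modes (not gustave's code).
rescal_sketch <- function(y = NULL, x = NULL, w = NULL, precalc = NULL) {
  if (is.null(precalc)) {
    # No pre-calculated data supplied: build it from x and w.
    if (is.null(w)) w <- rep(1, nrow(x))
    precalc <- list(x = x, w = w, inv = solve(crossprod(x, w * x)))
  }
  # y is NULL: pre-calculation step, return the list.
  if (is.null(y)) return(precalc)
  # y is not NULL: calculation step, using the (possibly just built) list.
  px <- precalc$x
  pw <- precalc$w
  y - px %*% (precalc$inv %*% crossprod(px, pw * y))
}

set.seed(1)
x <- matrix(rnorm(300), nrow = 100)
y <- matrix(rnorm(200), nrow = 100)

direct  <- rescal_sketch(y, x)                 # y given, precalc NULL
pc      <- rescal_sketch(y = NULL, x = x)      # y NULL, precalc NULL
two_step <- rescal_sketch(y, precalc = pc)     # y given, precalc given
print(all.equal(direct, two_step))
```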

The by parameter allows for calculation within by-groups: all calculations are performed separately for each by-group (for example, when calibration was conducted separately on several subsamples), but efficiently, using Matrix::TsparseMatrix capabilities (especially when the matrix inverse is pre-calculated).
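One way to picture a single-pass by-group calculation is a block design matrix, with one copy of the columns of x per group, zeroed outside that group. The base R sketch below checks that this gives the same residuals as separate regressions; whether rescal builds its sparse matrices exactly this way is an assumption, as this page does not say, and all names are illustrative.

```r
# Sketch: by-group residuals from one regression on a block design (base R).
set.seed(1)
n <- 60
x <- matrix(rnorm(2 * n), nrow = n)
y <- matrix(rnorm(n), nrow = n)
by <- sample(c("a", "b", "c"), n, replace = TRUE)
groups <- sort(unique(by))

# One copy of x per group; rows outside the group are set to zero.
x_block <- do.call(cbind, lapply(groups, function(g) x * (by == g)))
res_block <- y - x_block %*% solve(crossprod(x_block), crossprod(x_block, y))

# Reference: a separate regression within each group.
res_split <- y
for (g in groups) {
  i <- by == g
  xg <- x[i, , drop = FALSE]
  yg <- y[i, , drop = FALSE]
  res_split[i, ] <- yg - xg %*% solve(crossprod(xg), crossprod(xg, yg))
}
print(all.equal(res_block, res_split))
```

The block design is mostly zeros, which is the kind of structure a sparse matrix representation stores compactly.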

If collinearity.check is NULL, a test for collinearity in the independent variables (x) is conducted if and only if det(t(x) %*% x) == 0.
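For intuition, the base R sketch below builds a design matrix with a duplicated column and inspects it. The determinant is zero in exact arithmetic but may come out as a tiny nonzero number in floating point, so the sketch also uses a rank test via qr(); that rank test is a swapped-in technique for illustration and is not claimed to be the check rescal performs.

```r
# Sketch: detecting collinear columns in base R.
set.seed(1)
x <- matrix(rnorm(50), nrow = 10)   # 10 x 5, full column rank
x_col <- cbind(x, x[, 1])           # last column duplicates the first

# Zero in exact arithmetic; floating point may return a tiny nonzero value.
print(det(crossprod(x_col)))

# Rank-based test: full rank means no collinearity.
full_rank <- qr(x)$rank == ncol(x)
deficient <- qr(x_col)$rank < ncol(x_col)
print(c(full_rank = full_rank, deficient = deficient))
```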

Examples

# NOT RUN {
# Generating random data
set.seed(1)
n <- 100
H <- 5
y <- matrix(rnorm(2*n), nrow = n)
x <- matrix(rnorm(10*n), nrow = n)
by <- letters[sample(1:H, n, replace = TRUE)]

# Direct calculation
rescal(y, x)

# Calculation with pre-calculated data
precalc <- rescal(y = NULL, x)
rescal(y, precalc = precalc)
identical(rescal(y, x), rescal(y, precalc = precalc))

# Collinearity check
rescal(y, cbind(x, x[, 1]), collinearity.check = TRUE)

# Matrix::TsparseMatrix capability
library(Matrix)
X <- as(x, "TsparseMatrix")
Y <- as(y, "TsparseMatrix")
rescal(Y, X)

# by parameter for within by-groups calculation
rescal(Y, X, by = by)
identical(
  rescal(Y, X, by = by)[by == "a", ],
  rescal(Y[by == "a", ], X[by == "a", ])
)

# }
