linear_algebra_stats: Simple Linear Algebra Functions for Statistics

Description

'broadcast' provides some simple Linear Algebra Functions for Statistics:
cinv()
sd_lc()
ecumprob()

Usage

cinv(x)
sd_lc(X, vc, bad_rp = NaN)
ecumprob(y, sim, eps = 0)

Value

For cinv():

A matrix.

For sd_lc():

A vector of standard deviations.

For ecumprob():

A vector of cumulative probabilities.

If for any observation i (after broadcasting,) y[i] is NA/NaN or any of sim[i,] is NA/NaN, the result for i will be NA.

If zero-length y or sim is given, a zero-length numeric vector is returned.

Arguments

x: a real symmetric positive-definite square matrix.
X: a numeric (or logical) matrix of multipliers/constants
vc: the variance-covariance matrix for the (correlated) random variables.
bad_rp: if vc is not a Positive (semi-) Definite matrix, give here the value to replace bad standard deviations with.
y: values to estimate the cumulative probability for.
sim: a matrix (or data.frame) with at least 500 columns of simulated values.
If sim is given as a dimensionless vector, it will be treated as a matrix with 1 row and length(sim) columns, and this will be noted with a message.
eps: a non-negative numeric scaler smaller than 0.1, giving the cut-off value for probabilities.
Probabilities smaller than eps will be replaced with eps, and probabilities larger than 1 - eps will be replaced with 1 - eps.
Set eps = 0 to disable probability trimming.

Details

cinv()
cinv() computes the Choleski inverse of a real symmetric positive-definite square matrix.

sd_lc()
Given the linear combination X %*% b, where:

X is a matrix of multipliers/constants;
b is a vector of (correlated) random variables;
vc is the symmetric variance-covariance matrix for b;

sd_lc(X, vc) computes the standard deviations for the linear combination X %*% b, without making needless copies.
sd_lc(X, vc) will use much less memory than a base 'R' approach.
sd_lc(X, vc) will usually be faster than a base 'R' approach (depending on the Linear Algebra Library used for base 'R').

ecumprob()
The ecumprod(y, sim) function takes a matrix (or data.frame) of simulated values sim, and for each row i (after broadcasting), estimates the cumulative distribution function of sim[i, ], and returns the cumulative probability for y[i].

In terms of statistics, it is equivalent to the following operation for each index i:
ecdf(sim[i,])(y[i])
However, ecumprob() is much faster, and supports NAs/NaNs.

In terms of linear algebra, it is equivalent to the following broadcasted operation:
rowMeans(sim <= y)
where y and sim are broadcaster arrays.
However, ecumprob() is much more memory-efficient, supports a data.frame for sim, and has statistical safety checks.

References

John A. Rice (2007), Mathematical Statistics and Data Analysis (6th Edition)

Examples

Run this code


# variances ====
vc <- datasets::ability.cov$cov
X <- matrix(rnorm(100), 100, ncol(vc))

solve(vc)
cinv(vc) # faster than `solve()`, but only works on positive definite matrices
all(round(solve(vc), 6) == round(cinv(vc), 6)) # they're the same

sd_lc(X, vc)



# ecumprob() ====

sim <- rnbinom(10 * 5000L, mu = 3, size = 2) |> matrix(10, 5000)
y <- sample(0:9)

# vector:
pnbinom(y[1], mu = 3, size = 2) # real probability
ecumprob(y[1], sim[1, , drop = TRUE]) # approximation

# matrix:
pnbinom(y, mu = 3, size = 2) # real probability
ecumprob(y, sim) # approximation

# data.frame:
pnbinom(y, mu = 3, size = 2) # real probability
ecumprob(y, as.data.frame(sim)) # approximation

Run the code above in your browser using DataLab