Regression with compositional data using the \(\alpha\)-transformation

Description

Regression with compositional data using the \(\alpha\)-transformation.

Usage

alfa.reg(y, x, a, covb = FALSE, xnew = NULL, yb = NULL)
alfa.reg2(y, x, a, xnew = NULL, ncores = 1)
alfa.reg3(y, x, a = c(-1, 1), xnew = NULL)

Value

For the alfa.reg() function a list including:

runtime: The time required by the regression.
be: The beta coefficients.
covbe: The covariance matrix if covb was set to TRUE, otherwise NULL.
dev: The sum of the squared residuals, as produced by the function minpack.lm::nls.lm().
est: The fitted values for xnew if xnew is not NULL.

For the alfa.reg2() function, a list containing the time required by all regressions and, for each value of \(\alpha\), the regression coefficients and the fitted values.

For the alfa.reg3() function, a list with the previous elements plus an element "alfa" holding the optimal value of \(\alpha\).
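The search that produces "alfa" in alfa.reg3() can be pictured with the minimal base-R sketch below. It is not the package's code. The helper names kl_divergence() and choose_alpha() are hypothetical, the fitting step is passed in as a user-supplied function fit_fun(y, x, a) that returns fitted compositions, and the divergence is assumed to be \(\sum_i \sum_j y_{ij} \log(y_{ij} / \hat{y}_{ij})\) with the convention \(0 \log 0 = 0\). The only library call is stats::optimize().

```r
# Hypothetical sketch of an alfa.reg3()-style search for alpha (not package code).

# Assumed form of the Kullback-Leibler divergence between observed (y) and
# fitted (yhat) compositions, summed over all observations and components.
kl_divergence <- function(y, yhat) {
  y <- as.matrix(y)
  yhat <- as.matrix(yhat)
  # 0 * log(0 / yhat) is taken to be 0 for zero parts
  terms <- ifelse(y > 0, y * log(y / yhat), 0)
  sum(terms)
}

# fit_fun(y, x, a) is a hypothetical user-supplied function returning a matrix
# of fitted compositions (rows summing to 1) for a given alpha.
choose_alpha <- function(y, x, fit_fun, interval = c(-1, 1)) {
  objective <- function(a) kl_divergence(y, fit_fun(y, x, a))
  # one-dimensional bounded minimisation over the alpha interval
  opt <- optimize(objective, interval = interval)
  list(alfa = opt$minimum, kl = opt$objective)
}
```

Each evaluation of the objective costs one full regression fit, which is why a bounded one-dimensional optimiser is a natural choice compared with a dense grid of \(\alpha\) values.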

Arguments

y

A matrix with the compositional data.

x

A matrix with the continuous predictor variables or a data frame including categorical predictor variables.

a

The value of the power transformation; it must lie between -1 and 1, and if zero values are present it must be strictly positive. If \(\alpha=0\), the isometric log-ratio transformation is applied and the solution exists in closed form, since the model reduces to classical multivariate regression. For alfa.reg2() this should be a vector of \(\alpha\) values, and the function calls alfa.reg() repeatedly, once for each value. For alfa.reg3() it should be a vector with two values, the endpoints of the interval of \(\alpha\) (default c(-1, 1)); the function uses optimize() to search this interval for the value of \(\alpha\) that minimizes the Kullback-Leibler divergence between the observed and fitted compositions. This provides an alternative to choosing \(\alpha\) by cross-validation with cv.alfareg.

covb

Logical (default FALSE); should the covariance matrix of the regression coefficients be returned? Setting it to TRUE slows down the fit, because the matrix is computed numerically.

xnew

New predictor values, in the same form as x, for which fitted compositions are computed (returned as est). Otherwise leave it NULL.

ncores

The number of cores to use for parallel computations (used by alfa.reg2() only).

yb

If you have already transformed the data using the \(\alpha\)-transformation with the same \(\alpha\) as given in the argument "a", supply the transformed data here. Otherwise leave it NULL.

This is intended for use inside cv.alfareg, where it avoids repeating the transformation and speeds up the process. The gain is small for small samples, but becomes larger with a few thousand observations and/or more components.
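For \(\alpha = 0\), the description of the argument a states that the model reduces to classical multivariate regression on isometric log-ratio coordinates, which has a closed-form solution \(\hat{\mathbf{B}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{Z}\). The base-R sketch below illustrates this under stated assumptions: the helpers ilr_transform() and ilr_regression() are hypothetical, the predictors are assumed numeric (no factor handling), and the Helmert sub-matrix used for the ilr basis may differ in sign or ordering from the one used inside the package. Such a difference rotates the coefficients but leaves the fitted compositions unchanged.

```r
# Hypothetical sketch of the alpha = 0 case (not the package's implementation).

# Isometric log-ratio coordinates via the centred log-ratio and a Helmert
# sub-matrix; assumes strictly positive parts.
ilr_transform <- function(y) {
  y <- as.matrix(y)
  y <- y / rowSums(y)                   # close each row to the simplex
  D <- ncol(y)
  clr <- log(y) - rowMeans(log(y))      # centred log-ratio
  H <- matrix(0, D - 1, D)              # (D - 1) x D Helmert sub-matrix
  for (i in seq_len(D - 1)) {
    H[i, 1:i] <- 1 / sqrt(i * (i + 1))
    H[i, i + 1] <- -i / sqrt(i * (i + 1))
  }
  clr %*% t(H)                          # n x (D - 1) coordinates
}

# Classical multivariate least squares on the ilr coordinates; assumes
# numeric predictors only.
ilr_regression <- function(y, x) {
  z <- ilr_transform(y)
  X <- cbind(1, as.matrix(x))           # design matrix with intercept
  B <- solve(crossprod(X), crossprod(X, z))  # (X'X)^{-1} X'Z, all columns at once
  list(be = B, fitted_ilr = X %*% B)
}
```

For a numeric x, res$be should agree with coef(lm(ilr_transform(y) ~ x)), since lm() fits each response column by ordinary least squares.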

Author

Michail Tsagris.

R implementation and documentation: Michail Tsagris mtsagris@uoc.gr.

Details

The \(\alpha\)-transformation is first applied to the compositional data and multivariate regression is then applied; for \(\alpha \neq 0\) this involves numerical optimisation. The alfa.reg2() function accepts a vector with many values of \(\alpha\), while the alfa.reg3() function searches for the value of \(\alpha\) that minimizes the Kullback-Leibler divergence between the observed and the fitted compositional values. The functions are highly optimized.
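To make the first step concrete, here is a minimal base-R sketch of the \(\alpha\)-transformation following our reading of Tsagris, Preston and Wood (2011): a composition \(\mathbf{u}\) with \(D\) parts is mapped to \(\mathbf{H}\,\big(D\,\mathbf{u}^{\alpha} / \sum_{j=1}^{D} u_j^{\alpha} - \mathbf{1}_D\big)/\alpha\), where \(\mathbf{H}\) is the \((D-1)\times D\) Helmert sub-matrix, and at \(\alpha = 0\) the centred log-ratio (the limit as \(\alpha \to 0\)) is used, giving the isometric log-ratio transformation. The function names helmert_sub() and alpha_transform() are hypothetical; this is not the package's internal implementation.

```r
# Hypothetical sketch of the alpha-transformation (not package code).

# (D - 1) x D Helmert sub-matrix: orthonormal rows, each orthogonal to 1_D.
helmert_sub <- function(D) {
  H <- matrix(0, D - 1, D)
  for (i in seq_len(D - 1)) {
    H[i, 1:i] <- 1 / sqrt(i * (i + 1))
    H[i, i + 1] <- -i / sqrt(i * (i + 1))
  }
  H
}

# Requires strictly positive parts when a <= 0; zeros are allowed for a > 0.
alpha_transform <- function(y, a) {
  y <- as.matrix(y)
  y <- y / rowSums(y)                   # close each row to the simplex
  D <- ncol(y)
  if (a == 0) {
    z <- log(y) - rowMeans(log(y))      # centred log-ratio (limit as a -> 0)
  } else {
    ya <- y^a
    z <- (D * ya / rowSums(ya) - 1) / a # power transformation of each row
  }
  z %*% t(helmert_sub(D))               # n x (D - 1) transformed data
}
```

Because \(\mathbf{u}^{\alpha} \approx \mathbf{1} + \alpha \log \mathbf{u}\) for small \(\alpha\), the result for a value such as a = 1e-6 is numerically very close to the result for a = 0, which is one way to check the sketch.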

References

Tsagris M. (2025). The \(\alpha\)-regression for compositional data: a unified framework for standard, spatially-lagged, spatial autoregressive and geographically-weighted regression models. https://arxiv.org/pdf/2510.12663

Tsagris M. (2015). Regression analysis with compositional data containing zero values. Chilean Journal of Statistics, 6(2): 47-57. https://arxiv.org/pdf/1508.01913v1.pdf

Tsagris M.T., Preston S. and Wood A.T.A. (2011). A data-based power transformation for compositional data. In Proceedings of the 4th Compositional Data Analysis Workshop, Girona, Spain. https://arxiv.org/pdf/1106.1451.pdf

Mardia K.V., Kent J.T. and Bibby J.M. (1979). Multivariate analysis. Academic Press.

Aitchison J. (1986). The statistical analysis of compositional data. Chapman & Hall.

Examples


library(Compositional)
data(fadn)
y <- fadn[, 3:7]
x <- fadn[, 8]
mod <- alfa.reg(y, x, 0.2)
