Regression with compositional data using the \(\alpha\)-transformation.
alfa.reg(y, x, a, covb = FALSE, xnew = NULL, yb = NULL)
alfa.reg2(y, x, a, xnew = NULL, ncores = 1)
alfa.reg3(y, x, a = c(-1, 1), xnew = NULL)
For the alfa.reg() function a list including:
The time required by the regression.
The beta coefficients.
The covariance matrix if covb was set to TRUE, otherwise NULL.
The sum of the squared residuals, as produced by the function minpack.lm::nls.lm().
The fitted values for xnew if xnew is not NULL.
For the alfa.reg2() function a list with the time required by all regressions and the regression coefficients and the fitted values for each value of \(\alpha\).
For the alfa.reg3() function a list with the previous elements plus an output "alfa", the optimal value of \(\alpha\).
A matrix with the compositional data.
A matrix with the continuous predictor variables or a data frame including categorical predictor variables.
The value of the power transformation; it has to be between -1 and 1. If zero values are present, it has to be greater than 0.
If \(\alpha=0\), the isometric log-ratio transformation is applied and the solution exists in closed form, since it is the
classical multivariate regression. For alfa.reg2() this should be a vector of \(\alpha\) values; the function calls
alfa.reg() repeatedly, once per value. For alfa.reg3() it should be a vector with two values, the endpoints of the
interval of \(\alpha\). This function uses the optimize function to search for the value of \(\alpha\) that minimizes the
Kullback-Leibler divergence between the observed and fitted compositions. This is an alternative to choosing
\(\alpha\) with cv.alfareg, which uses cross-validation.
Do you want the covariance matrix of the regression coefficients to be returned? If TRUE, this will slow down the process, as it is computed numerically.
If you have new data use it, otherwise leave it NULL.
The number of cores to use for parallel computations.
If you have already transformed the data using the \(\alpha\)-transformation with the same \(\alpha\) as given in the argument "a", put it here. Otherwise leave it NULL.
This is intended to be used in the function cv.alfareg in order to speed up the process. The time difference in that function is small for small samples,
but with a few thousand observations and/or more components the difference becomes larger.
Michail Tsagris.
R implementation and documentation: Michail Tsagris mtsagris@uoc.gr.
The \(\alpha\)-transformation is applied to the compositional data first and then multivariate regression is applied. This involves numerical optimisation. The alfa.reg2() function accepts a vector with many values of \(\alpha\), while the alfa.reg3() function searches for the value of \(\alpha\) that minimizes the Kullback-Leibler divergence between the observed and the fitted compositional values. The functions are highly optimized.
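As a rough illustration of the transformation step only, the base-R sketch below computes an \(\alpha\)-transformation of a matrix of compositions, following the definition in Tsagris, Preston and Wood (2011). The function name alfa_sketch and the use of contr.helmert() to build the Helmert sub-matrix are assumptions made for this sketch; the package's own code may differ in sign conventions or other details, and the regression fit itself is not shown.

```r
# Hypothetical helper, not part of the package: alpha-transformation in base R.
alfa_sketch <- function(y, a) {
  y <- as.matrix(y)
  D <- ncol(y)
  # (D - 1) x D Helmert sub-matrix with orthonormal rows (assumed construction)
  H <- t(contr.helmert(D))
  H <- H / sqrt(rowSums(H^2))
  if (a == 0) {
    # limiting case: centred log-ratio, which becomes the ilr after multiplying by H
    ly <- log(y)
    z <- ly - rowMeans(ly)
  } else {
    # power transformation followed by closure, then centring and scaling
    u <- y^a
    u <- u / rowSums(u)
    z <- (D * u - 1) / a
  }
  z %*% t(H)
}

# Toy compositions (made-up numbers, each row sums to 1)
y_toy <- matrix(c(0.2, 0.3, 0.5,
                  0.1, 0.6, 0.3), nrow = 2, byrow = TRUE)
z_toy <- alfa_sketch(y_toy, 0.5)
```

Each row of the result has one fewer column than the composition it came from. As \(\alpha\) approaches 0 the output approaches the isometric log-ratio coordinates, which is why that case can be handled by classical multivariate regression.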
Tsagris M. (2025). The \(\alpha\)-regression for compositional data: a unified framework for standard, spatially-lagged, spatial autoregressive and geographically-weighted regression models. https://arxiv.org/pdf/2510.12663
Tsagris M. (2015). Regression analysis with compositional data containing zero values. Chilean Journal of Statistics, 6(2): 47-57. https://arxiv.org/pdf/1508.01913v1.pdf
Tsagris M.T., Preston S. and Wood A.T.A. (2011). A data-based power transformation for compositional data. In Proceedings of the 4th Compositional Data Analysis Workshop, Girona, Spain. https://arxiv.org/pdf/1106.1451.pdf
Mardia K.V., Kent J.T. and Bibby J.M. (1979). Multivariate analysis. Academic Press.
Aitchison J. (1986). The statistical analysis of compositional data. Chapman & Hall.
cv.alfareg, alfareg.nr, alfa.slx
data(fadn)
y <- fadn[, 3:7]
x <- fadn[, 8]
mod <- alfa.reg(y, x, 0.2)
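The other two functions can be called in the same way. The sketch below reuses the fadn data from the example above and relies only on the signatures shown in the usage section; the particular \(\alpha\) values are arbitrary choices for illustration, and they are kept positive on the assumption that the data may contain zeros.

```r
library(Compositional)

data(fadn)
y <- fadn[, 3:7]
x <- fadn[, 8]

# One regression per value of alpha (values chosen arbitrarily for this sketch)
mod2 <- alfa.reg2(y, x, a = c(0.2, 0.5, 1))

# Search the interval [0.1, 1] for the alpha minimizing the Kullback-Leibler divergence
mod3 <- alfa.reg3(y, x, a = c(0.1, 1))
mod3$alfa  # the selected value of alpha
```

Inspecting names(mod2) and names(mod3) shows the remaining elements described in the value section.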