Internal function for the estimation of the covariance matrix of the latent
process using the approach of Hall et al. (2008). Used in the
two-step GFPCA approach implemented in gfpca_twoStep.
This function is an adaptation of the implementation by Jan
Gertheiss and Ana-Maria Staicu for Gertheiss et al. (2017), with a focus on
higher memory (RAM) efficiency for large data settings.
cov_hall(
  Y,
  index_evalGrid,
  Kt = 25,
  Kc = 10,
  family = "gaussian",
  diag_epsilon = 0.01,
  make_pd = TRUE
)
Covariance matrix with dimension time_evalGrid x time_evalGrid.
Y: Dataframe. Should have the columns id, value and index.
index_evalGrid: Grid for the evaluation of the covariance structure.
Kt: Number of P-spline basis functions for the estimation of the marginal mean. Defaults to 25.
Kc: Number of marginal P-spline basis functions for smoothing the covariance surface. Defaults to 10.
family: One of c("gaussian","binomial","gamma","poisson"). Poisson data are rounded before performing the GFPCA to ensure integer data, see Details section below. Defaults to "gaussian".
diag_epsilon: Small constant to which diagonal elements of the covariance matrix are set if they are smaller. Defaults to 0.01.
make_pd: Indicator if positive (semi-)definiteness of the returned latent covariance should be ensured via Matrix::nearPD(). Defaults to TRUE.
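To illustrate what diag_epsilon and make_pd refer to, here is a minimal sketch on a small toy matrix. The matrix C and all values in it are made up for this example, and the exact calls used inside cov_hall may differ; the sketch only shows the general idea of flooring the diagonal and then projecting onto the nearest positive definite matrix with Matrix::nearPD().

```r
# Toy symmetric matrix that is not positive semi-definite (hypothetical values)
C <- matrix(c(0.005, 0.2,
              0.2,   1.0), nrow = 2, byrow = TRUE)

# Raise diagonal elements below the threshold to the threshold (cf. diag_epsilon)
diag_epsilon <- 0.01
diag(C) <- pmax(diag(C), diag_epsilon)

# Project onto the nearest positive definite matrix (cf. make_pd = TRUE)
C_pd <- as.matrix(Matrix::nearPD(C)$mat)

# Smallest eigenvalue before and after the correction
min_eig_before <- min(eigen(C, symmetric = TRUE)$values)
min_eig_after  <- min(eigen(C_pd, symmetric = TRUE)$values)
```

Note that, with its default settings, Matrix::nearPD() is free to change the diagonal again, so the order and combination of the two corrections matters in practice.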
Alexander Bauer alexander.bauer@stat.uni-muenchen.de and Fabian Scheipl, based on work of Jan Gertheiss and Ana-Maria Staicu
The implementation deviates from the algorithm described in Hall et al. (2008) in one crucial step: we compute the crossproducts of centered observations and smooth the surface of these crossproducts directly, instead of computing and smoothing the surface of crossproducts of uncentered observations and subsequently subtracting the (crossproducts of the) mean function. The former seems to yield smoother eigenfunctions and fewer non-positive-definite covariance estimates.
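The centering-first idea can be sketched in base R as follows. This is a simplified illustration, not the code used in cov_hall: the toy data are simulated, the marginal mean is a pointwise average instead of a P-spline fit, the diagonal pairs (s = t) are dropped as an assumption of this sketch, and a raw cell-wise average stands in for the bivariate smoother.

```r
set.seed(1)

# Hypothetical toy data in the id / index / value layout
Y <- data.frame(id    = rep(1:20, each = 5),
                index = rep(1:5, times = 20))
Y$value <- sin(Y$index) + rnorm(nrow(Y), sd = 0.5)

# Step 1: estimate the marginal mean and center the observations
# (pointwise average for brevity; a P-spline smoother would take its place)
mu_hat <- tapply(Y$value, Y$index, mean)
Y$centered <- Y$value - as.numeric(mu_hat[as.character(Y$index)])

# Step 2: crossproducts of centered observations for all pairs (s, t)
# with s != t, computed separately within each curve
cross <- do.call(rbind, lapply(split(Y, Y$id), function(d) {
  pairs <- expand.grid(i = seq_len(nrow(d)), j = seq_len(nrow(d)))
  pairs <- pairs[pairs$i != pairs$j, ]
  data.frame(s  = d$index[pairs$i],
             t  = d$index[pairs$j],
             cp = d$centered[pairs$i] * d$centered[pairs$j])
}))

# Step 3: the surface cp ~ (s, t) is what gets smoothed; a raw average per
# grid cell stands in for the bivariate smoother in this sketch
raw_cov <- tapply(cross$cp, list(cross$s, cross$t), mean)
```

The data frame cross has one row per ordered pair of time points within a curve, which is why its size can grow quickly for densely observed data.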
If the data Y or the crossproduct matrix contain more than 100,000 rows or elements, respectively, the estimation of the marginal mean or the smoothing step of the covariance matrix is performed using the discretization-based estimation algorithm in bam rather than the gam estimation algorithm.
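A size-based switch of this kind could look roughly like the following sketch, which uses mgcv::gam() and mgcv::bam(). The helper fit_mean, its arguments and the simulated data are hypothetical and only serve the illustration; the model formula and settings used inside cov_hall may differ.

```r
library(mgcv)

# Hypothetical helper: pick the estimation routine based on the data size
fit_mean <- function(dat, k = 25, threshold = 100000) {
  if (nrow(dat) > threshold) {
    # Discretization-based fitting for large data
    bam(value ~ s(index, bs = "ps", k = k), data = dat,
        method = "fREML", discrete = TRUE)
  } else {
    gam(value ~ s(index, bs = "ps", k = k), data = dat, method = "REML")
  }
}

set.seed(1)
dat <- data.frame(index = runif(500))
dat$value <- sin(2 * pi * dat$index) + rnorm(500, sd = 0.3)

fit_small <- fit_mean(dat)                   # 500 rows, uses gam()
fit_large <- fit_mean(dat, threshold = 100)  # lowered threshold forces the bam() branch
```

Both branches return objects that support the usual predict() and summary() methods, so downstream code does not need to know which routine was used.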
Hall, P., Müller, H. G., & Yao, F. (2008). Modelling sparse generalized longitudinal observations with latent Gaussian processes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(4), 703--723.
Gertheiss, J., Goldsmith, J., & Staicu, A. M. (2017). A note on modeling sparse exponential-family functional response curves. Computational Statistics & Data Analysis, 105, 46--52.
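library(registr)
data(growth_incomplete)
# Evaluation grid for the covariance surface
index_grid = c(1.25, seq(from = 2, to = 18, by = 1))
# cov_hall is an internal function, so it is accessed via `:::`
cov_matrix = registr:::cov_hall(growth_incomplete, index_evalGrid = index_grid)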