cov_hall: Covariance estimation after Hall et al. (2008)

Description

Internal function for the estimation of the covariance matrix of the latent process using the approach of Hall et al. (2008). Used in the two-step GFPCA approach implemented in gfpca_twoStep.

This function adapts the implementation by Jan Gertheiss and Ana-Maria Staicu for Gertheiss et al. (2017), with a focus on better memory (RAM) efficiency in large-data settings.

Usage

cov_hall(
  Y,
  index_evalGrid,
  Kt = 25,
  Kc = 10,
  family = "gaussian",
  diag_epsilon = 0.01,
  make_pd = TRUE
)

Value

Covariance matrix with dimension length(index_evalGrid) x length(index_evalGrid).

Arguments

Y: Data frame. Should contain the columns id, value and index.
index_evalGrid: Grid for the evaluation of the covariance structure.
Kt: Number of P-spline basis functions for the estimation of the marginal mean. Defaults to 25.
Kc: Number of marginal P-spline basis functions for smoothing the covariance surface. Defaults to 10.
family: One of c("gaussian","binomial","gamma","poisson"). Poisson data are rounded before performing the GFPCA to ensure integer data, see Details section below. Defaults to "gaussian".
diag_epsilon: Small constant to which diagonal elements of the covariance matrix are set if they are smaller. Defaults to 0.01.
make_pd: Indicator whether positive (semi-)definiteness of the returned latent covariance should be ensured via Matrix::nearPD(). Defaults to TRUE.
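
The effect of diag_epsilon and make_pd can be sketched on a toy matrix. This is only an illustration, not the code inside cov_hall: the matrix values are made up, and the order of the two steps (diagonal floor first, then the positive-definiteness correction) is an assumption.

```r
library(Matrix)

# Toy symmetric covariance estimate with one too-small diagonal entry.
# The values are hypothetical and chosen so that the matrix is indefinite.
cov_raw <- matrix(c(1.0, 0.9,   0.2,
                    0.9, 0.005, 0.9,
                    0.2, 0.9,   1.0), nrow = 3, byrow = TRUE)

diag_epsilon <- 0.01

# Step 1 (assumed order): raise diagonal elements below diag_epsilon
# up to diag_epsilon.
cov_floor <- cov_raw
diag(cov_floor) <- pmax(diag(cov_floor), diag_epsilon)

# Step 2: replace the matrix by a nearby positive definite matrix.
cov_pd <- as.matrix(Matrix::nearPD(cov_floor)$mat)

# Smallest eigenvalue before and after the correction.
min_eig_before <- min(eigen(cov_floor, symmetric = TRUE)$values)
min_eig_after  <- min(eigen(cov_pd, symmetric = TRUE)$values)
```

In this toy case min_eig_before is negative, so the uncorrected matrix is not a valid covariance, while min_eig_after is no longer negative.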

Author

Alexander Bauer (alexander.bauer@stat.uni-muenchen.de) and Fabian Scheipl, based on work of Jan Gertheiss and Ana-Maria Staicu

Details

The implementation deviates from the algorithm described in Hall et al. (2008) in one crucial step: we compute the crossproducts of centered observations and smooth the surface of these crossproducts directly, instead of computing and smoothing the surface of crossproducts of uncentered observations and subsequently subtracting the (crossproducts of the) mean function. The former seems to yield smoother eigenfunctions and fewer non-positive-definite covariance estimates.
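
The centered-crossproduct idea can be sketched in base R. This is a simplified illustration under stated assumptions, not the package code: the simulated data are hypothetical, pointwise averages stand in for the P-spline estimate of the marginal mean, and averaging per (s, t) cell stands in for the surface smoother.

```r
set.seed(1)

# Toy data in long format with the columns id, index and value.
n_id <- 50
grid <- 1:5
Y <- data.frame(
  id    = rep(seq_len(n_id), each = length(grid)),
  index = rep(grid, times = n_id)
)
# Linear mean plus a subject-specific random intercept plus noise.
Y$value <- 2 + 0.5 * Y$index +
  rep(rnorm(n_id), each = length(grid)) +
  rnorm(nrow(Y), sd = 0.3)

# Stand-in for the smoothed marginal mean: pointwise averages per index.
mu_hat <- tapply(Y$value, Y$index, mean)
Y$centered <- Y$value - as.numeric(mu_hat[as.character(Y$index)])

# Crossproducts of centered observations for every within-subject pair (s, t).
cross <- do.call(rbind, lapply(split(Y, Y$id), function(d) {
  pairs <- expand.grid(row_s = seq_len(nrow(d)), row_t = seq_len(nrow(d)))
  data.frame(s  = d$index[pairs$row_s],
             t  = d$index[pairs$row_t],
             cp = d$centered[pairs$row_s] * d$centered[pairs$row_t])
}))

# Stand-in for the surface smoother: average the crossproducts per (s, t) cell.
cov_raw <- tapply(cross$cp, list(cross$s, cross$t), mean)
```

Because the observations are centered before the crossproducts are formed, no mean crossproduct has to be subtracted afterwards; cov_raw is already a symmetric raw covariance estimate on the grid.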

If the data Y contain more than 100,000 rows, the marginal mean is estimated with the discretization-based estimation algorithm in bam rather than with gam. Likewise, if the crossproduct matrix contains more than 100,000 elements, the smoothing step for the covariance surface uses bam rather than gam.
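
A size-based switch of this kind might look as follows for the marginal mean. This is a hedged sketch using mgcv: the helper name fit_mean, its n_threshold argument, the model formula and the smoothing-parameter methods are assumptions for illustration and are not taken from the package source.

```r
library(mgcv)

# Hypothetical helper: fit a P-spline mean curve, switching to the
# discretized bam() algorithm once the data exceed n_threshold rows.
fit_mean <- function(Y, Kt = 10, n_threshold = 100000) {
  if (nrow(Y) > n_threshold) {
    # discrete = TRUE requires method = "fREML" in bam().
    mgcv::bam(value ~ s(index, bs = "ps", k = Kt), data = Y,
              method = "fREML", discrete = TRUE)
  } else {
    mgcv::gam(value ~ s(index, bs = "ps", k = Kt), data = Y,
              method = "REML")
  }
}

# Small simulated data set; the threshold is lowered in the second call
# only to exercise the bam() branch without needing 100,000 rows.
set.seed(2)
Y_toy <- data.frame(index = runif(300, min = 0, max = 10))
Y_toy$value <- sin(Y_toy$index) + rnorm(300, sd = 0.2)

fit_gam <- fit_mean(Y_toy)
fit_bam <- fit_mean(Y_toy, n_threshold = 100)
```

Both branches return a fitted model that can be passed to predict(), so the downstream code does not need to know which algorithm was used.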

References

Hall, P., Müller, H. G., & Yao, F. (2008). Modelling sparse generalized longitudinal observations with latent Gaussian processes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(4), 703-723.

Gertheiss, J., Goldsmith, J., & Staicu, A. M. (2017). A note on modeling sparse exponential-family functional response curves. Computational Statistics & Data Analysis, 105, 46-52.

Examples

library(registr)
data(growth_incomplete)

index_grid = c(1.25, seq(from = 2, to = 18, by = 1))
cov_matrix = registr:::cov_hall(growth_incomplete, index_evalGrid = index_grid)
