
goffda (version 0.1.2)

flm_est: Estimation of functional linear models

Description

Estimation of the linear operator relating a functional predictor \(X\) with a functional response \(Y\) in the linear model $$Y(t) = \int_a^b \beta(s, t) X(s)\,\mathrm{d}s + \varepsilon(t),$$ where \(X\) is a random variable in the Hilbert space of square-integrable functions in \([a, b]\), \(L^2([a, b])\), \(Y\) and \(\varepsilon\) are random variables in \(L^2([c, d])\), and \(s \in [a, b]\) and \(t \in [c, d]\).

The linear, Hilbert--Schmidt, integral operator is parametrized by the bivariate kernel \(\beta \in L^2([a, b]) \otimes L^2([c, d])\). Its estimation is done through the truncated expansion of \(\beta\) in the tensor product of the data-driven bases of the Functional Principal Components (FPC) of \(X\) and \(Y\), and through the fitting of the resulting multivariate linear model. The FPC basis for \(X\) is truncated at \(p\) components, while the FPC basis for \(Y\) is truncated at \(q\) components. The automatic selection of \(p\) and \(q\) is detailed below.
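
Explicitly, the truncated expansion takes the form $$\hat\beta(s, t) = \sum_{j = 1}^{p} \sum_{k = 1}^{q} \hat b_{jk} \hat\Psi_j(s) \hat\Phi_k(t),$$ where \(\{\hat\Psi_j\}\) and \(\{\hat\Phi_k\}\) are the empirical FPC of \(X\) and \(Y\), respectively, and the coefficients \(\hat b_{jk}\) are collected in the matrix Beta_hat_scores described below.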

The particular cases in which either \(X\) or \(Y\) is a constant function yield a scalar predictor or a scalar response, respectively. The simple linear model arises when both \(X\) and \(Y\) are scalar, in which case \(\beta\) is a constant.

Usage

flm_est(X, Y, est_method = "fpcr_l1s", p = NULL, q = NULL,
  thre_p = 0.99, thre_q = 0.99, lambda = NULL, X_fpc = NULL,
  Y_fpc = NULL, compute_residuals = TRUE, centered = FALSE,
  int_rule = "trapezoid", cv_verbose = FALSE, ...)

Value

A list with the following entries:

Beta_hat

estimated \(\beta\), a matrix with values \(\hat\beta(s, t)\) evaluated at the grid points for X and Y. Its size is c(length(X$argvals), length(Y$argvals)).

Beta_hat_scores

the matrix of coefficients of Beta_hat (resulting from projecting it into the tensor basis of X_fpc and Y_fpc), with dimension c(p_hat, q_thre).

H_hat

hat matrix of the associated fitted multivariate linear model, a matrix of size c(n, n). NULL if est_method = "fpcr_l1", since lasso estimation does not provide it explicitly.

p_thre

index vector indicating the FPC of X considered for estimating the model. Chosen by thre_p or equal to p if given.

p_hat

index vector of the FPC considered by the "fpcr_l1" and "fpcr_l1s" methods after further selection of the FPC in p_thre. For the methods "fpcr" and "fpcr_l2", p_hat equals p_thre.

q_thre

index vector indicating the FPC of Y considered for estimating the model. Chosen by thre_q or equal to q if given. Note that the zeroing induced by the lasso procedure only affects the rows (the FPC of X), so no further selection of the FPC of Y is performed.

est_method

the estimation method employed.

Y_hat

fitted values, either an fdata object or a vector, depending on Y.

Y_hat_scores

the matrix of coefficients of Y_hat, with dimension c(n, q_thre).

residuals

residuals of the fitted model, either an fdata object or a vector, depending on Y.

residuals_scores

the matrix of coefficients of residuals, with dimension c(n, q_thre).

X_fpc, Y_fpc

FPC of X and Y, as returned by fpc with n_fpc = n.

lambda

regularization parameter \(\lambda\) used for the estimation methods "fpcr_l2", "fpcr_l1", and "fpcr_l1s".

cv

cross-validation object returned by cv_glmnet.

Arguments

X, Y

samples of functional/scalar predictors and functional/scalar response. Either fdata objects (for functional variables) or vectors of length n (for scalar variables).

est_method

either "fpcr" (Functional Principal Components Regression; FPCR), "fpcr_l2" (FPCR with ridge penalty), "fpcr_l1" (FPCR with lasso penalty) or "fpcr_l1s" (FPCR with lasso-selected FPC). If X is scalar, flm_est only considers "fpcr" as estimation method. See details below. Defaults to "fpcr_l1s".

p, q

index vectors indicating the specific FPC to be considered for the truncated bases expansions of X and Y, respectively. If a single number for p is provided, then p <- 1:max(p) internally (analogously for q) and the first max(p) FPC are considered. If NULL (default), then a data-driven selection of p and q is done. See details below.

thre_p, thre_q

thresholds for the proportion of variance that is explained, at least, by the first \(p\) and \(q\) FPC of X and Y, respectively. These thresholds are employed for an (initial) automatic selection of \(p\) and \(q\). Default to 0.99. thre_p (thre_q) is ignored if p (q) is provided.

lambda

regularization parameter \(\lambda\) for the estimation methods "fpcr_l2", "fpcr_l1", and "fpcr_l1s". If NULL (default), it is chosen with cv_glmnet.

X_fpc, Y_fpc

FPC decompositions of X and Y, as returned by fpc. Computed if not provided.

compute_residuals

whether to compute the fitted values Y_hat and their scores Y_hat_scores, as well as the residuals of the fit and their scores residuals_scores. Defaults to TRUE.

centered

flag to indicate if X and Y have been centered or not, in order to avoid their recentering. Defaults to FALSE.

int_rule

quadrature rule for approximating the definite unidimensional integral: trapezoidal rule (int_rule = "trapezoid") and extended Simpson rule (int_rule = "Simpson") are available. Defaults to "trapezoid".

cv_verbose

flag to display information about the estimation procedure (passed to cv_glmnet). Defaults to FALSE.

...

further parameters to be passed to cv_glmnet (and then to cv.glmnet) such as cv_1se, cv_nlambda or cv_parallel, among others.
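
As a minimal, hedged sketch of these pass-through options (reusing X_fdata and Y_fdata as generated in the Examples below; cv_1se = FALSE is assumed to switch cv_glmnet from the one-standard-error rule to the minimum-CV lambda):

# Illustrative sketch: forwarding cross-validation options to cv_glmnet
flm_fit <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s",
                   cv_1se = FALSE, cv_verbose = TRUE)
flm_fit$lambda  # regularization parameter selected by cv_glmnet
flm_fit$cv      # the underlying cv_glmnet object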

Author

Eduardo García-Portugués and Javier Álvarez-Liébana.

Details

flm_est deals seamlessly with either functional or scalar inputs for the predictor and response. In the case of scalar inputs, the corresponding dimension-related arguments (p, q, thre_p or thre_q) will be ignored as in these cases either \(p = 1\) or \(q = 1\).

The function translates the functional linear model into a multivariate model with multivariate response and then estimates the \(p \times q\) matrix of coefficients of \(\beta\) in the tensor basis of the FPC of X and Y. The following estimation methods are implemented:

  • "fpcr": Functional Principal Components Regression (FPCR); see details in Ramsay and Silverman (2005).

  • "fpcr_l2": FPCR, with ridge penalty on the associated multivariate linear model.

  • "fpcr_l1": FPCR, with lasso penalty on the associated multivariate linear model.

  • "fpcr_l1s": FPCR, with FPC selected by lasso regression on the associated multivariate linear model.

The last three methods are explained in García-Portugués et al. (2021).
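
In all of these methods, the "associated multivariate linear model" is the regression of the FPC scores of Y on the FPC scores of X. Writing the \(n \times q\) matrix of (centered) scores of Y as \(\mathbf{Y}\) and the \(n \times p\) matrix of (centered) scores of X as \(\mathbf{X}\), the fitted model is $$\mathbf{Y} = \mathbf{X} \mathbf{B} + \mathbf{E},$$ where \(\mathbf{B}\) is the \(p \times q\) coefficient matrix returned (for the selected FPC) in Beta_hat_scores and \(\mathbf{E}\) collects the error scores. The ridge and lasso penalties of "fpcr_l2", "fpcr_l1", and "fpcr_l1s" are imposed on \(\mathbf{B}\) when fitting this model.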

The \(p\) FPC of X and \(q\) FPC of Y are determined as follows:

  • If p = NULL, then p_thre <- 1:j_thre, where j_thre is the smallest index \(j\) for which the cumulative proportion of variance explained by the first \(j\) FPC of X exceeds thre_p. If p != NULL, then p_thre <- p.

  • If q = NULL, then the same procedure is followed with thre_q, resulting in q_thre.

Once p_thre and q_thre have been obtained, the methods "fpcr_l1" and "fpcr_l1s" perform a second selection of the FPC that are effectively considered in the estimation of \(\beta\). This subset of FPC (of p_thre) is encoded in p_hat. No further selection of FPC is done for the methods "fpcr" and "fpcr_l2".
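
As an illustration of the thresholding rule, the following minimal base-R sketch uses hypothetical eigenvalues; flm_est performs the equivalent computation internally from the FPC decomposition of X:

eig_X <- c(5, 2.5, 1, 0.3, 0.1, 0.05)  # hypothetical FPC eigenvalues of X
var_prop <- cumsum(eig_X) / sum(eig_X) # cumulative proportion of explained variance
j_thre <- which(var_prop > 0.99)[1]    # first index exceeding thre_p = 0.99
p_thre <- 1:j_thre                     # here 1:5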

The flag compute_residuals controls whether Y_hat, Y_hat_scores, residuals, and residuals_scores are computed. If FALSE, they are set to NULL. Y_hat equals \(\hat Y_i(t) = \int_a^b \hat \beta(s, t) X_i(s) \,\mathrm{d}s\) and residuals stands for \(\hat \varepsilon_i(t) = Y_i(t) - \hat Y_i(t)\), both for \(i = 1, \ldots, n\). Y_hat_scores and residuals_scores are the \(n \times q\) matrices of coefficients (or scores) of these functions in the FPC of Y.
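
A minimal sketch of inspecting these quantities (assuming r_ou from goffda to simulate toy data, as in the Examples below):

set.seed(1)
X_fdata <- r_ou(n = 20, t = seq(0, 1, l = 101), sigma = 2)
Y_fdata <- 2 * X_fdata + r_ou(n = 20, t = seq(0, 1, l = 101), sigma = 0.5)
fit <- flm_est(X = X_fdata, Y = Y_fdata)
class(fit$Y_hat)           # "fdata", since Y is functional
dim(fit$Y_hat_scores)      # 20 rows, length(fit$q_thre) columns
dim(fit$residuals_scores)  # same dimensions as Y_hat_scores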

Missing values on X and Y are automatically removed.

References

García-Portugués, E., Álvarez-Liébana, J., Álvarez-Pérez, G. and González-Manteiga, W. (2021). A goodness-of-fit test for the functional linear model with functional response. Scandinavian Journal of Statistics, 48(2):502--528. doi:10.1111/sjos.12486

Ramsay, J. and Silverman, B. W. (2005). Functional Data Analysis. Springer-Verlag, New York.

Examples

## Quick example of functional response and functional predictor

# Generate data
set.seed(12345)
n <- 50
X_fdata <- r_ou(n = n, t = seq(0, 1, l = 201), sigma = 2)
epsilon <- r_ou(n = n, t = seq(0, 1, l = 201), sigma = 0.5)
Y_fdata <- 2 * X_fdata + epsilon

# Lasso-selection FPCR (p and q are estimated)
flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s")
## Functional response and functional predictor

# Generate data
set.seed(12345)
n <- 50
X_fdata <- r_ou(n = n, t = seq(0, 1, l = 201), sigma = 2)
epsilon <- r_ou(n = n, t = seq(0, 1, l = 201), sigma = 0.5)
Y_fdata <- 2 * X_fdata + epsilon

# FPCR (p and q are estimated)
fpcr_1 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr")
fpcr_1$Beta_hat_scores
fpcr_1$p_thre
fpcr_1$q_thre

# FPCR (p and q are provided)
fpcr_2 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr",
                  p = c(1, 5, 2, 7), q = 2:1)
fpcr_2$Beta_hat_scores
fpcr_2$p_thre
fpcr_2$q_thre

# Ridge FPCR (p and q are estimated)
l2_1 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l2")
l2_1$Beta_hat_scores
l2_1$p_hat

# Ridge FPCR (p and q are provided)
l2_2 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l2",
                p = c(1, 5, 2, 7), q = 2:1)
l2_2$Beta_hat_scores
l2_2$p_hat

# Lasso FPCR (p and q are estimated)
l1_1 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1")
l1_1$Beta_hat_scores
l1_1$p_thre
l1_1$p_hat

# Lasso FPCR (p and q are provided)
l1_2 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1",
                p = c(1, 5, 2, 7), q = 2:1)
l1_2$Beta_hat_scores
l1_2$p_thre
l1_2$p_hat

# Lasso-selection FPCR (p and q are estimated)
l1s_1 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s")
l1s_1$Beta_hat_scores
l1s_1$p_thre
l1s_1$p_hat

# Lasso-selection FPCR (p and q are provided)
l1s_2 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s",
                 p = c(1, 5, 2, 7), q = 1:4)
l1s_2$Beta_hat_scores
l1s_2$p_thre
l1s_2$p_hat
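
## Reusing precomputed FPC decompositions (illustrative sketch: the X_fpc and
## Y_fpc arguments accept the output of fpc; n_fpc = n matches the Value section)
X_fpc <- fpc(X_fdata, n_fpc = n)
Y_fpc <- fpc(Y_fdata, n_fpc = n)
flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s",
        X_fpc = X_fpc, Y_fpc = Y_fpc)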

## Scalar response

# Generate data
set.seed(12345)
n <- 50
beta <- r_ou(n = 1, t = seq(0, 1, l = 201), sigma = 0.5, x0 = 3)
X_fdata <- fdata_cen(r_ou(n = n, t = seq(0, 1, l = 201), sigma = 2))
epsilon <- rnorm(n, sd = 0.25)
Y <- drop(inprod_fdata(X_fdata1 = X_fdata, X_fdata2 = beta)) + epsilon

# FPCR
fpcr_4 <- flm_est(X = X_fdata, Y = Y, est_method = "fpcr")
fpcr_4$p_hat

# Ridge FPCR
l2_4 <- flm_est(X = X_fdata, Y = Y, est_method = "fpcr_l2")
l2_4$p_hat

# Lasso FPCR
l1_4 <- flm_est(X = X_fdata, Y = Y, est_method = "fpcr_l1")
l1_4$p_hat

# Lasso-selection FPCR
l1s_4 <- flm_est(X = X_fdata, Y = Y, est_method = "fpcr_l1s")
l1s_4$p_hat

## Scalar predictor

# Generate data
set.seed(12345)
n <- 50
X <- rnorm(n)
epsilon <- r_ou(n = n, t = seq(0, 1, l = 201), sigma = 0.5)
beta <- r_ou(n = 1, t = seq(0, 1, l = 201), sigma = 0.5, x0 = 3)
beta$data <- matrix(beta$data, nrow = n, ncol = ncol(beta$data),
                    byrow = TRUE)
Y_fdata <- beta * X + epsilon

# FPCR
fpcr_4 <- flm_est(X = X, Y = Y_fdata, est_method = "fpcr")
plot(beta, col = 2)
lines(beta$argvals, drop(fpcr_4$Beta_hat))
