sk_LL: Likelihood of covariance model `pars` given the data in sk grid `g`

Description

This computes the log-likelihood for the Gaussian process covariance model pars, given 2-dimensional grid data g, and, optionally, linear trend data in X.

Usage

sk_LL(pars, g, X = 0, fac_method = "chol", fac = NULL, quiet = TRUE, out = "l")

Value

numeric, the likelihood of pars given g_obs and X, or list (if more=TRUE)

Arguments

pars: list of form returned by sk_pars (with entries 'y', 'x', 'eps', 'psill')
g: an sk grid (or list with entries 'gdim', 'gres', 'gval' and/or 'idx_grid')
X: numeric, vector, matrix, or NA, a fixed mean value, or matrix of linear predictors
fac_method: character, the factorization to use: 'chol' (default) or 'eigen'
fac: matrix or list, (optional) pre-computed covariance factorization
quiet: logical indicating to suppress console output
out: character, either 'l' (likelihood), 'a' (AIC), 'b' (BIC), or 'more' (see details)

Details

The function evaluates:

-log( 2 * pi ) - ( 1/2 ) * ( log_det + quad_form ),

where log_det is the logarithm of the determinant of covariance matrix V, and quad_form is z^T V^-1 z, for the observed response vector z, which is constructed by subtracting the trend specified in X (if any) from the non-NA values in g.

If the trend spatially uniform and known, it can be supplied in argument X as a numeric scalar. The default is zero-mean model X=0, which assumes users has subtracted the trend from g beforehand.

If the trend is unknown, the function will automatically use GLS to estimate it. This is profile likelihood on the covariance function parameters (not REML). To estimate a spatially constant mean, set X=NA. To estimate a spatially variable mean, supply linear predictors as columns of a matrix argument to X (see sk_GLS). Users can also pass a multi-layer bk grid X with covariates in layers.

fac_method specifies how to factorize V; either by using the Cholesky factor ('chol') or eigen-decomposition ('eigen'). A pre-computed factorization fac can be supplied by first calling sk_var(..., scaled=TRUE) (in which case fac_method is ignored).

When out='a', the function instead returns the AIC value and when out='b' it returns the BIC (see stats::AIC). This adjusts the likelihood for the number of covariance and trend parameters (and in the case of BIC, the sample size), producing an index that can be used for model comparison (lower is better).

When out='more', the function returns a list containing the log-likelihood and both information criteria, along with several diagnostics: the number of observations, the number of parameters, the log-determinant log_det, and the quadratic form quad_form.

Examples

Run this code

# set up example grid, covariance parameters
gdim = c(25, 12)
n = prod(gdim)
g_empty = g_lm = sk(gdim)
pars = utils::modifyList(sk_pars(g_empty, 'gau'), list(psill=0.7, eps=5e-2))

# generate some coefficients
n_betas = 3
betas = stats::rnorm(n_betas)

# generate covariates and complete data in grid and vector form
g_X = sk_sim(g_empty, pars, n_layer=n_betas-1L)
X = g_X[]
X_all = cbind(1, X)
g_lm = g_empty
g_lm[] = c(X_all %*% betas)

# add some noise
g_all = sk_sim(g_empty, pars) + g_lm
z = g_all[]

# two methods for likelihood
LL_chol = sk_LL(pars, g_all, fac_method='chol')
LL_eigen = sk_LL(pars, g_all, fac_method='eigen')

# compare to working directly with matrix inverse
V = sk_var(g_all, pars, fac_method='none', sep=FALSE)
V_inv = chol2inv(chol(V))
quad_form = as.numeric( t(z) %*% crossprod(V_inv, z) )
log_det = as.numeric( determinant(V, logarithm=TRUE) )[1]
LL_direct = (-1/2) * ( n * log( 2 * pi ) + log_det + quad_form )

# relative errors
abs( LL_direct - LL_chol ) / max(LL_direct, LL_chol)
abs( LL_direct - LL_eigen ) / max(LL_direct, LL_eigen)

# get AIC or BIC directly
sk_LL(pars, g_all, out='a')
sk_LL(pars, g_all, out='b')

# repeat with pre-computed variance factorization
fac_eigen = sk_var(g_all, pars, fac_method='eigen', sep=TRUE)
sk_LL(pars, g_all, fac=fac_eigen) - LL_eigen

# repeat with multi-layer example
n_layer = 10
g_noise_multi = sk_sim(g_empty, pars, n_layer)
g_multi = g_lm + g_noise_multi
LL_chol = sk_LL(pars, g_multi, fac_method='chol')
LL_eigen = sk_LL(pars, g_multi, fac_method='eigen')
LL_direct = sum(sapply(seq(n_layer), function(j) {
 quad_form = as.numeric( t(g_multi[,j]) %*% crossprod(V_inv, g_multi[,j]) )
 (-1/2) * ( n * log( 2 * pi ) + log_det + quad_form )
}))

# relative errors
abs( LL_direct - LL_chol ) / max(LL_direct, LL_chol)
abs( LL_direct - LL_eigen ) / max(LL_direct, LL_eigen)

# repeat with most data missing
is_obs = seq(n) %in% sort(sample.int(n, 50))
n_obs = sum(is_obs)
g_obs = g_empty
z_obs = g_all[is_obs]
g_obs[is_obs] = z_obs

# take subsets of covariates
g_X_obs = g_X
g_X_obs[!is_obs,] = NA
X_obs = X[is_obs,]

LL_chol_obs = sk_LL(pars, g_obs, fac_method='chol')
LL_eigen_obs = sk_LL(pars, g_obs, fac_method='eigen')

# working directly with matrix inverse
V_obs = sk_var(g_obs, pars, fac_method='none')
V_obs_inv = chol2inv(chol(V_obs))
quad_form_obs = as.numeric( t(z_obs) %*% crossprod(V_obs_inv, z_obs) )
log_det_obs = as.numeric( determinant(V_obs, logarithm=TRUE) )[1]
LL_direct_obs = (-1/2) * ( n_obs * log( 2 * pi ) + log_det_obs + quad_form_obs )
abs( LL_direct_obs - LL_chol_obs ) / max(LL_direct_obs, LL_chol_obs)
abs( LL_direct_obs - LL_eigen_obs ) / max(LL_direct_obs, LL_eigen_obs)

# again using a pre-computed variance factorization
fac_chol_obs = sk_var(g_obs, pars, fac_method='chol', scaled=TRUE)
fac_eigen_obs = sk_var(g_obs, pars, fac_method='eigen', scaled=TRUE)
sk_LL(pars, g_obs, fac=fac_chol_obs) - LL_chol_obs
sk_LL(pars, g_obs, fac=fac_eigen_obs) - LL_eigen_obs

# detrend the data by hand, with and without covariates then compute likelihood
g_obs_dtr = g_obs - sk_GLS(g_obs, pars)
g_obs_X_dtr = g_obs - sk_GLS(g_obs, pars, g_X)
LL_dtr = sk_LL(pars, g_obs_dtr, X=0)
LL_X_dtr = sk_LL(pars, g_obs_X_dtr, X=0)

# or pass a covariates grid (or matrix) to de-trend automatically
LL_dtr - sk_LL(pars, g_obs, X=NA)
LL_X_dtr - sk_LL(pars, g_obs, X=g_X)

# note that this introduce new unknown parameter(s), so AIC and BIC increase (worsen)
sk_LL(pars, g_obs, X=NA, out='a') > sk_LL(pars, g_obs_dtr, X=0, out='a')
sk_LL(pars, g_obs, X=g_X, out='a') > sk_LL(pars, g_obs_X_dtr, X=0, out='a')

# X can be the observed subset, or the full grid (as sk grid or as matrix)
sk_LL(pars, g_obs, X=X)
sk_LL(pars, g_obs, X=X_obs)
sk_LL(pars, g_obs, X=g_X)
sk_LL(pars, g_obs, X=g_X_obs)

# equivalent sparse input specification
g_sparse = g_all
g_sparse[] = matrix(g_obs[], ncol=1)
g_sparse = sk(gval=matrix(g_obs[], ncol=1), gdim=gdim)
LL_chol_obs - sk_LL(pars, g_sparse)
LL_eigen_obs - sk_LL(pars, g_sparse)
LL_dtr - sk_LL(pars, g_sparse, X=NA)
LL_X_dtr - sk_LL(pars, g_sparse, X=g_X)

## repeat with complete data

# the easy way to get likelihood
LL_X_chol = sk_LL(pars, g_all, g_X)
LL_X_eigen = sk_LL(pars, g_all, g_X, fac_method='eigen')

# the hard way
V = sk_var(g_all, pars, sep=FALSE)
V_inv = chol2inv(chol(V))
X_tilde_inv = chol2inv(chol( crossprod(crossprod(V_inv, X_all), X_all) ))
betas_gls = X_tilde_inv %*% crossprod(X_all, (V_inv %*% z))
z_gls = z - (X_all %*% betas_gls)
z_gls_trans = crossprod(V_inv, z_gls)
quad_form = as.numeric( t(z_gls) %*% z_gls_trans )
log_det = as.numeric( determinant(V, logarithm=TRUE) )[1]
LL_direct = (-1/2) * ( n * log( 2 * pi ) + log_det + quad_form )
abs( LL_direct - LL_X_chol ) / max(LL_direct, LL_X_chol)
abs( LL_direct - LL_X_eigen ) / max(LL_direct, LL_X_eigen)

# return detailed list of components with out='more'
LL_result = sk_LL(pars, g_all, X=X, out='more')
LL_result[['LL']] - LL_X_chol
LL_result[['quad_form']] - quad_form
LL_result[['log_det']] - log_det
LL_result[['n_obs']] - n