predict.spmeshed: Posterior predictive sampling for models based on MGPs

Description

Sample from the posterior predictive distribution of the outcomes at new spatial or spatiotemporal locations after MCMC.

Usage

# S3 method for spmeshed
predict(object, newx, newcoords, 
    n_threads=4, verbose=FALSE, ...)

Value

coords_out: matrix with the prediction location coordinates (order updated after predictions).
preds_out: array of dimension ($n_{o}, q, m$) where $n_{o}$ is the number of prediction locations, $q$ is the output dimension, $m$ is the number of MCMC samples.

Arguments

object: Object output from spmeshed with option settings$saving=TRUE.
newx: matrix of covariate values at the new coordinates.
newcoords: matrix of new coordinates.
n_threads: integer number of OpenMP threads. This is ineffective if meshed was not compiled with OpenMP support.
verbose: boolean for progress messagging.
...: other arguments (unused).

Author

Michele Peruzzi michele.peruzzi@duke.edu

Details

While this function can always be used to make predictions, in most cases it is more efficient to just include the prediction locations in the main data as NA values; spmeshed will sample from the posterior predictive distribution at those locations while doing MCMC. The predict method is only recommended when all 4 of the following are true:

(1) spmeshed was run with settings$forced_grid=FALSE and

(2) the prediction locations are uniformly scattered on the domain (or rather, they are not clustered as a large empty area) and

(3) the number of prediction locations is a large portion of the number of observed data points and

(4) the prediction locations are not on a grid.

In all other cases the main spmeshed function is setup to be more efficient in automatically performing predictions during MCMC.

References

Peruzzi, M., Banerjee, S., and Finley, A.O. (2020) Highly Scalable Bayesian Geostatistical Modeling via Meshed Gaussian Processes on Partitioned Domains. Journal of the American Statistical Association, in press. tools:::Rd_expr_doi("10.1080/01621459.2020.1833889")

Peruzzi, M., Banerjee, S., Dunson, D.B., and Finley, A.O. (2021) Grid-Parametrize-Split (GriPS) for Improved Scalable Inference in Spatial Big Data Analysis. https://arxiv.org/abs/2101.03579

Examples

Run this code


# toy example with tiny dataset and short MCMC
# on a univariate outcome
library(magrittr)
library(dplyr)
library(meshed)

set.seed(2021)

SS <- 12
n <- SS^2 # total n. locations, including missing ones

coords <- data.frame(Var1=runif(n), Var2=runif(n)) %>%
  as.matrix()

# generate data
sigmasq <- 2.3
phi <- 6
tausq <- .1
B <- c(-1,.5,1)

CC <- sigmasq * exp(-phi * as.matrix(dist(coords)))
LC <- t(chol(CC))
w <- LC %*% rnorm(n)
p <- length(B)
X <- rnorm(n * p) %>% matrix(ncol=p)
y_full <- X %*% B + w + tausq^.5 * rnorm(n)

set_missing <- rbinom(n, 1, 0.1)

simdata <- data.frame(coords,
                      y_full = y_full,
                      w_latent = w) %>%
  mutate(y_observed = ifelse(set_missing==1, NA, y_full))

# MCMC setup
mcmc_keep <- 500
mcmc_burn <- 100
mcmc_thin <- 2

y <- simdata$y_observed
ybar <- mean(y, na.rm=TRUE)

# training set
y_in <- (y-ybar)[!is.na(y)]
X_in <- X[!is.na(y),]
coords_in <- coords[!is.na(y),]

# suppose we dont want to have gridded knots
# i.e. we are fixing the MGP reference set at the observed locations
# (this may be inefficient in big data settings)
meshout <- spmeshed(y_in, X_in, coords_in,
                    axis_partition=c(4,4),
                    n_samples = mcmc_keep, 
                    n_burn = mcmc_burn, 
                    n_thin = mcmc_thin, 
                    settings = list(forced_grid=FALSE, cache=FALSE),
                    prior=list(phi=c(1,15)),
                    verbose = 0,
                    n_threads = 1)

# test set
coords_out <- coords[is.na(y),]
X_out <- X[is.na(y),]

df_predict <- predict(meshout, newx=X_out, newcoords=coords_out)

y_posterior_predictive_mean <- df_predict$preds_out[,1,] %>% 
  apply(1, mean) %>% add(ybar)
df_predicted <- df_predict$coords_out %>% cbind(y_posterior_predictive_mean)

Run the code above in your browser using DataLab