gpdpgrow: Bayesian non-parametric dependent Gaussian process model for time-indexed functional data

Description

Estimates a collection of time-indexed functions with Gaussian process (GP) formulations where a Dirichlet process mixture allows sub-groupings of the functions to share the same GP covariance parameters. The GP formulation supports any number of additive GP covariance terms, expressing either or both of multiple trend and seasonality.

Usage

gpdpgrow(y, ipr, time_points, gp_cov, sn_order, jitter, gp_shape, gp_rate,
  noise_shape, noise_rate, dp_shape, dp_rate, M_init, lower, upper, sub_size,
  w_star, w, n.iter, n.burn, n.thin, n.tune, progress, b_move, cluster, s)

Arguments

A multivariate continuous response, specified as an N x T matrix, where N denotes the number of functions and T, the number of time points per function. Intermittent missing-at-random values are allowed and will be est

ipr

An optional input vector of inclusion probabilities for each observation unit in the case the observed data were acquired through an informative sampling design, so that unbiased inference about the population requires adjustments to the observed sample

time_points

Inputs a vector of common time points at which the collections of functions were observed (with the possibility of intermittent missingness). The length of time_points should be equal to the number of columns in the data matrix, y

gp_cov

A vector of length L to select the covariance function for each of L terms. Allowed inputs are c("rq","se","sn"), where "rq" denotes the rational quadratic covariance function, "se", the square exponential, and "sn

sn_order

A vector of length L_s, the number of terms in gp_cov where "sn" is selected, that denotes the seasonality order for each term; e.g. if the two "sn" terms above are for 3 and 12 month seasonality, respectively, for

jitter

A scalar numerical value added to the diagonal elements of the T x T GP covariance matrix to stabilize computation. Defaults to jitter = 0.01.

gp_shape

The shape parameter of the Gamma base distribution for the DP prior on the P x N matrix of GP covariance parameters (where P denotes the number of parameters for each of the N experimental units). Defaults to gp_shape = 1

gp_rate

The rate parameter of the Gamma base distribution on GP covariance parameters. Defaults to gp_rate = 1.

noise_shape

The shape parameter of the Gamma base distribution on tau_e, the model noise precision parameter. Defaults to noise_shape = 3.

noise_rate

The rate parameter of the Gamma base distribution on tau_e, the model noise precision parameter. Defaults to noise_rate = 1.

dp_shape

The shape parameter for the Gamma prior on the DP concentration parameter, conc. Defaults to dp_shape = 1.

dp_rate

The rate parameter for the Gamma prior on the DP concentration parameter, conc. Defaults to dp_rate = 1.

M_init

Starting number of clusters of nrow(y) units to initialize sampler. Defaults to M_init = nrow(y).

lower

The lower end of the range to be used in conditionally sampling the GP covariance parameters (kappa,tau_e) in the slice sampler. Defaults to lower = 0.

upper

The upper end of the range to be used in conditionally sampling the GP covariance parameters (kappa,tau_e) in the slice sampler. Defaults to upper = 1e10.

sub_size

Integer vector whose length, n, equals the number of progressively coarser GP covariance matrices to use for tempered sampling steps in an alternative space to sample the GP covariance parameters. Each entry denotes the number of sub-sampl

w_star

Integer value denoting the number of cluster locations to sample ahead of observations in the auxiliary Gibbs sampler used to sample the number of clusters and associated cluster assignments. A higher value reduces samplin auto-correlation, but increa

Numeric value denoting the step width used to construct the interval from which to draw a sample for each GP covariance parameter in the slice sampler. This value is adaptively updated in the sampler tuning stage for each parameter to be equal to the

n.iter

Total number of MCMC iterations.

n.burn

Number of MCMC iterations to discard. gpdpgrow will return (n.iter - n.burn) posterior samples.

n.thin

Gap between successive sampling iterations to save.

n.tune

Number of iterations (before ergodic chain instantiated) to adapt w, separately, for each covariance term, p = 1,...,P. Sets each {w_p} to lie in the 90 percent credible interval computed from the tuning sample (that is divide

progress

A boolean value denoting whether to display a progress bar during model execution. Defaults to progress = TRUE

b_move

A boolean value denoting whether to sample the GP function, bb, in T x 1 Gibbs steps b_move = TRUE or through elliptical slice sampling. Defaults to b_move = TRUE. Only used in the case there is any interm

cluster

A boolean value denoting whether to employ DP mix model over set of GP functions or to just use GP model with no clustering of covariance function parameters. Defaults to cluster = TRUE

An N x 1 integer vector that inputs a fixed clustering, rather than sampling it. Defaults to s = NULL

Value

S3 gpdpgrow object, for which many methods are available to return and view results. Generic functions applied to an object, res of class gpdpgrow, includes:
samples(res)contains (n.iter - n.burn) posterior sampling iterations for every model parameter
resid(res)contains the model residuals.

References

T. D. Savitsky and D. Toth (2014) Bayesian Non-parametric Models for Collections of Time- indexed Functions. submitted to: JRSS Series A (Statistics in Society). T. D. Savitsky (2014) Bayesian Non-parametric Functional Mixture Estimation for Time-indexed data. submitted to: Annals of Applied Statistics. T. D. Savitsky (2014) Bayesian Non-Parametric Mixture Estimation for Time-Indexed Functional Data for R. Submitted to: Journal of Statistical Software.

Examples

Run this code

{
library(growfunctions)

## load the monthly employment count data for a collection of
## U.S. states from the Current
## Population Survey (cps)
data(cps)
## subselect the columns of N x T, y, associated with
## the years 2009 - 2013
## to examine the state level employment
## levels during the "great recession"
y_short   <- cps$y[,(cps$yr_label %in% c(2009:2013))]

## uses default setting of a single "rational quadratic" covariance
## run for 500 iterations, with half discarded as burn-in to
## obtain a more useful result.
res_gp               <- gpdpgrow(y = y_short,
                                 n.iter = 8,
                                 n.burn = 4,
                                 n.thin = 1,
                                 n.tune = 0)

## Two plots of estimated functions,
## 1. faceted by cluster
## 2. fitted functions vs noisy observations
## first plot will plot estimated denoised function,
## bb_i, for a single (randomly-selected) "state"
fit_plots_gp        <- cluster_plot( object = res_gp,
                           units_name = "state",
                           units_label = cps$st,
                           single_unit = TRUE,
                           credible = TRUE )
## second plot will randomly select 6 states
## and plot their estimated denoised functions, bb_i.
## with setting "single_unit = FALSE".
## (Option "num_plot" may be set to plot
## any integer number of
## randomly-selected units.)
fit_plots_gp        <- cluster_plot( object = res_gp,
                                     units_name = "state",
                                     units_label = cps$st,
                                     single_unit = FALSE,
                                     credible = TRUE )

}