This function uses MCMC to draw from posterior distributions of a Bayesian spatio-temporal factor analysis model. All spatial processes use one of Fourier, thin plate spline, or multiresolution basis functions. The temporally-dependent factors use Fourier bases. The default values are chosen to work well for many data sets. Thus, it is possible to use this function using only three arguments: ymat, dates, and coords. The default number of MCMC iterations is 10000 (saving 5000); however, depending on the number of observations and processes modeled, it may need more draws than this to ensure the posterior draws are representative of the entire posterior distribution space.
BSTFA(
ymat,
dates,
coords,
iters = 10000,
n.times = nrow(ymat),
n.locs = ncol(ymat),
x = NULL,
mean = FALSE,
linear = TRUE,
seasonal = TRUE,
factors = TRUE,
n.seasn.knots = min(7, ceiling(length(unique(yday(dates)))/3)),
spatial.style = "eigen",
n.spatial.bases = min(10, ceiling(n.locs/3)),
knot.levels = 2,
max.knot.dist = mean(dist(coords)),
premade.knots = NULL,
plot.knots = FALSE,
n.factors = min(4, ceiling(n.locs/20)),
factors.fixed = NULL,
plot.factors = FALSE,
load.style = "eigen",
n.load.bases = 4,
freq.lon = 2 * diff(range(coords[, 1])),
freq.lat = 2 * diff(range(coords[, 2])),
n.temp.bases = ifelse(floor(n.times * 0.1)%%2 == 1, floor(n.times * 0.1) - 1,
floor(n.times * 0.1)),
freq.temp = n.times,
alpha.prec = 1/1e+05,
tau2.gamma = 2,
tau2.phi = 1e-07,
sig2.gamma = 2,
sig2.phi = 1e-05,
sig2 = NULL,
beta = NULL,
xi = NULL,
Fmat = matrix(0, nrow = n.times, ncol = n.factors),
Lambda = matrix(0, nrow = n.locs, n.factors),
thin = 1,
burn = floor(iters * 0.5),
verbose = TRUE,
save.missing = TRUE,
save.time = FALSE,
marginalize = FALSE
)A list containing the following elements (any elements that are the same as in the function input are removed here for brevity):
An mcmc object of size draws by n.locs containing posterior draws for the mean of each location. If mean=FALSE (default), the values will all be zero.
An mcmc object of size draws by n.spatial.bases + p containing posterior draws for the coefficients modeling the mean process. If mean=FALSE (default), the values will all be zero.
An mcmc object of size draws by 1 containing the posterior draws for the variance of the mean process. If mean=FALSE (default), the values will all be zero.
An mcmc object of size draws by n.locs containing the posterior draws for the increase/decrease (slope) across time for each location.
An mcmc object of size draws by n.spatial.bases + p containing posterior draws for the coefficients modeling the slope.
An mcmc object of size draws by 1 containing posterior draws of the variance of the slopes.
An mcmc object of size draws by n.seasn.knots*n.locs containing posterior draws for the coefficients of the seasonal process.
An mcmc object of size draws by (n.spatial.bases + p)*n.seasn.knots containing posterior draws for the coefficients modeling each coefficient of the seasonal process.
An mcmc object of size draws by n.seasn.knots containing posterior draws of the variance of the coefficients of the seasonal process.
An mcmc object of size draws by n.times*n.factors containing posterior draws of the residual factors.
An mcmc object of size draws by n.factors*n.temp.bases containing posterior draws of the coefficients for the factor temporally-dependent process.
An mcmc object of size draws by n.factors*n.locs containing posterior draws of the loadings for each location.
An mcmc object of size draws by n.factors*n.load.bases containing posterior draws of the coefficients for the loadings spatial process.
An mcmc object of size draws by 1 indicating the residual variance of the loadings spatial process.
An mcmc object of size draws by 1 containing posterior draws of the residual variance of the data.
If save.missing=TRUE, a matrix of size sum(missing) by draws containing posterior draws of the missing observations. Otherwise, the object is NULL.
A data frame of size iters by 6 containing the time it took to sample each parameter for every iteration.
An object containing the time the model setup took.
A list containing the matrices used for each modeling process. newS is the matrix of spatial basis coefficients for the mean, linear, and seasonal process coefficients. linear.Tsub is the matrix used to enforce a linear increase/increase (slope) across time. seasonal.bs.basis is the matrix containing the circular b-splines of the seasonal process. confoundingPmat.prime is the matrix that enforces orthogonality of the factors from the mean, linear, and seasonal processes. QT contains the Fourier bases used to model the temporal factors. QS contains the bases used to model the spatial loadings.
A vector of length n.factors giving the location indices of the fixed loadings.
A scalar returning the number of MCMC iterations.
An n.times*n.locs vector of the observations.
A logical vector indicating whether that element's observation was missing or not.
A numeric vector of length n.times containing the day of year for each element in the original dates.
For spatial.style='grid', a list of length knot.levels containing the coordinates for all knots at each resolution.
For load.style='grid', a list of length knot.levels containing the coordinates for all knots at each resolution.
The number of saved MCMC iterations after removing the burn-in and thinning.
Data matrix of size n.times by n.locs. Any missing data should be marked by NA. The model works best if the data are zero-centered for each location.
n.times length vector of class 'Date' corresponding to each date of the observed data. For now, the dates should be regularly spaced (e.g., daily).
n.locs by 2 matrix or data frame of coordinates for the locations of the observed data. If using longitude and latitude, longitude is assumed to be the first coordinate.
Number of MCMC iterations to draw. Default value is 10000. Function only saves (iters-burn)/thin drawn values.
Number of observations for each location. Default is nrow(ymat).
Number of observed locations. Default is ncol(ymat).
Optional n.locs by p matrix of covariates for each location. If there are no covariates, set to NULL (default).
Logical scalar. If TRUE, the model will fit a spatially-dependent mean for each location. Otherwise, the model will assume the means are zero at each location (default).
Logical scalar. If TRUE (default), the model will fit a spatially-dependent linear increase/decrease (or slope) in time. Otherwise, the model will assume a zero change in slope across time.
Logical scalar. If TRUE (default), the model will use circular b-splines to model a spatially-dependent annual process. Otherwise, the model will assume there is no seasonal (annual) process.
Logical scalar. If TRUE (default), the model will fit a spatio-temporal factor analysis model with temporally-dependent factors and spatially-dependent loadings.
Numeric scalar indicating the number of knots to use for the seasonal basis components. The default value is min(7, ceiling(length(unique(yday(dates)))/3)), where 7 will capture approximately 2 peaks during the year.
Character scalar indicating the style of bases to use for the linear and seasonal components. Style options are 'fourier' (default), 'tps' for thin plate splines, and 'grid' for multiresolution bisquare bases using knots from a grid across the space.
Numeric scalar indicating the number of spatial bases to use when spatial.style is either 'fourier' or 'tps'. Default value is min(16, ceiling(n.locs/3)). When spatial.style is 'fourier', this value must be an even square number.
Numeric scalar indicating the number of resolutions to use for when spatial.style='grid' and/or load.style='grid'. Default is 2.
Numeric scalar indicating the maximum distance at which a basis value is greater than zero when spatial.style='grid' and/or load.style='grid'. Default value is mean(dist(coords)).
Optional list of length knot.levels with each list element containing a matrix of longitude-latitude coordinates of the knots to use for each resolution when spatial.style='grid' and/or load.style='grid'. Otherwise, when premade.knots = NULL (default), the knots are determined by using the standard multiresolution grids across the space.
Logical scalar indicating whether to plot the knots used when spatial.style='grid' and/or load.style='grid'. Default is FALSE.
Numeric scalar indicating how many factors to use in the model. Default is min(4,ceiling(n.locs/20)).
Numeric vector of length n.factors indicating the locations to use for the fixed loadings. This is needed for model identifiability. If factors.fixed=NULL (default), the code will select locations with less than 20% missing data and that are far apart in the space.
Logical scalar indicating whether to plot the fixed factor locations. Default is FALSE.
Character scalar indicating the style of spatial bases to use for the spatially-dependent loadings. Options are 'fourier' (default) for the Fourier bases, 'tps' for thin plate splines, and 'grid' for multiresolution bases. This can be the same as or different than spatial.style.
Numeric scalar indicating the number of bases to use for the spatially-dependent loadings when load.style is either 'fouier' or 'tps'. This can be the same as or different than n.spatial.bases. Default is 4. When load.style='fourier', this value must be an even square number.
Numeric scalar used for spatial.style or load.style equal to 'fourier' or 'eigen'. For 'fourier', this is the frequency used for the first column of coords (assumed to be longitude) for the Fourier bases. For 'eigen', this is the range parameter of the exponential spatial correlation matrix used to create the eigenvectors. Default value is 2*diff(range(coords[,1])).
Numeric scalar used for spatial.style or load.style equal to 'fourier'. This is the frequency to use for the second column of coords (assumed to be latitude) for the Fourier bases. Default value is 2*diff(range(coords[,2])).
Numeric scalar indicating the number of Fourier bases to use for the temporally-dependent factors. The default value is 10% of n.times.
Numeric scalar indicating the frequency to use for the Fourier bases of the temporally-dependent factors. The default value is n.times.
Numeric scalar indicating the prior precision for all model process coefficients. Default value is 1/100000.
Numeric scalar indicating the prior shape for the precision of the model coefficients. Default value is 2.
Numeric scalar indicating the prior rate for the precision of the model coefficients. Default value is 1e-07.
Numeric scalar indicating the prior shape for the residual precision. Default value is 2.
Numeric scalar indicating the prior rate for the residual precision. Default value is 1e-05.
Numeric scalar indicating the starting value for the residual variance. If NULL (default), the function will select a reasonable starting value.
Numeric vector of length n.locs + p indicating starting values for the slopes. If NULL (default), the function will select reasonable starting values.
Numeric vector of length (n.locs + p)*n.seasn.knots indicating starting values for the coefficients of the seasonal component. If NULL (default), the function will select reasonable starting values.
Numeric matrix of size n.times by n.factors indicating starting values for the factors. Default value is to start all factor values at 0.
Numeric matrix of size n.locs by n.factors indicating starting values for the loadings. Default value is to start all loadings at 0.
Numeric scalar indicating how many MCMC iterations to thin by. Default value is 1, indicating no thinning.
Numeric scalar indicating how many MCMC iterations to burn before saving. Default value is one-half of iters.
Logical scalar indicating whether or not to print the status of the MCMC process. If TRUE (default), the function will print every time an additional 10% of the MCMC process is completed.
Logical scalar indicating whether or not to save the MCMC draws for the missing observations. If TRUE (default), the function will save an additional MCMC object containing the MCMC draws for each missing observation. Use FALSE to save file space and memory.
Logical scalar indicating whether to save the computation time for each MCMC iteration. Default value is FALSE. When FALSE, the function compute_summary() will not be useful.
Logical scalar indicating whether to sample hyper-coefficients by marginalizing out the corresponding parameters. Default value is FALSE. Setting to TRUE will be slower but can improve effective sample sizes.
Adam Simpson and Candace Berrett
data(utahDataList)
attach(utahDataList)
low.miss <- which(apply(is.na(TemperatureVals), 2, mean)<.02)
out <- BSTFA(ymat=TemperatureVals[1:50,low.miss],
dates=Dates[1:50],
coords=Coords[low.miss,],
n.factors=2,
iters=10)
Run the code above in your browser using DataLab