Method new()
Create a new grid object
Produces a regular grid over an area of interest as an sf object; see details for information on initialisation.
Usage
grid$new(poly, cellsize, verbose = TRUE)
Arguments
poly
An sf object containing either one polygon describing the area of interest or multiple polygons
representing survey or census regions in which the case data counts are aggregated
cellsize
The dimension of the grid cells
verbose
Logical indicating whether to provide feedback to the console.
Examples
# a simple example with a square and a small number of cells
# this same running example is used for the other functions
b1 = sf::st_sf(sf::st_sfc(sf::st_polygon(list(cbind(c(0,3,3,0,0),c(0,0,3,3,0))))))
g1 <- grid$new(b1,0.5)
# an example with multiple polygons
data("birmingham_crime")
g2 <- grid$new(birmingham_crime,cellsize = 1000)
Method plot()
Plot the grid data
Returns
None, called for effects.
Arguments
zcol
Vector of strings specifying names of columns of grid_data to plot
Examples
b1 = sf::st_sf(sf::st_sfc(sf::st_polygon(list(cbind(c(0,3,3,0,0),c(0,0,3,3,0))))))
g1 <- grid$new(b1,0.5)
g1$plot()
# a plot with covariates - we simulate covariates first
g1$grid_data$cov <- stats::rnorm(nrow(g1$grid_data))
g1$plot("cov")
Method points_to_grid()
Generates case counts of points over the grid
Counts the number of cases in each time period in each grid cell
Usage
grid$points_to_grid(
point_data,
t_win = c("day"),
laglength = 14,
date_format = "ymd",
verbose = TRUE
)
Arguments
point_data
sf object describing the point locations of cases, with a column
t giving the date of each case in YYYY-MM-DD format. See create_points
t_win
character string. One of "day", "week", or "month" indicating the
length of the time windows in which to count cases
laglength
integer. The number of time periods to include, counting back from the most
recent time period
date_format
String describing the format of the date in the data as a combination of "d" days, "m" months,
and "y" years: one of "dmy", "myd", "ymd", "ydm", "dym", or "mdy", as used by the lubridate package.
verbose
Logical indicating whether to report detailed output
Examples
b1 <- sf::st_sf(sf::st_sfc(sf::st_polygon(list(cbind(c(0,3,3,0,0),c(0,0,3,3,0))))))
g1 <- grid$new(b1,0.5)
# simulate some points
dp <- data.frame(y=runif(10,0,3),x=runif(10,0,3),date=paste0("2021-01-",11:20))
dp <- create_points(dp,pos_vars = c('y','x'),t_var='date')
g1$points_to_grid(dp, laglength=5)
Method model_data_frame()
Returns a data frame with the grid data and coordinates
Returns a standard data frame with the grid data and coordinates, which may be useful to
run models in another package.
Usage
grid$model_data_frame()
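The following is a minimal illustrative sketch of a typical call, reusing the running square-grid example from new(); it assumes case counts have already been mapped to the grid with points_to_grid().
Examples
b1 <- sf::st_sf(sf::st_sfc(sf::st_polygon(list(cbind(c(0,3,3,0,0),c(0,0,3,3,0))))))
g1 <- grid$new(b1,0.5)
dp <- data.frame(y=runif(10,0,3),x=runif(10,0,3),date=paste0("2021-01-",11:20))
dp <- create_points(dp,pos_vars = c('y','x'),t_var='date')
g1$points_to_grid(dp, laglength=5)
# extract a plain data.frame of grid data and coordinates for use elsewhere
df <- g1$model_data_frame()
head(df)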
Method add_covariates()
Adds covariate data to the grid
Maps spatial, temporal, or spatio-temporal covariate data onto the grid using different methods.
Usage
grid$add_covariates(
cov_data,
zcols,
weight_type = "area",
popdens = NULL,
t_label = NULL,
flat = TRUE,
lambda_smooth = 1,
lambda_e = 1e-06,
is_positive = FALSE
)
Arguments
cov_data
An sf, RasterLayer, or SpatRaster object, or a data.frame. See details.
zcols
vector of character strings with the names of the columns of cov_data
to include
weight_type
character string. Either "area" for area-weighted average or "pop"
for population-weighted average
popdens
character string. The name of the column in cov_data with the
population density. Required if weight_type="pop"
t_label
integer. If adding spatio-temporally varying data by time period,
this time label should be appended to the column name. See details.
flat
Logical indicating whether the disaggregation should be "flat", i.e. a simple weighted average over intersections. A flat disaggregation cannot be constrained to be strictly positive.
lambda_smooth
Weight on spatial smoothness; used if flat is FALSE.
lambda_e
Small ridge term for numerical stability (needed because L is singular), or, for
strictly positive covariates, the weight on the entropy term in the minimisation.
is_positive
Logical. Whether the disaggregation should be constrained to be strictly positive.
verbose
logical. Whether to provide a progress bar
Examples
b1 <- sf::st_sf(sf::st_sfc(sf::st_polygon(list(cbind(c(0,3,3,0,0),c(0,0,3,3,0))))))
g1 <- grid$new(b1,0.5)
cov1 <- grid$new(b1,0.8)
cov1$grid_data$cov <- runif(nrow(cov1$grid_data))
g1$add_covariates(cov1$grid_data,
zcols="cov")
\donttest{
# mapping population data from some other polygons
data("boundary")
data("birmingham_crime")
g2 <- grid$new(boundary,cellsize=0.008)
msoa <- sf::st_transform(birmingham_crime,crs = 4326)
suppressWarnings(sf::st_crs(msoa) <- sf::st_crs(g2$grid_data)) # ensure crs matches
g2$add_covariates(msoa,
zcols="pop",
weight_type="area")
# add a case count
g2$add_covariates(msoa,
zcols=c("t1"),
weight_type="area",
is_positive = TRUE,
lambda_smooth = 0,
lambda_e = 1e-6)
g2$grid_data$t1 <- round(g2$grid_data$t1,0)
g2$plot("pop")
}
Method get_dow()
Generate day of week data
Create data frame with day of week indicators
Generates a data frame with indicator
variables for each day of the week for use in the add_covariates() function.
Usage
grid$get_dow()
Returns
data.frame with columns t, day, and dayMon to daySun
Examples
b1 <- sf::st_sf(sf::st_sfc(sf::st_polygon(list(cbind(c(0,3,3,0,0),c(0,0,3,3,0))))))
g1 <- grid$new(b1,0.5)
dp <- data.frame(y=runif(10,0,3),x=runif(10,0,3),date=paste0("2021-01-",11:20))
dp <- create_points(dp,pos_vars = c('y','x'),t_var='date')
g1$points_to_grid(dp, laglength=5)
dow <- g1$get_dow()
g1$add_covariates(dow,zcols = colnames(dow)[3:ncol(dow)])
Method add_time_indicators()
Adds time period indicators to the data
Adds indicator variables for each time period to the data. To include
these in a model fitting procedure use, for example, covs = c("time1i", "time2i", ...).
Usage
grid$add_time_indicators()
Returns
Nothing. Called for effects.
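A minimal illustrative sketch, reusing the running square-grid example; it assumes case counts have already been mapped to the grid so that the time periods are defined.
Examples
b1 <- sf::st_sf(sf::st_sfc(sf::st_polygon(list(cbind(c(0,3,3,0,0),c(0,0,3,3,0))))))
g1 <- grid$new(b1,0.5)
dp <- data.frame(y=runif(10,0,3),x=runif(10,0,3),date=paste0("2021-01-",11:20))
dp <- create_points(dp,pos_vars = c('y','x'),t_var='date')
g1$points_to_grid(dp, laglength=5)
# add indicator columns for each time period to the grid data
g1$add_time_indicators()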
Method lgcp_bayes()
Fit an (approximate) log-Gaussian Cox Process model using Bayesian methods
Usage
grid$lgcp_bayes(
popdens = NULL,
covs = NULL,
covs_grid = NULL,
approx = "nngp",
m = 10,
L = 1.5,
model = "exp",
known_theta = NULL,
iter_warmup = 500,
iter_sampling = 500,
chains = 3,
parallel_chains = 3,
verbose = TRUE,
vb = FALSE,
return_stan_fit = FALSE,
...
)
Arguments
popdens
character string. Name of the population density column
covs
vector of character strings. Base names of the covariates to
include. For temporally-varying covariates only the stem is required and not
the individual column names for each time period (e.g. dayMon and not dayMon1,
dayMon2, etc.)
covs_grid
If using a region model, covariates at the level of the grid can also be specified by providing their
names to this argument.
approx
Either "rank" for reduced rank approximation, or "nngp" for nearest
neighbour Gaussian process.
m
integer. Number of basis functions for reduced rank approximation, or
number of nearest neighbours for nearest neighbour Gaussian process. See Details.
L
Numeric. For reduced rank approximation, the boundary condition as a proportionate extension of the area, e.g.
L=2 is a doubling of the analysis area. See Details.
model
Either "exp" for exponential covariance function or "sqexp" for squared exponential
covariance function
known_theta
An optional vector of two values of the covariance parameters. If these are provided
then the covariance parameters are assumed to be known and will not be estimated.
iter_warmup
integer. Number of warmup iterations
iter_sampling
integer. Number of sampling iterations
chains
integer. Number of chains
parallel_chains
integer. Number of parallel chains
verbose
logical. Provide feedback on progress
vb
Logical indicating whether to use variational Bayes (TRUE) or full MCMC sampling (FALSE)
return_stan_fit
logical. The results of the model fit are stored internally as an rtsFit object and
returned in that format. If this argument is set to TRUE, then the fitted stan object will instead be returned,
but the rtsFit object will still be saved.
...
additional options to pass to `$sample()`.
priors
list. See Details
Returns
A stanfit or a CmdStanMCMC object
Examples
# the data are just random simulated points
b1 <- sf::st_sf(sf::st_sfc(sf::st_polygon(list(cbind(c(0,3,3,0,0),c(0,0,3,3,0))))))
g1 <- grid$new(b1,0.5)
dp <- data.frame(y=runif(10,0,3),x=runif(10,0,3),date=paste0("2021-01-",11:20))
dp <- create_points(dp,pos_vars = c('y','x'),t_var='date')
cov1 <- grid$new(b1,0.8)
cov1$grid_data$cov <- runif(nrow(cov1$grid_data))
g1$add_covariates(cov1$grid_data,
zcols="cov")
g1$points_to_grid(dp, laglength=5)
g1$priors <- list(
prior_lscale=c(0,0.5),
prior_var=c(0,0.5),
prior_linpred_mean=c(0),
prior_linpred_sd=c(5)
)
\donttest{
g1$lgcp_bayes(popdens="cov", approx = "hsgp", parallel_chains = 0)
g1$model_fit()
# we can extract predictions
g1$extract_preds("rr")
g1$plot("rr")
g1$hotspots(threshold = 2, stat = "rr")
# this example uses real aggregated data but will take a relatively long time to run
data("birmingham_crime")
example_data <- birmingham_crime[,c(1:8,21)]
example_data$y <- birmingham_crime$t12
g2 <- grid$new(example_data,cellsize=1000)
g2$priors <- list(
prior_lscale=c(0,0.5),
prior_var=c(0,0.5),
prior_linpred_mean=c(-3),
prior_linpred_sd=c(5)
)
g2$lgcp_bayes(popdens="pop", approx = "hsgp", parallel_chains = 0)
g2$model_fit()
g2$extract_preds("rr")
g2$plot("rr")
g2$hotspots(threshold = 2, stat = "rr")
}
Method lgcp_ml()
Fit an (approximate) log-Gaussian Cox Process model using Maximum Likelihood
Usage
grid$lgcp_ml(
popdens = NULL,
covs = NULL,
model = "fexp",
max_iter = 30,
iter_sampling = 200,
tol = 10,
hist = 5,
k0 = 10,
start_theta = NULL,
trace = 1
)
Arguments
popdens
character string. Name of the population density column in grid data or region data
covs
vector of strings. Base names of the covariates to
include. For temporally-varying covariates only the stem is required and not
the individual column names for each time period (e.g. dayMon and not dayMon1,
dayMon2, etc.) For regional models, covariates should be mapped to the grid currently (see add_covariates)
model
Either "fexp", "sqexp", "matern1", or "matern2" for exponential, squared exponential, and matern with shape of 1 or 2, respectively. Other functions
from glmmrBase will work, but may be less relevant to spatial models.
max_iter
Integer. Maximum number of iterations.
iter_sampling
integer. Number of random effects samples to draw on each iteration.
tol
Scalar. The tolerance for the Bayes Factor convergence criterion.
hist
Integer. The length of the running mean for the convergence criterion for non-Gaussian models.
k0
Integer. The expected number of iterations until convergence.
start_theta
Optional. Starting values for the covariance parameters (log(tau sq), log(lambda), rho); rho is only
required if there is more than one time period.
trace
Integer. Level of detail of information printed to the console. 0 = none, 1 = some (default), 2 = most.
Returns
Optionally, an rtsFit model fit object. This fit is stored internally and can be retrieved with model_fit()
Examples
# note: these examples are spatial only in 0.10.0 to prevent an error
# with the reverse dependency check when updating the base package.
# If you're seeing this note, then an updated package will be available
# imminently.
# a simple example with completely random points
b1 <- sf::st_sf(sf::st_sfc(sf::st_polygon(list(cbind(c(0,3,3,0,0),c(0,0,3,3,0))))))
g1 <- grid$new(b1,0.5)
dp <- data.frame(y=runif(10,0,3),x=runif(10,0,3))
dp <- create_points(dp,pos_vars = c('y','x'))
cov1 <- grid$new(b1,0.8)
cov1$grid_data$cov <- runif(nrow(cov1$grid_data))
g1$add_covariates(cov1$grid_data,
zcols="cov")
g1$points_to_grid(dp)
# an example
\donttest{
g1$lgcp_ml(popdens="cov",iter_sampling = 50)
g1$model_fit()
g1$extract_preds("rr")
g1$plot("rr")
g1$hotspots(threshold = 2, stat = "rr")
# this example uses real aggregated spatial data
# note that the full dataset has 12 time periods
# and can be used as a spatio-temporal example by removing
# the lines marked # spatial
data("birmingham_crime")
example_data <- birmingham_crime[,c(1:8,21)] # spatial
example_data$y <- birmingham_crime$t12 # spatial
example_data <- sf::st_transform(example_data, crs = 4326)
g2 <- grid$new(example_data,cellsize=0.006)
g2$lgcp_ml(popdens = "pop", start_theta = log(c(0.1, 0.05)))
g2$model_fit()
g2$extract_preds("rr")
g2$plot("rr")
g2$hotspots(threshold = 2, stat = "rr")
}
Method extract_preds()
Extract predictions
Extracts predictions (incidence, relative risk, or incidence rate ratio) from the last model fit; see the type argument.
Arguments
type
Vector of character strings. Any combination of "pred", "rr", and "irr", which are,
posterior mean incidence (overall and population standardised), relative risk,
and incidence rate ratio, respectively.
irr.lag
integer. If "irr" is requested as type, the number of time periods prior to the
current period against which the ratio is computed
t.lag
integer. Extract predictions for previous time periods.
popdens
character string. Name of the column in grid_data with the
population density data
Examples
# See examples for lgcp_bayes() and lgcp_ml()
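# an illustrative sketch: assumes g1 has already been fitted as in those examples
\donttest{
g1$extract_preds("rr")
g1$plot("rr")
}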
Method hotspots()
Generate hotspot probabilities
Generate hotspot probabilities. The last model fit will be used to extract
predictions. If there is no previous model fit, use either lgcp_ml() or lgcp_bayes(), or see
model_fit() to update the stored model fit.
Given a definition of a hotspot in terms of threshold(s) for incidence,
relative risk, or incidence rate ratio, returns the probability that
each area is a "hotspot". See Details of extract_preds. Columns
will be added to grid_data. Note that for an incidence threshold, the threshold should
be specified as the per-individual incidence.
Usage
grid$hotspots(threshold = NULL, stat = "rr", t.lag = 0, popdens = NULL)
Arguments
threshold
Numeric. The threshold above which an area is classified as a hotspot,
applied to the statistic named in stat
stat
character string. The statistic to which the threshold is applied, for
example "rr" for relative risk.
t.lag
integer. Extract predictions for incidence or relative risk for previous time periods.
popdens
character string. Name of the variable in grid_data
specifying the population density. Required if the threshold is applied to
incidence.
Returns
None, called for effects. Columns are added to grid or region data.
Examples
\dontrun{
# See examples for lgcp_bayes() and lgcp_ml()
}
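# an illustrative sketch: assumes a model has been fitted as in the lgcp_ml() examples
\donttest{
g1$hotspots(threshold = 2, stat = "rr")
# hotspot probability columns are appended to the grid data
head(g1$grid_data)
}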
Method aggregate_output()
Aggregate output
Aggregate lgcp_fit output to another geography
Usage
grid$aggregate_output(
new_geom,
zcols,
weight_type = "area",
popdens = NULL,
verbose = TRUE
)
Arguments
new_geom
sf object. A set of polygons covering the same area as boundary
zcols
vector of character strings. Names of the variables in grid_data to
map to the new geography
weight_type
character string, either "area" or "pop" for area-weighted
or population weighted averaging, respectively, or "sum" to take the weighted sum.
popdens
character string. If weight_type is equal to "pop" then the
name of the column in grid_data with population density data
verbose
logical. Whether to provide progress bar.
Returns
An sf object identical to new_geom with additional columns with the
variables specified in zcols
Examples
\donttest{
b1 <- sf::st_sf(sf::st_sfc(sf::st_polygon(list(cbind(c(0,3,3,0,0),c(0,0,3,3,0))))))
g1 <- grid$new(b1,0.5)
dp <- data.frame(y=runif(10,0,3),x=runif(10,0,3),date=paste0("2021-01-",11:20))
dp <- create_points(dp,pos_vars = c('y','x'),t_var='date')
cov1 <- grid$new(b1,0.8)
cov1$grid_data$cov <- runif(nrow(cov1$grid_data))
g1$add_covariates(cov1$grid_data,
zcols="cov")
g1$points_to_grid(dp, laglength=5)
g1$priors <- list(
prior_lscale=c(0,0.5),
prior_var=c(0,0.5),
prior_linpred_mean=c(0),
prior_linpred_sd=c(5)
)
res <- g1$lgcp_bayes(popdens="cov", parallel_chains = 1)
g1$extract_preds(res,
type=c("pred","rr"),
popdens="cov")
new1 <- g1$aggregate_output(cov1$grid_data,
zcols="rr")
}
Method scale_conversion_factor()
Returns scale conversion factor
Coordinates are scaled to [-1,1] for LGCP models fit with HSGP. This function
returns the scaling factor for this conversion.
Usage
grid$scale_conversion_factor()
Examples
b1 = sf::st_sf(sf::st_sfc(sf::st_polygon(list(cbind(c(0,3,3,0,0),c(0,0,3,3,0))))))
g1 <- grid$new(b1,0.5)
g1$scale_conversion_factor()
Method get_region_data()
Returns summary data of the region/grid intersections
Information on the intersection between the region areas and the computational grid
including the number of cells intersecting each region (n_cell), the indexes of the
cells intersecting each region in order (cell_id), and the proportion of each region's
area covered by each intersecting grid cell (q_weights).
Usage
grid$get_region_data()
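A minimal illustrative sketch; it assumes a region model built from multiple polygons, as in the new() example with the Birmingham data.
Examples
\donttest{
data("birmingham_crime")
g2 <- grid$new(birmingham_crime, cellsize = 1000)
# summary of the region/grid intersections: n_cell, cell_id, and q_weights
rd <- g2$get_region_data()
str(rd, max.level = 1)
}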
Method variogram()
Plots the empirical semi-variogram
Usage
grid$variogram(popdens, yvar, nbins = 20)
Arguments
popdens
String naming the variable in the data specifying the offset. If not
provided then no offset is used.
yvar
String naming the outcome variable to calculate the variogram for. Optional, if
not provided then the outcome count data will be used.
nbins
The number of bins in the empirical semivariogram
Returns
A ggplot plot is printed and optionally returned
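A minimal illustrative sketch, reusing the running square-grid example; it assumes that when popdens and yvar are not supplied, no offset is used and the mapped case counts are taken as the outcome.
Examples
b1 <- sf::st_sf(sf::st_sfc(sf::st_polygon(list(cbind(c(0,3,3,0,0),c(0,0,3,3,0))))))
g1 <- grid$new(b1,0.5)
dp <- data.frame(y=runif(10,0,3),x=runif(10,0,3),date=paste0("2021-01-",11:20))
dp <- create_points(dp,pos_vars = c('y','x'),t_var='date')
g1$points_to_grid(dp, laglength=5)
# empirical semi-variogram of the case counts
g1$variogram(nbins = 10)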
Method reorder()
Re-orders the computational grid
The quality of the nearest neighbour approximation can depend on the ordering of
the grid cells. This function reorders the grid cells. If this is a region data model,
then the intersections are recomputed.
Usage
grid$reorder(option = "y", verbose = TRUE)
Arguments
option
Either "y" for order of the y coordinate, "x" for order of the x coordinate,
"minimax" in which the next observation in the order is the one which maximises the
minimum distance to the previous observations, or "random" which randomly orders them.
verbose
Logical indicating whether to print a progress bar (TRUE) or not (FALSE).
Returns
No return, used for effects.
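A minimal illustrative sketch, reusing the running square-grid example.
Examples
b1 <- sf::st_sf(sf::st_sfc(sf::st_polygon(list(cbind(c(0,3,3,0,0),c(0,0,3,3,0))))))
g1 <- grid$new(b1,0.5)
# reorder the grid cells to improve the nearest neighbour approximation
g1$reorder(option = "minimax")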
Method data()
A list of prepared data
Returns the data prepared for use in the in-built estimation functions. The same data could be used
for alternative models; this utility function facilitates model fitting for custom models.
Usage
grid$data(m, approx, popdens, covs, covs_grid)
Arguments
m
The number of nearest neighbours or basis functions.
approx
Either "rank" for reduced rank approximation, or "nngp" for nearest
neighbour Gaussian process.
popdens
String naming the variable in the data specifying the offset. If not
provided then no offset is used.
covs
An optional vector of covariate names. For regional data models, this is specifically for the region-level covariates.
covs_grid
An optional vector of covariate names for region data models, identifying the covariates at the grid level.
Returns
A named list of data items used in model fitting
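A minimal illustrative sketch, reusing the running square-grid example; the argument values are illustrative only and mirror the lgcp_bayes() defaults (m = 10, approx = "nngp").
Examples
b1 <- sf::st_sf(sf::st_sfc(sf::st_polygon(list(cbind(c(0,3,3,0,0),c(0,0,3,3,0))))))
g1 <- grid$new(b1,0.5)
dp <- data.frame(y=runif(10,0,3),x=runif(10,0,3),date=paste0("2021-01-",11:20))
dp <- create_points(dp,pos_vars = c('y','x'),t_var='date')
g1$points_to_grid(dp, laglength=5)
# assemble the list of data used by the in-built estimation functions
dat <- g1$data(m = 10, approx = "nngp", popdens = NULL, covs = NULL, covs_grid = NULL)
names(dat)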
Method get_random_effects()
Returns the random effects stored in the object (if any) after using ML fitting. Its main use is
that, if a fitting procedure is stopped, the random effects can still be returned.
Usage
grid$get_random_effects()
Returns
A matrix of random effects samples if an MCMCML model has been initialised; otherwise returns FALSE
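A minimal illustrative sketch; it assumes a model has previously been fitted with lgcp_ml() as in its examples.
Examples
\donttest{
u <- g1$get_random_effects()
# returns FALSE if no MCMCML model has been initialised
if (!isFALSE(u)) dim(u)
}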
Method model_fit()
Either returns the last stored model fit from lgcp_ml() or lgcp_bayes(), or updates
the saved model fit if an object is provided.
Usage
grid$model_fit(fit = NULL)
Arguments
fit
Optional. A previous rtsFit object. If provided then the function updates the internally stored model fit.
Returns
Either an rtsFit object, or nothing if no model has previously been fit or if the fit is updated.
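A minimal illustrative sketch; it assumes a model has previously been fitted as in the lgcp_ml() examples.
Examples
\donttest{
fit <- g1$model_fit()
# a previously saved rtsFit object could also be restored with g1$model_fit(fit)
}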
Method clone()
The objects of this class are cloneable with this method.
Usage
grid$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.