Method new()
Create a new grid object
Produces a regular grid over an area of interest as an sf object; see details for information on initialisation.
Usage
grid$new(poly, cellsize, verbose = TRUE)
Arguments
poly
An sf object containing either one polygon describing the area of interest or multiple polygons
representing survey or census regions in which the case data counts are aggregated
cellsize
The dimension of the grid cells
verbose
Logical indicating whether to provide feedback to the console.
Examples
# a simple example with a square and a small number of cells
# this same running example is used for the other functions
b1 = sf::st_sf(sf::st_sfc(sf::st_polygon(list(cbind(c(0,3,3,0,0),c(0,0,3,3,0))))))
g1 <- grid$new(b1,0.5)
# an example with multiple polygons
data("birmingham_crime")
g2 <- grid$new(birmingham_crime,cellsize = 1000)
Method plot()
Plot the grid data
Returns
None, called for effects.
Arguments
zcol
Vector of strings specifying names of columns of grid_data to plot
Examples
b1 = sf::st_sf(sf::st_sfc(sf::st_polygon(list(cbind(c(0,3,3,0,0),c(0,0,3,3,0))))))
g1 <- grid$new(b1,0.5)
g1$plot()
# a plot with covariates - we simulate covariates first
g1$grid_data$cov <- stats::rnorm(nrow(g1$grid_data))
g1$plot("cov")
Method points_to_grid()
Generates case counts of points over the grid
Counts the number of cases in each time period in each grid cell
Usage
grid$points_to_grid(
point_data,
t_win = c("day"),
laglength = 14,
date_format = "ymd",
verbose = TRUE
)
Arguments
point_data
sf object describing the point locations of cases, with a column
t giving the date of each case in YYYY-MM-DD format. See create_points
t_win
character string. One of "day", "week", or "month" indicating the
length of the time windows in which to count cases
laglength
integer. The number of time periods to include, counting back from the most
recent time period
date_format
String describing the format of the date in the data as a combination of "d" days, "m" months,
and "y" years: one of "dmy", "myd", "ymd", "ydm", "dym", or "mdy", as used by the lubridate package.
verbose
Logical indicating whether to report detailed output
Examples
b1 <- sf::st_sf(sf::st_sfc(sf::st_polygon(list(cbind(c(0,3,3,0,0),c(0,0,3,3,0))))))
g1 <- grid$new(b1,0.5)
# simulate some points
dp <- data.frame(y=runif(10,0,3),x=runif(10,0,3),date=paste0("2021-01-",11:20))
dp <- create_points(dp,pos_vars = c('y','x'),t_var='date')
g1$points_to_grid(dp, laglength=5)
Method model_data_frame()
Returns a data frame with the grid data and coordinates
Returns a standard data frame with the grid data and coordinates, which may be useful to
run models in another package.
Usage
grid$model_data_frame()
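The following is a minimal illustrative sketch of a typical call, reusing the running square-grid example from new(); it assumes case counts have already been mapped to the grid with points_to_grid().
Examples
b1 <- sf::st_sf(sf::st_sfc(sf::st_polygon(list(cbind(c(0,3,3,0,0),c(0,0,3,3,0))))))
g1 <- grid$new(b1,0.5)
dp <- data.frame(y=runif(10,0,3),x=runif(10,0,3),date=paste0("2021-01-",11:20))
dp <- create_points(dp,pos_vars = c('y','x'),t_var='date')
g1$points_to_grid(dp, laglength=5)
# extract a plain data.frame of grid data and coordinates for use elsewhere
df <- g1$model_data_frame()
head(df)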
Method add_covariates()
Adds covariate data to the grid
Maps spatial, temporal, or spatio-temporal covariate data onto the grid using different methods.
Usage
grid$add_covariates(
cov_data,
zcols,
weight_type = "area",
popdens = NULL,
t_label = NULL,
flat = TRUE,
lambda_smooth = 1,
lambda_e = 1e-06,
is_positive = FALSE
)
Arguments
cov_data
An sf, RasterLayer, or SpatRaster object, or a data.frame. See details.
zcols
vector of character strings with the names of the columns of cov_data
to include
weight_type
character string. Either "area" for area-weighted average or "pop"
for population-weighted average
popdens
character string. The name of the column in cov_data with the
population density. Required if weight_type="pop"
t_label
integer. If adding spatio-temporally varying data by time period,
this time label should be appended to the column name. See details.
flat
Logical indicating whether the disaggregation should be "flat", i.e. a simple weighted average over intersections. A flat disaggregation cannot be constrained to be strictly positive.
lambda_smooth
Weight on spatial smoothness; used if flat is FALSE.
lambda_e
Small ridge term for numerical stability (needed because L is singular), or, for
strictly positive covariates, the weight on the entropy term in the minimisation.
is_positive
Logical. Whether the disaggregation should be constrained to be strictly positive.
verbose
logical. Whether to provide a progress bar
Examples
b1 <- sf::st_sf(sf::st_sfc(sf::st_polygon(list(cbind(c(0,3,3,0,0),c(0,0,3,3,0))))))
g1 <- grid$new(b1,0.5)
cov1 <- grid$new(b1,0.8)
cov1$grid_data$cov <- runif(nrow(cov1$grid_data))
g1$add_covariates(cov1$grid_data,
zcols="cov")
\donttest{
# mapping population data from some other polygons
data("boundary")
data("birmingham_crime")
g2 <- grid$new(boundary,cellsize=0.008)
msoa <- sf::st_transform(birmingham_crime,crs = 4326)
suppressWarnings(sf::st_crs(msoa) <- sf::st_crs(g2$grid_data)) # ensure crs matches
g2$add_covariates(msoa,
zcols="pop",
weight_type="area")
# add a case count
g2$add_covariates(msoa,
zcols=c("t1"),
weight_type="area",
is_positive = TRUE,
lambda_smooth = 0,
lambda_e = 1e-6)
g2$grid_data$t1 <- round(g2$grid_data$t1,0)
g2$plot("pop")
}
Method get_dow()
Generate day of week data
Create data frame with day of week indicators
Generates a data frame with indicator
variables for each day of the week for use in the add_covariates() function.
Usage
grid$get_dow()
Returns
data.frame with columns t, day, and dayMon to daySun
Examples
b1 <- sf::st_sf(sf::st_sfc(sf::st_polygon(list(cbind(c(0,3,3,0,0),c(0,0,3,3,0))))))
g1 <- grid$new(b1,0.5)
dp <- data.frame(y=runif(10,0,3),x=runif(10,0,3),date=paste0("2021-01-",11:20))
dp <- create_points(dp,pos_vars = c('y','x'),t_var='date')
g1$points_to_grid(dp, laglength=5)
dow <- g1$get_dow()
g1$add_covariates(dow,zcols = colnames(dow)[3:ncol(dow)])
Method add_time_indicators()
Adds time period indicators to the data
Adds indicator variables for each time period to the data. To include
these in a model fitting procedure use, for example, covs = c("time1i", "time2i", ...).
Usage
grid$add_time_indicators()
Returns
Nothing. Called for effects.
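A minimal illustrative sketch, reusing the running square-grid example; it assumes case counts have already been mapped to the grid so that the time periods are defined.
Examples
b1 <- sf::st_sf(sf::st_sfc(sf::st_polygon(list(cbind(c(0,3,3,0,0),c(0,0,3,3,0))))))
g1 <- grid$new(b1,0.5)
dp <- data.frame(y=runif(10,0,3),x=runif(10,0,3),date=paste0("2021-01-",11:20))
dp <- create_points(dp,pos_vars = c('y','x'),t_var='date')
g1$points_to_grid(dp, laglength=5)
# add indicator columns for each time period to the grid data
g1$add_time_indicators()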
Method lgcp_bayes()
Fit an (approximate) log-Gaussian Cox Process model using Bayesian methods
Usage
grid$lgcp_bayes(
popdens = NULL,
covs = NULL,
covs_grid = NULL,
approx = "nngp",
m = 10,
L = 1.5,
model = "exp",
known_theta = NULL,
iter_warmup = 500,
iter_sampling = 500,
chains = 3,
parallel_chains = 3,
verbose = TRUE,
vb = FALSE,
return_stan_fit = FALSE,
...
)
Arguments
popdens
character string. Name of the population density column
covs
vector of character strings. Base names of the covariates to
include. For temporally-varying covariates only the stem is required and not
the individual column names for each time period (e.g. dayMon and not dayMon1,
dayMon2, etc.)
covs_grid
If using a region model, covariates at the level of the grid can also be specified by providing their
names to this argument.
approx
Either "rank" for reduced rank approximation, or "nngp" for nearest
neighbour Gaussian process.
m
integer. Number of basis functions for reduced rank approximation, or
number of nearest neighbours for nearest neighbour Gaussian process. See Details.
L
Numeric. For reduced rank approximation, the boundary condition as a proportionate extension of the area, e.g.
L=2 is a doubling of the analysis area. See Details.
model
Either "exp" for exponential covariance function or "sqexp" for squared exponential
covariance function
known_theta
An optional vector of two values of the covariance parameters. If these are provided
then the covariance parameters are assumed to be known and will not be estimated.
iter_warmup
integer. Number of warmup iterations
iter_sampling
integer. Number of sampling iterations
chains
integer. Number of chains
parallel_chains
integer. Number of parallel chains
verbose
logical. Provide feedback on progress
vb
Logical indicating whether to use variational Bayes (TRUE) or full MCMC sampling (FALSE)
return_stan_fit
logical. The results of the model fit are stored internally as an rtsFit object and
returned in that format. If this argument is set to TRUE, then the fitted stan object will instead be returned,
but the rtsFit object will still be saved.
...
additional options to pass to `$sample()`.
priors
list. See Details
Returns
A stanfit or a CmdStanMCMC object
Examples
# the data are just random simulated points
b1 <- sf::st_sf(sf::st_sfc(sf::st_polygon(list(cbind(c(0,3,3,0,0),c(0,0,3,3,0))))))
g1 <- grid$new(b1,0.5)
dp <- data.frame(y=runif(10,0,3),x=runif(10,0,3),date=paste0("2021-01-",11:20))
dp <- create_points(dp,pos_vars = c('y','x'),t_var='date')
cov1 <- grid$new(b1,0.8)
cov1$grid_data$cov <- runif(nrow(cov1$grid_data))
g1$add_covariates(cov1$grid_data,
zcols="cov")
g1$points_to_grid(dp, laglength=5)
g1$priors <- list(
prior_lscale=c(0,0.5),
prior_var=c(0,0.5),
prior_linpred_mean=c(0),
prior_linpred_sd=c(5)
)
\donttest{
g1$lgcp_bayes(popdens="cov", approx = "hsgp", parallel_chains = 0)
g1$model_fit()
# we can extract predictions
g1$extract_preds("rr")
g1$plot("rr")
g1$hotspots(threshold = 2, stat = "rr")
# this example uses real aggregated data but will take a relatively long time to run
data("birmingham_crime")
example_data <- birmingham_crime[,c(1:8,21)]
example_data$y <- birmingham_crime$t12
g2 <- grid$new(example_data,cellsize=1000)
g2$priors <- list(
prior_lscale=c(0,0.5),
prior_var=c(0,0.5),
prior_linpred_mean=c(-3),
prior_linpred_sd=c(5)
)
g2$lgcp_bayes(popdens="pop", approx = "hsgp", parallel_chains = 0)
g2$model_fit()
g2$extract_preds("rr")
g2$plot("rr")
g2$hotspots(threshold = 2, stat = "rr")
}
Method lgcp_ml()
Fit an (approximate) log-Gaussian Cox Process model using Maximum Likelihood
Usage
grid$lgcp_ml(
popdens = NULL,
covs = NULL,
model = "fexp",
max_iter = 30,
iter_sampling = 200,
tol = 10,
hist = 5,
k0 = 10,
start_theta = NULL,
trace = 1
)
Arguments
popdens
character string. Name of the population density column in grid data or region data
covs
vector of strings. Base names of the covariates to
include. For temporally-varying covariates only the stem is required and not
the individual column names for each time period (e.g. dayMon and not dayMon1,
dayMon2, etc.) For regional models, covariates should be mapped to the grid currently (see add_covariates)
model
Either "fexp", "sqexp", "matern1", or "matern2" for exponential, squared exponential, and matern with shape of 1 or 2, respectively. Other functions
from glmmrBase will work, but may be less relevant to spatial models.
max_iter
Integer. Maximum number of iterations.
iter_sampling
integer. Number of random effects samples to draw on each iteration.
tol
Scalar. The tolerance for the Bayes Factor convergence criterion.
hist
Integer. The length of the running mean for the convergence criterion for non-Gaussian models.
k0
Integer. The expected number of iterations until convergence.
start_theta
Optional. Starting values for the covariance parameters (log(tau sq), log(lambda), rho); rho is only
required if there is more than one time period.
trace
Integer. Level of detail of information printed to the console. 0 = none, 1 = some (default), 2 = most.
Returns
Optionally, an rtsFit model fit object. This fit is stored internally and can be retrieved with model_fit()
Examples
# note: these examples are spatial only in 0.10.0 to prevent an error
# with the reverse dependency check when updating the base package.
# If you're seeing this note, then an updated package will be available
# imminently.
# a simple example with completely random points
b1 <- sf::st_sf(sf::st_sfc(sf::st_polygon(list(cbind(c(0,3,3,0,0),c(0,0,3,3,0))))))
g1 <- grid$new(b1,0.5)
dp <- data.frame(y=runif(10,0,3),x=runif(10,0,3))
dp <- create_points(dp,pos_vars = c('y','x'))
cov1 <- grid$new(b1,0.8)
cov1$grid_data$cov <- runif(nrow(cov1$grid_data))
g1$add_covariates(cov1$grid_data,
zcols="cov")
g1$points_to_grid(dp)
# an example
\donttest{
g1$lgcp_ml(popdens="cov",iter_sampling = 50)
g1$model_fit()
g1$extract_preds("rr")
g1$plot("rr")
g1$hotspots(threshold = 2, stat = "rr")
# this example uses real aggregated spatial data
# note that the full dataset has 12 time periods
# and can be used as a spatio-temporal example by removing
# the lines marked # spatial
data("birmingham_crime")
example_data <- birmingham_crime[,c(1:8,21)] # spatial
example_data$y <- birmingham_crime$t12 # spatial
example_data <- sf::st_transform(example_data, crs = 4326)
g2 <- grid$new(example_data,cellsize=0.006)
g2$lgcp_ml(popdens = "pop", start_theta = log(c(0.1, 0.05)))
g2$model_fit()
g2$extract_preds("rr")
g2$plot("rr")
g2$hotspots(threshold = 2, stat = "rr")
}
Method extract_preds()
Extract predictions
Extracts predictions (incidence, relative risk, or incidence rate ratio) from the last model fit; see the type argument.
Arguments
type
Vector of character strings. Any combination of "pred", "rr", and "irr", which are,
posterior mean incidence (overall and population standardised), relative risk,
and incidence rate ratio, respectively.
irr.lag
integer. If "irr" is requested as type, the number of time periods prior to the
current period against which the ratio is computed
t.lag
integer. Extract predictions for previous time periods.
popdens
character string. Name of the column in grid_data with the
population density data
Examples
# See examples for lgcp_bayes() and lgcp_ml()
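# an illustrative sketch: assumes g1 has already been fitted as in those examples
\donttest{
g1$extract_preds("rr")
g1$plot("rr")
}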
Method hotspots()
Generate hotspot probabilities
Generate hotspot probabilities. The last model fit will be used to extract
predictions. If there is no previous model fit, use either lgcp_ml() or lgcp_bayes(), or see
model_fit() to update the stored model fit.
Given a definition of a hotspot in terms of threshold(s) for incidence,
relative risk, or incidence rate ratio, returns the probability that
each area is a "hotspot". See Details of extract_preds. Columns
will be added to grid_data. Note that for an incidence threshold, the threshold should
be specified as the per-individual incidence.
Usage
grid$hotspots(threshold = NULL, stat = "rr", t.lag = 0, popdens = NULL)
Arguments
threshold
Numeric. The threshold above which an area is classified as a hotspot,
applied to the statistic named in stat
stat
character string. The statistic to which the threshold is applied, for
example "rr" for relative risk.
t.lag
integer. Extract predictions for incidence or relative risk for previous time periods.
popdens
character string. Name of the variable in grid_data
specifying the population density. Required if the threshold is applied to
incidence.
Returns
None, called for effects. Columns are added to grid or region data.
Examples
\dontrun{
# See examples for lgcp_bayes() and lgcp_ml()
}
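# an illustrative sketch: assumes a model has been fitted as in the lgcp_ml() examples
\donttest{
g1$hotspots(threshold = 2, stat = "rr")
# hotspot probability columns are appended to the grid data
head(g1$grid_data)
}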
Method aggregate_output()
Aggregate output
Aggregate lgcp_fit output to another geography
Usage
grid$aggregate_output(
new_geom,
zcols,
weight_type = "area",
popdens = NULL,
verbose = TRUE
)
Arguments
new_geom
sf object. A set of polygons covering the same area as boundary
zcols
vector of character strings. Names of the variables in grid_data to
map to the new geography
weight_type
character string, either "area" or "pop" for area-weighted
or population weighted averaging, respectively, or "sum" to take the weighted sum.
popdens
character string. If weight_type is equal to "pop" then the
name of the column in grid_data with population density data
verbose
logical. Whether to provide progress bar.
Returns
An sf object identical to new_geom with additional columns with the
variables specified in zcols
Examples
\donttest{
b1 <- sf::st_sf(sf::st_sfc(sf::st_polygon(list(cbind(c(0,3,3,0,0),c(0,0,3,3,0))))))
g1 <- grid$new(b1,0.5)
dp <- data.frame(y=runif(10,0,3),x=runif(10,0,3),date=paste0("2021-01-",11:20))
dp <- create_points(dp,pos_vars = c('y','x'),t_var='date')
cov1 <- grid$new(b1,0.8)
cov1$grid_data$cov <- runif(nrow(cov1$grid_data))
g1$add_covariates(cov1$grid_data,
zcols="cov")
g1$points_to_grid(dp, laglength=5)
g1$priors <- list(
prior_lscale=c(0,0.5),
prior_var=c(0,0.5),
prior_linpred_mean=c(0),
prior_linpred_sd=c(5)
)
res <- g1$lgcp_bayes(popdens="cov", parallel_chains = 1)
g1$extract_preds(res,
type=c("pred","rr"),
popdens="cov")
new1 <- g1$aggregate_output(cov1$grid_data,
zcols="rr")
}
Method scale_conversion_factor()
Returns scale conversion factor
Coordinates are scaled to [-1,1] for LGCP models fit with HSGP. This function
returns the scaling factor for this conversion.
Usage
grid$scale_conversion_factor()
Examples
b1 = sf::st_sf(sf::st_sfc(sf::st_polygon(list(cbind(c(0,3,3,0,0),c(0,0,3,3,0))))))
g1 <- grid$new(b1,0.5)
g1$scale_conversion_factor()
Method get_region_data()
Returns summary data of the region/grid intersections
Information on the intersection between the region areas and the computational grid
including the number of cells intersecting each region (n_cell), the indexes of the
cells intersecting each region in order (cell_id), and the proportion of each region's
area covered by each intersecting grid cell (q_weights).
Usage
grid$get_region_data()
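A minimal illustrative sketch; it assumes a region model built from multiple polygons, as in the new() example with the Birmingham data.
Examples
\donttest{
data("birmingham_crime")
g2 <- grid$new(birmingham_crime, cellsize = 1000)
# summary of the region/grid intersections: n_cell, cell_id, and q_weights
rd <- g2$get_region_data()
str(rd, max.level = 1)
}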
Method variogram()
Plots the empirical semi-variogram
Usage
grid$variogram(popdens, yvar, nbins = 20)
Arguments
popdens
String naming the variable in the data specifying the offset. If not
provided then no offset is used.
yvar
String naming the outcome variable to calculate the variogram for. Optional, if
not provided then the outcome count data will be used.
nbins
The number of bins in the empirical semivariogram
Returns
A ggplot plot is printed and optionally returned
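A minimal illustrative sketch, reusing the running square-grid example; it assumes that when popdens and yvar are not supplied, no offset is used and the mapped case counts are taken as the outcome.
Examples
b1 <- sf::st_sf(sf::st_sfc(sf::st_polygon(list(cbind(c(0,3,3,0,0),c(0,0,3,3,0))))))
g1 <- grid$new(b1,0.5)
dp <- data.frame(y=runif(10,0,3),x=runif(10,0,3),date=paste0("2021-01-",11:20))
dp <- create_points(dp,pos_vars = c('y','x'),t_var='date')
g1$points_to_grid(dp, laglength=5)
# empirical semi-variogram of the case counts
g1$variogram(nbins = 10)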
Method reorder()
Re-orders the computational grid
The quality of the nearest neighbour approximation can depend on the ordering of
the grid cells. This function reorders the grid cells. If this is a region data model,
then the intersections are recomputed.
Usage
grid$reorder(option = "y", verbose = TRUE)
Arguments
option
Either "y" for order of the y coordinate, "x" for order of the x coordinate,
"minimax" in which the next observation in the order is the one which maximises the
minimum distance to the previous observations, or "random" which randomly orders them.
verbose
Logical indicating whether to print a progress bar (TRUE) or not (FALSE).
Returns
No return, used for effects.
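A minimal illustrative sketch, reusing the running square-grid example.
Examples
b1 <- sf::st_sf(sf::st_sfc(sf::st_polygon(list(cbind(c(0,3,3,0,0),c(0,0,3,3,0))))))
g1 <- grid$new(b1,0.5)
# reorder the grid cells to improve the nearest neighbour approximation
g1$reorder(option = "minimax")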
Method data()
A list of prepared data
Returns the data prepared for use in the in-built estimation functions. The same data could be used
for alternative models; this utility function facilitates model fitting for custom models.
Usage
grid$data(m, approx, popdens, covs, covs_grid)
Arguments
m
The number of nearest neighbours or basis functions.
approx
Either "rank" for reduced rank approximation, or "nngp" for nearest
neighbour Gaussian process.
popdens
String naming the variable in the data specifying the offset. If not
provided then no offset is used.
covs
An optional vector of covariate names. For regional data models, this is specifically for the region-level covariates.
covs_grid
An optional vector of covariate names for region data models, identifying the covariates at the grid level.
Returns
A named list of data items used in model fitting
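A minimal illustrative sketch, reusing the running square-grid example; the argument values are illustrative only and mirror the lgcp_bayes() defaults (m = 10, approx = "nngp").
Examples
b1 <- sf::st_sf(sf::st_sfc(sf::st_polygon(list(cbind(c(0,3,3,0,0),c(0,0,3,3,0))))))
g1 <- grid$new(b1,0.5)
dp <- data.frame(y=runif(10,0,3),x=runif(10,0,3),date=paste0("2021-01-",11:20))
dp <- create_points(dp,pos_vars = c('y','x'),t_var='date')
g1$points_to_grid(dp, laglength=5)
# assemble the list of data used by the in-built estimation functions
dat <- g1$data(m = 10, approx = "nngp", popdens = NULL, covs = NULL, covs_grid = NULL)
names(dat)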
Method get_random_effects()
Returns the random effects stored in the object (if any) after using ML fitting. Its main use is
that, if a fitting procedure is stopped, the random effects can still be returned.
Usage
grid$get_random_effects()
Returns
A matrix of random effects samples if an MCMCML model has been initialised; otherwise returns FALSE
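A minimal illustrative sketch; it assumes a model has previously been fitted with lgcp_ml() as in its examples.
Examples
\donttest{
u <- g1$get_random_effects()
# returns FALSE if no MCMCML model has been initialised
if (!isFALSE(u)) dim(u)
}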
Method model_fit()
Either returns the last stored model fit from lgcp_ml() or lgcp_bayes(), or updates
the saved model fit if an object is provided.
Usage
grid$model_fit(fit = NULL)
Arguments
fit
Optional. A previous rtsFit object. If provided then the function updates the internally stored model fit.
Returns
Either an rtsFit object, or nothing if no model has previously been fit or if the fit is updated.
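A minimal illustrative sketch; it assumes a model has previously been fitted as in the lgcp_ml() examples.
Examples
\donttest{
fit <- g1$model_fit()
# a previously saved rtsFit object could also be restored with g1$model_fit(fit)
}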
Method clone()
The objects of this class are cloneable with this method.
Usage
grid$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.