glmstarma: Fit STARMA Models based on generalized linear models

Description

The function glmstarma estimates a multivariate time series model based on generalised linear models (GLM). The primary application is for spatio-temporal data, but different applications, like time-varying network data can be ttacked by this methodology. The model framework links the mean of the time series conditional on the past, to a linear predictor. This linear predictor allows regression on past observations, past values of the linear predictor and covariates, as described in the details. Various distributions with several link-functions are available.

Usage

glmstarma(
  ts,
  model = list(),
  wlist,
  family = NULL,
  covariates = NULL,
  wlist_past_mean = NULL,
  wlist_covariates = NULL,
  control = list()
)

Value

The function returns an object of class glmstarma, which contains beside the (maybe revised) input to the function:

target_dim Number of locations. Corresponds to the number of rows in ts.
n_obs_effective Effective number of observation times. Corresponds to the number of columns in ts minus the maximum time lag of the model.
max_time_lag Maximum time lag in the model.
log_likelihood The (quasi)-log-likelihood of the fitted model, which is based on n_obs_effective observation times.
score The (quasi)-score vector at the quasi maximum likelihood estimation.
information The (quasi)-information matrix at the quasi maximum likelihood estimation.
variance_estimation The variance estimation of the parameter estimates. Calculated based on a sandwich estimator.
aic AIC of the model based on the quasi log-likelihood, see information_criteria.
bic BIC of the model based on the quasi log-likelihood, see information_criteria.
qic QIC of the model based on the quasi log-likelihood, see QIC.
design_matrix The final design matrix of the model
derivatives The derivatives of the linear predictor with respect to the parameters at each time point.
fitted.values The fitted values of the model, which can be extracted by the fitted method.
link_values Fitted values of the linear process, i.e. $\mathbf{\psi}_t$.
algorithm Information about the fitting method.
convergence A named list with information about the convergence of the optimization:
- start The values used for the coefficients at the start of the estimation.
- fncount Number of calls of the quasi-loglikelihood during optimization.
- grcount Number of calls of the quasi-score during optimization.
- hecount Number of calls of the quasi-information during optimization. In algorithms not using the information, this is 0 or the number how often constrains are evaluated.
- fitting_time The time in milliseconds it took to estimate the model.
- convergence Logical value indicating the convergence of the algorithm.
- message An optional message by the optimization algorithm.
call The function call.

Arguments

ts

Multivariate time-series. Rows indicate the locations and columns the time.

model

a named list specifying the model orders of the linear predictor, which can be of the following elements:

intercept : (Optional) character
- 'homogenous' (default) for a homogenous model, i.e. the same intercept for all components
- 'inhomogenous' for inhomogenous models, i.e. fitting an individual intercept for each component
past_obs :
- Integer vector with the maximal spatial orders for the time lags in past_obs_time_lags.
- Alternatively: a binary matrix, with the entry in row $i$ and column $j$ indicating whether the $(i - 1)$-spatial lag for the $j$-th time lag is included in the model.
past_obs_time_lags : (Optional) integer vector
- indicates the time lags for past_obs. Defaults to seq(length(past_obs)) (for vectors) and seq(ncol(past_obs)) (for a matrix)
past_mean : (Optional)
- Spatial orders for the regression on past values of (latent) linear process values.
- Values can be entered in the same format as in past_obs. If not specified, no regression to the feedback process is performed.
past_mean_time_lags : (Optional) integer vector
- Time lags for the regression on the (latent) linear process. Values can be entered in the same format as in past_obs_time_lags.
covariates : (Optional)
- spatial orders for the covariate processes passed in the argument covariates. The values can be passed as in past_obs and past_means, where the $j$-th entry or column represents the $j$-th covariate.
- Default is spatial order 0 for all covariates, which corresponds to the first matrix in argument wlist_covariates.

wlist

A list of quadratic matrices, with the same dimension as the time series has rows, which describe the spatial dependencies. Row-normalized matrices are recommended. See Details.

family

A list generated by one of the family functions of this package stfamily. This argument specifies the marginal distributions and the type of model fitted.

covariates

(Potentially named) list of covariates, containing matrices of same dimension as ts or returns of the covariate functions of this package (see TimeConstant, SpatialConstant).

wlist_past_mean

(Optional) List of matrices, which describes spatial dependencies for the values of the linear predictor. If this is NULL, the matrices from wlist are used.

wlist_covariates

(Optional) List of matrices, which describes spatial dependencies for the covariates. If this is NULL, the matrices from wlist are used.

control

A list of parameters for controlling the fitting process. This list is passed to glmstarma.control.

Details

For the time series $\{Y_t = (Y_{1,t}, \ldots, Y_{p,t})'\}$, we assume that the (marginal) conditional components $Y_{i,t} \mid \mathcal{F}_{t-1}$ follow a distribution that is a member of the exponential family. The term $\mathcal{F}_{t-1}$ denotes the history of the process up to time $t-1$. The multivariate distribution of $Y_t \mid \mathcal{F}_{t-1}$ is not necessarily identifiable. The conditional expected value $\mathbf{\mu}_t := \mathbb{E}(Y_t \mid \mathcal{F}_{t-1})$ is connected to the linear process by the link-function, i.e. $g(\mathbf{\mu}_t) = \mathbf{\psi}_t$, which is applied elementwise. The linear process has the following structure: $$\mathbf{\psi}_t = \mathbf{\delta} + \sum_{i = 1}^{q} \sum_{\ell = 0}^{a_i} \alpha_{i, \ell} W_{\alpha}^{(\ell)} h(\mathbf{\psi}_{t - i}) + \sum_{j = 1}^{r} \sum_{\ell = 0}^{b_j} \beta_{j, \ell} W_{\beta}^{(\ell)} \tilde{h}(\mathbf{Y}_{t - j}) + \sum_{k = 1}^{m} \sum_{\ell = 0}^{c_k} \gamma_{k, \ell} W_{\gamma}^{(\ell)} \mathbf{X}_{k, t},$$ where the matrices $W_{\alpha}^{(\ell)}$, $W_{\beta}^{(\ell)}$, and $W_{\gamma}^{(\ell)}$ are taken from the lists wlist_past_mean, wlist, and wlist_covariates, respectively, and $\ell$ denotes the spatial order. If $\delta = \delta_0 \mathbf{1}$ with a scalar $\delta_0$, the model is called homogenous with respect to the intercept; otherwise, it is inhomogenous. Spatial orders, intercept structure and deviating time lags are specified in the argument model. If past_mean is specified, it is also required that past_mean is specified for identifiability.

The functions $h$ and $\tilde{h}$ are set internally with the family argument. In nearly all cases, $h$ corresponds to the identity function, i.e. $h(\mathbf{\psi}_{t - i}) = \mathbf{\psi}_{t - i}$, and $\tilde{h}$ is similar to the link-function. Using count data as an example, for family = vpoisson("identity") and family = vpoisson("log") result in the linear and log-linear Poisson STARMA models from Maletz et al. (2024), where vpoisson("softplus") results in the approximately linear model by Jahn et al. (2023).

The unknown parameters $\delta$, $\alpha_{i, \ell}$, $\beta_{j, \ell}$, and $\gamma_{k, \ell}$ are estimated by quasi-maximum likelihood estimation, assuming conditional independence of the components given the past for calculating the quasi-likelihood. Parameter estimation is by default performed under stability constraints to ensure stability of the model. These constraints can be modified or deactivated via the control argument. See glmstarma.control for details.

References

Cliff, A. D., & Ord, J. K. (1975). Space-Time Modelling with an Application to Regional Forecasting. Transactions of the Institute of British Geographers, 64, 119–128. tools:::Rd_expr_doi("10.2307/621469")
Jahn, M., Weiß, C.H., & Kim, H., (2023), Approximately linear INGARCH models for spatio-temporal counts, Journal of the Royal Statistical Society Series C: Applied Statistics, 72(2), 476–497, tools:::Rd_expr_doi("10.1093/jrsssc/qlad018")
Maletz, S., Fokianos, K., & Fried, R. (2024). Spatio-Temporal Count Autoregression. Data Science in Science, 3(1), tools:::Rd_expr_doi("10.1080/26941899.2024.2425171")
Pfeifer, P. E., & Deutsch, S. J. (1980). A Three-Stage Iterative Procedure for Space-Time Modeling Phillip. Technometrics, 22(1), 35–47. tools:::Rd_expr_doi("10.2307/1268381")

Examples

Run this code

# \donttest{
dat <- load_data("chickenpox", directory = tempdir())
chickenpox <- dat$chickenpox
population_hungary <- dat$population_hungary
W_hungary <- dat$W_hungary

model_autoregressive <- list(intercept = "homogeneous", past_obs = rep(1, 7))
glmstarma(chickenpox, model_autoregressive, W_hungary, family = vpoisson("log"), 
          covariates = list(population = population_hungary), 
          control = list(parameter_init = "zero"))
# }

Run the code above in your browser using DataLab