etasclass: Mixed estimation of an ETAS model (renewed in version 2.0)

Description

etasclass is the main function of the package etasFLP.

etasclass objects created with previous versions are not compatible with the current version.

Estimates the components of the ETAS (Epidemic Type Aftershock Sequence) model for the description of the seismicity in a space-time region. Background seismicity is estimated nonparametrically, while triggered seismicity is estimated by maximum likelihood (MLE). In particular, the bandwidth of the kernel smoother can also be estimated through the Forward Likelihood Predictive (FLP) approach. For each event, the probability of being a background event or a triggered one is estimated.

New in version 2.0.0: Covariates have been introduced to explain the effects of external factors on the induced seismicity. Since the parametrization has changed, etasclass objects created with previous versions are not compatible with those obtained with the current version.

New in version 2.2.0: A new algorithm for starting values, a new argument (n.iterweight), an update method and a timeupdate option.

An ETAS model with up to 7+ncov parameters can be estimated, with several options and different methods.

Returns an etasclass object, for which plot, summary, print and profile methods are defined.

Usage

etasclass(cat.orig,
           time.update=FALSE,
           magn.threshold	=2.5,
           magn.threshold.back=magn.threshold+2,	
           tmax		=max(cat.orig$time),
           long.range=range(cat.orig$long),
           lat.range=range(cat.orig$lat),
           ##### starting values for parameters
           mu		=1,
           k0		=1,
           c		=0.5,
           p		=1.01,
           gamma	=.5,
           d		=1.,
           q		=1.5,
           betacov         =0.7,
           ### indicators: if params.ind[i] i-th parameter will be estimated
           params.ind=replicate(7,TRUE),
#           params.lim=c(0,0,0,1.0,0,0,0),
           ### formula for covariates (magnitude should always be included):
           formula1            ="time~magnitude-1",
           offset              =0,
           hdef=c(1,1),
           w		=replicate(nrow(cat.orig),1),
           hvarx  =replicate(nrow(cat.orig),1),
           hvary  =replicate(nrow(cat.orig),1),
           ### flags for the kind of declustering and smoothing:
           declustering	   =TRUE,
           thinning	       =FALSE,
           flp		         =TRUE,
           m1		           =NULL,
           ndeclust        =5,
           n.iterweight    =1,
           onlytime	=FALSE,
           is.backconstant	=FALSE,
           ##### end of  main input arguments. 
           ##### Control and secondary arguments:
           description	="",
           cat.back   	=NULL,
           back.smooth	=1.0,
           sectoday	=FALSE,
           longlat.to.km   =TRUE,
#           fastML=FALSE, #### not yet implemented
#           fast.eps=0.001, #### not yet implemented
           usenlm		=TRUE,
           method		="BFGS",
           compsqm 	=TRUE,
           epsmax		= 0.0001,
           iterlim		=50,      
           ntheta		=36)

Value

Returns an object of class etasclass.

The main items of the output are:

this.call: reports the exact call of the function.
params.ind: indicates which parameters have been estimated (see details).
params: ML estimates of the ETAS parameters.
sqm: Estimates of the standard errors of the ML estimates of the ETAS parameters (sqm[i]=0 if params.ind[i]=FALSE, or if the Hessian is not computed or is nearly singular).
AIC.iter: AIC values at each iteration.
hdef: final bandwidth used for the kernel estimation of background spatial intensity (however estimated, with flp=TRUE or flp=FALSE).
rho.weights: Estimated probability for each event to be a background event ($\rho$).
time.res: rescaled time residuals (for time processes only).
params.iter: A matrix with the estimated values at each iteration.
sqm.iter: A matrix with the estimates of the standard errors at each iteration.
rho.weights.iter: A matrix with the values of rho.weights at each iteration.
l: A vector with the estimated intensities, corresponding to the observed points.

summary, print and plot methods are defined for an object of class etasclass to obtain main output.

A profile method (profile.etasclass) is also defined to make approximate inference on a single parameter.

Arguments

cat.orig: An earthquake catalog, possibly an object of class eqcat, or otherwise a data.frame with variables named time, lat, long, z, magn1. No missing values are allowed.
time.update: Logical. TRUE if the execution is called by time.update to update to a new time maximum. Default value = FALSE.
magn.threshold: Threshold magnitude (only events with a magnitude at least magn.threshold will be used). Default value = 2.5.
magn.threshold.back: Threshold magnitude used to build the catalog cat.back for the first estimation of the background seismicity. Default value = magn.threshold+2.
tmax: Maximum value of time. Only observations before tmax will be used for estimation. Default value = max(cat.orig$time).
long.range: Longitude range. Only observations with long in the range long.range will be used for estimation. Default value = range(cat.orig$long).
lat.range: Latitude range. Only observations with lat in the range lat.range will be used for estimation. Default value = range(cat.orig$lat).
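
The selection rules described by the arguments above can be mimicked on a plain data.frame. The following is a minimal sketch, assuming a small synthetic catalog and a hypothetical helper select_events; the filtering actually performed inside etasclass may differ in detail (for instance in how boundary values are treated).

```r
# Synthetic catalog with the column names required by etasclass
# (all values are made up for illustration only).
set.seed(42)
n <- 200
cat.orig <- data.frame(
  time  = sort(runif(n, 0, 1000)),
  lat   = runif(n, 36, 39),
  long  = runif(n, 12, 16),
  z     = runif(n, 0, 30),
  magn1 = 2 + rexp(n, rate = 2)
)

# Hypothetical helper reproducing the documented selection rules.
# Assumption: boundary values (e.g. time == tmax) are kept.
select_events <- function(catalog, magn.threshold = 2.5,
                          tmax = max(catalog$time),
                          long.range = range(catalog$long),
                          lat.range = range(catalog$lat)) {
  stopifnot(!any(is.na(catalog)))  # no missing values are allowed
  keep <- catalog$magn1 >= magn.threshold &
    catalog$time <= tmax &
    catalog$long >= long.range[1] & catalog$long <= long.range[2] &
    catalog$lat >= lat.range[1] & catalog$lat <= lat.range[2]
  catalog[keep, , drop = FALSE]
}

cat.sel <- select_events(cat.orig, magn.threshold = 2.5, tmax = 800)
nrow(cat.sel)  # number of events that survive the selection
```

The number of rows of the selected catalog, not nrow(cat.orig), is the length that vectors such as w, hvarx and hvary have to match.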

Values for the 7 parameters of the ETAS model (starting values or fixed values according to params.ind):

mu: Parameter 1 ($\mu$) of the ETAS model: background general intensity; see details. Default value = 1.
k0: Parameter 2 ($\kappa_0$) of the ETAS model: measures the strength of the aftershock activity; see details. Default value = 1.
c: Parameter 3 of the ETAS model; a shift parameter of the Omori law for temporal decay rate of aftershocks; see details. Default value = 0.5.
p: Parameter 4 of the ETAS model; the exponent of the Omori law for temporal decay rate of aftershocks; see details. Default value = 1.01.
gamma: Parameter 5 ($\gamma$) of the ETAS model; it scales the squared spatial distance by the magnitude of the triggering event and is therefore related to the efficiency of an event of given magnitude in generating aftershocks; see details. Default value = 0.5.
d: Parameter 6 of the ETAS model; parameter related to the spatial influence of the mainshock; see details. Default value = 1.
q: Parameter 7 of the ETAS model; parameter related to the spatial influence of the mainshock; see details. Default value = 1.5.
betacov: Numerical array. Parameters of the covariates of the ETAS model (the parameters $\beta_j$); see details. Default value = 0.7. The parameters in betacov are unconstrained.

End of model parameter input.

params.ind: vector of 7 logical values: params.ind[i] = TRUE means that the i-th parameter must be estimated; params.ind[i] = FALSE means that the i-th parameter is fixed to its input value (the order of the parameters is: mu, k0, c, p, gamma, d, q). Default value = replicate(7,TRUE), that is, etasclass estimates all parameters.
params.lim: vector of 7 numerical values: params.lim[i] = theta0 means that the i-th parameter must be greater than theta0 (the default limits of the parameters are: 0 for mu, k0, c; 1 for p; 0 for gamma, d, q). Default value = c(0,0,0,1.0,0,0,0) (the argument is commented out in the current Usage).
formula1: a character variable: formula which defines the covariates acting on the induced seismicity. In the classical ETAS model the only covariate is the magnitude. The left side (dummy) element must be time, which is a variable certainly present in the data set. The right side of the formula determines ncov, the number of covariates. Default value = "time~magnitude-1"; the input must be a character value, which is converted to a formula inside the program.
offset: An offset, for which no parameter will be estimated. Default value=0
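
How a character formula1 determines ncov can be illustrated with base R tools. This is only a sketch: the data frame events, its columns magnitude and depth, and the helper count_covariates are made up here, and the conversion done inside etasclass is assumed (not confirmed) to behave like as.formula followed by model.matrix.

```r
# Toy data: time plus two candidate covariates (values are made up).
events <- data.frame(
  time      = c(1.0, 2.5, 4.0, 7.2),
  magnitude = c(0.3, 1.1, 0.0, 0.8),
  depth     = c(10, 5, 22, 8)
)

# Hypothetical helper: number of covariates implied by a character formula.
count_covariates <- function(formula1, data) {
  f <- as.formula(formula1)          # character -> formula
  X <- model.matrix(f, data = data)  # design matrix of the right-hand side
  ncol(X)                            # one column (one beta) per covariate
}

ncov_default <- count_covariates("time~magnitude-1", events)        # magnitude only
ncov_two     <- count_covariates("time~magnitude+depth-1", events)  # magnitude and depth
```

The "-1" term removes the intercept, so every column of the design matrix corresponds to one element of betacov.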

Flags for the kind of declustering and smoothing:

hdef: Starting values for the x, y bandwidths used in the kernel estimator of background seismicity. Default value = c(1,1).
w: Starting values for the weights used in the kernel estimator of background seismicity. The length must be equal to the number of events of the catalog after event selection (can be less than nrow(cat.orig)). Default value = replicate(nrow(cat.orig),1).
hvarx: Longitude bandwidth adjustments used in the kernel estimator of background seismicity. The length must be equal to the number of events of the catalog after event selection (can be less than nrow(cat.orig)). Default value = replicate(nrow(cat.orig),1).
hvary: Latitude bandwidth adjustments used in the kernel estimator of background seismicity. The length must be equal to the number of events of the catalog after event selection (can be less than nrow(cat.orig)). Default value = replicate(nrow(cat.orig),1).
declustering: if TRUE the catalog is iteratively declustered to optimally estimate the background intensity (through thinning, if thinning=TRUE, or through weighting, if thinning=FALSE). Default value = TRUE.
thinning: if TRUE a background catalog is obtained by sampling from the original catalog with the probabilities estimated during the iterations. Default value = FALSE.
flp: if TRUE then background seismicity is estimated through the Forward Likelihood Predictive approach (see details). Otherwise the Silverman rule is used. Default value = TRUE.
m1: Used only if flp=TRUE. Indicates the range of points used for the FLP steps. See details. If missing it is set to nrow(cat)/2.
ndeclust: maximum number of iterations for the general declustering procedure. Default value = 5.
n.iterweight: New in version 2.2. The weighting and the density computations will be alternated n.iterweight times after each maximum likelihood step: in many situations this improves the general convergence procedure. Default value = 1.
onlytime: if TRUE then a time process is fitted to the data, regardless of spatial location (in this case is.backconstant is set to TRUE and declustering, flp are set to FALSE). Default value = FALSE.
is.backconstant: if TRUE then background seismicity is assumed to be homogeneous in space (and declustering, flp are set to FALSE). Default value = FALSE.

Other control parameters:

description: a description string used for the output. Default value = "".
cat.back: external catalog used for the estimation of the background seismicity. Default value = NULL.
back.smooth: Controls the level of smoothing for the background seismicity (meaningful only if flp=FALSE). Default value = 1.
sectoday: if TRUE, then time variable of cat.orig is converted from seconds to days. Default value = FALSE.
longlat.to.km: if TRUE, then long and lat variables of cat.orig are treated as geographical coordinates and converted to kilometers. Default value = TRUE.
usenlm: if TRUE, then the nlm function (a Newton-type algorithm) is used in the maximum likelihood steps; if FALSE, then the optim function is used (with method = method). Default value = TRUE.
method: used if usenlm=FALSE: method used by optim. Default value = "BFGS".
compsqm: if TRUE, then standard errors are computed. Default value = TRUE.
epsmax: maximum allowed difference between estimates in subsequent iterations (default = 0.0001).
iterlim: maximum number of iterations in the maximum likelihood steps (used in nlm or optim). Default value = 50.
ntheta: number of subdivisions of the round angle, used in the approximation of the integral involved in the likelihood computation of the ETAS model. Default value = 36.
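
The roles of usenlm, method, iterlim and compsqm can be illustrated on a toy likelihood. The sketch below is not the code used by etasclass: it fits a one-parameter exponential model to simulated data, and it only assumes that the package passes these controls to nlm and optim in the usual way.

```r
set.seed(123)
x <- rexp(500, rate = 2)   # simulated inter-event times (toy data)

# Negative log-likelihood of an exponential model, parametrized by
# log(rate) so that the optimization is unconstrained.
negloglik <- function(log_rate) {
  -(length(x) * log_rate - exp(log_rate) * sum(x))
}

# Analogue of usenlm = TRUE: Newton-type minimization with nlm.
fit_nlm <- nlm(negloglik, p = 0, hessian = TRUE, iterlim = 50)

# Analogue of usenlm = FALSE: optim with method = "BFGS".
fit_optim <- optim(par = 0, fn = negloglik, method = "BFGS",
                   hessian = TRUE, control = list(maxit = 50))

# Analogue of compsqm = TRUE: standard errors from the inverse Hessian.
se_nlm   <- sqrt(diag(solve(fit_nlm$hessian)))
se_optim <- sqrt(diag(solve(fit_optim$hessian)))
```

Both optimizers should agree on the estimate (here the closed-form MLE is 1/mean(x) on the rate scale); when the Hessian is close to singular the inversion fails, which is the situation in which sqm is reported as 0.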

Author

Marcello Chiodi, Giada Adelfio

Details

Estimates the components of an ETAS (Epidemic type aftershock sequence) model for the description of the seismicity of a space-time region. Background seismicity is estimated nonparametrically, while triggered seismicity is estimated by MLE.

From version 2.0 of package etasFLP, covariates are allowed to improve the fit of the triggered part through the input formula1, which has a default value of "time ~ magnitude - 1". This default corresponds to the previous versions of package etasFLP, that is, magnitude is the only covariate that influences the average number of aftershocks.

If flp is set to TRUE, the bandwidth of the kernel density estimator is estimated through the Forward Likelihood Predictive (FLP) approach (for the theory see Adelfio and Chiodi, 2013). Otherwise the bandwidth is estimated through Silverman's rule. FLP steps for the estimation of the nonparametric background component are alternated with Maximum Likelihood steps for the estimation of the parametric components (only if declustering=TRUE). For each event the probability of being a background event or a triggered one is estimated, according to a declustering procedure similar to the proposal of Zhuang, Ogata, and Vere-Jones (2002).

The ETAS model for conditional space time intensity $\lambda(x,y,t)$ is given by:

$$\lambda(x,y,t)=\mu f(x,y)+\kappa_0 \sum_{t_j<t}\frac{ e^{\eta_j}}{(t-t_j +c)^p} \left\{ \frac{(x-x_j)^2+(y-y_j)^2}{e^{\gamma \ (m_j-m_0)}}+d \right\}^{-q}$$

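where $\eta_j=\sum_{k=1}^{ncov}\beta_k\, cov_{jk}$ is the linear predictor of the $j$-th event and $cov_{jk}$ is the value of the $k$-th covariate for that event.

The parameters $\beta_k$ are the elements of the array variable betacov.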
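
The intensity formula above can be evaluated directly in a few lines of R. This is a minimal sketch under stated assumptions, not the package's internal code: the function etas_intensity, the argument f_back (a user-supplied background density) and the toy events are hypothetical, and the single covariate is taken to be the magnitude above the threshold m0.

```r
# Conditional intensity lambda(x, y, t) as written in the formula above.
# `events` holds the past events: columns x, y, t, m (magnitude), eta.
# `f_back` is a user-supplied background density f(x, y) (hypothetical).
etas_intensity <- function(x, y, t, events, mu, k0, c_par, p, gamma, d, q,
                           m0, f_back) {
  past <- events[events$t < t, , drop = FALSE]   # only events before t
  if (nrow(past) == 0) return(mu * f_back(x, y))
  omori   <- exp(past$eta) / (t - past$t + c_par)^p
  r2      <- (x - past$x)^2 + (y - past$y)^2
  spatial <- (r2 / exp(gamma * (past$m - m0)) + d)^(-q)
  mu * f_back(x, y) + k0 * sum(omori * spatial)
}

# Two made-up events and a uniform background on a 10 x 10 square.
events <- data.frame(x = c(0, 1), y = c(0, 1), t = c(0, 1), m = c(4, 3))
beta <- 0.7
m0   <- 2.5
events$eta <- beta * (events$m - m0)   # assumption: one covariate, m - m0
f_unif <- function(x, y) 1 / 100

lambda_now <- etas_intensity(0.5, 0.5, 2, events, mu = 1, k0 = 1,
                             c_par = 0.5, p = 1.01, gamma = 0.5, d = 1,
                             q = 1.5, m0 = m0, f_back = f_unif)
```

Before the first event the function returns the background term alone; afterwards each past event adds a contribution that decays in time (through c and p) and in space (through gamma, d and q).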

$f(x,y)$ is estimated through a weighted Gaussian kernel estimator; if flp is set to TRUE then the bandwidth is estimated through an FLP step.
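
A weighted Gaussian kernel estimate of this kind can be sketched as follows. The function kernel_background is hypothetical and assumes a product kernel with one bandwidth per coordinate (as suggested by hdef); the per-event adjustments hvarx and hvary are not modelled, and the estimator used by the package may differ.

```r
# Weighted bivariate Gaussian kernel estimate of f(x, y), product kernel.
# xi, yi: event coordinates; w: weights; h: c(hx, hy) bandwidths.
kernel_background <- function(x, y, xi, yi,
                              w = rep(1, length(xi)), h = c(1, 1)) {
  kx <- dnorm(x, mean = xi, sd = h[1])   # kernel in the x direction
  ky <- dnorm(y, mean = yi, sd = h[2])   # kernel in the y direction
  sum(w * kx * ky) / sum(w)              # weights are normalized to sum to 1
}

# Three made-up events; the second one is down-weighted.
xi <- c(0, 1, 4)
yi <- c(0, 2, 4)
f_hat <- kernel_background(0.5, 0.5, xi, yi, w = c(1, 0.2, 0.9), h = c(1, 1))
```

An event with weight close to zero, that is, one judged to be almost certainly triggered, contributes almost nothing to the background estimate.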

Weights (computed only if declustering=TRUE) are given by the estimated probabilities of being a background event; for the i-th event this is given by $\rho_i=\frac{\mu f(x_i,y_i)}{\lambda(x_i,y_i,t_i)}$. The weights $\rho_i$ are updated after a whole iteration.
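
As a small worked example of these weights, and of the difference between thinning and weighting, consider the sketch below. All numbers are made up, and the sampling step is only an assumed reading of what thinning=TRUE does.

```r
mu     <- 0.8
f_xy   <- c(0.020, 0.015, 0.030, 0.010)   # f(x_i, y_i), made-up values
lambda <- c(0.016, 0.150, 0.030, 0.400)   # lambda(x_i, y_i, t_i), made-up values

# Probability of each event being a background event.
rho <- mu * f_xy / lambda

# thinning = TRUE (assumed reading): keep event i with probability rho[i].
set.seed(1)
keep <- runif(length(rho)) < rho
background_index <- which(keep)

# thinning = FALSE: keep every event and use rho as kernel weights.
w <- rho
```

Here the first and third events have intensities made up entirely or mostly of the background term, so their weights are close to 1, while the second and fourth are dominated by the triggered term.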

mu ($\mu$) measures the background general intensity (which is assumed temporally homogeneous);

k0 ($\kappa_0$) is a scale parameter related to the importance of the induced seismicity;

c and p are the characteristic parameters of the seismic activity of the given region; c is a shift parameter while p, which characterizes the pattern of seismicity, is the exponent parameter of the modified Omori law for temporal decay rate of aftershocks;

$\eta_j=\sum_{k=1}^{ncov}\beta_k\, cov_{jk}$ measures the efficiency of the $j$-th event, given its covariates (its magnitude in the classical model), in generating aftershock sequences;

d and q are two parameters related to the spatial influence of the mainshocks.
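
In the likelihood, the spatial factor governed by d and q has to be integrated over the study region, and the argument ntheta suggests that this is done by subdividing the round angle. The sketch below shows one way such an approximation can work for a rectangular region: the radial integral is available in closed form, and only the angular integral is approximated. The function names are hypothetical and the scheme is an assumption, not the package's actual implementation.

```r
# Closed-form integral of (r^2 / s + d)^(-q) * r for 0 <= r <= R (q > 1),
# where s = exp(gamma * (m_j - m_0)).
radial_integral <- function(R, s, d, q) {
  s / (2 * (q - 1)) * (d^(1 - q) - (R^2 / s + d)^(1 - q))
}

# Distance from an interior point (x0, y0) to the border of a rectangle
# along direction theta.
boundary_distance <- function(theta, x0, y0, xlim, ylim) {
  dx <- cos(theta)
  dy <- sin(theta)
  tx <- ifelse(dx > 0, xlim[2] - x0, x0 - xlim[1]) / abs(dx)
  ty <- ifelse(dy > 0, ylim[2] - y0, y0 - ylim[1]) / abs(dy)
  pmin(tx, ty)
}

# Angular midpoint rule with ntheta sectors.
spatial_integral <- function(x0, y0, xlim, ylim, s, d, q, ntheta = 36) {
  dtheta <- 2 * pi / ntheta
  theta  <- (seq_len(ntheta) - 0.5) * dtheta   # midpoints of the sectors
  R      <- boundary_distance(theta, x0, y0, xlim, ylim)
  sum(radial_integral(R, s, d, q)) * dtheta
}

# Exact value over the whole plane, for comparison.
whole_plane <- function(s, d, q) pi * s * d^(1 - q) / (q - 1)

inside <- spatial_integral(0, 0, c(-1, 1), c(-1, 1), s = 2, d = 1, q = 1.5)
```

A larger ntheta gives a finer approximation of the region's border at the cost of more computation per event.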

Many kinds of ETAS models can be estimated by setting some control input arguments. Each of the seven ETAS parameters can be fixed to its input value, or can be estimated, according to params.ind: if params.ind[i]=FALSE the i-th parameter is kept fixed to its input value; otherwise, if params.ind[i]=TRUE, the i-th parameter is estimated and the input value is used as a starting value.

By default params.ind=c(TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE), and so a full 7+ncov parameters ETAS model will be estimated.

The seven parameters are internally ordered in this way: params = (mu, k0, c, p, gamma, d, q); for example a model with a fixed value p=1 (and params.ind[4] = FALSE) can be estimated and compared with the model where p is estimated (params.ind[4]=TRUE);

for example a 6+ncov parameters model can be fitted with gamma=0 and params.ind[5]=FALSE, so that input must be in this case: params.ind=c(TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,TRUE);

if onlytime=TRUE a time process is fitted to the data (with a maximum of 5 parameters), regardless of spatial location (however the input catalog cat.orig must still contain three columns named long, lat, z);

if is.backconstant=TRUE a process (space-time or time) with a constant background intensity $\mu$ is fitted;

if mu is fixed to a very low value a process with very low background intensity is fitted, that is with only clustered intensity (useful to fit a model to a single cluster of events).
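
The configurations listed above differ only in the logical vector passed as params.ind. A small sketch of how such vectors can be built and how many parameters they leave free (the names param_names and n_free are made up for illustration):

```r
param_names <- c("mu", "k0", "c", "p", "gamma", "d", "q")

# Full model: all seven parameters are estimated.
ind_full <- setNames(rep(TRUE, 7), param_names)

# gamma kept fixed at its input value (for example gamma = 0).
ind_fixed_gamma <- ind_full
ind_fixed_gamma["gamma"] <- FALSE

# p kept fixed at its input value.
ind_fixed_p <- ind_full
ind_fixed_p["p"] <- FALSE

# Number of free parameters for a model with ncov covariates.
n_free <- function(params.ind, ncov) sum(params.ind) + ncov

n_free(ind_full, ncov = 1)         # 7 + ncov
n_free(ind_fixed_gamma, ncov = 1)  # 6 + ncov
```

The names are only a reading aid; unname(ind_fixed_gamma) gives the plain logical vector in the documented order.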

If flp=TRUE the bandwidth for the kernel estimation of the background intensity is evaluated maximizing the Forward Likelihood Predictive (FLP) quantity, given by (Chiodi, Adelfio, 2011; Adelfio, Chiodi, 2013):

$$FLP_{k_1,k_2}(\hat{\boldsymbol{\psi}})\equiv\sum_{k=k_1}^{k_2}\delta_{k,k+1}(\hat{\boldsymbol{\psi}}(H_{t_k}); H_{t_{k+1}})$$

with $k_1=\frac{n}{2},k_2=n-1$ and where $\delta_{k,k+1}(\hat{\boldsymbol{\psi}}(H_{t_k}); H_{t_{k+1}})$ is the predictive information of the first $k$ observations on the $(k+1)$-th observation, defined as:

$$\delta_{k,k+1}(\hat{\boldsymbol{\psi}}(H_{t_k}); H_{t_{k+1}})\equiv \log L(\hat{\boldsymbol{\psi}}(H_{t_k}); H_{t_{k+1}} )-\log L(\hat{\boldsymbol{\psi}}(H_{t_k});H_{t_k})$$

where $H_{t_k}$ is the history of the process until time $t_k$ and $\hat{\boldsymbol{\psi}}(H_{t_k})$ is an estimate based only on the history until the $k$-th observation.
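
The idea behind the FLP criterion can be shown on a much simpler problem than the ETAS model. In the sketch below, which is only an analogue and not the package's algorithm, the observations are treated as an ordered sample for plain one-dimensional density estimation: in that case the increment $\delta_{k,k+1}$ reduces to the log of the density estimated from the first $k$ points and evaluated at the $(k+1)$-th point. The function flp_bandwidth, the simulated data and the bandwidth grid are all hypothetical.

```r
# FLP-type criterion for a 1-D Gaussian kernel density estimate
# (simplified analogue of the point-process case).
flp_bandwidth <- function(x, h, k1 = floor(length(x) / 2)) {
  n <- length(x)
  delta <- vapply(k1:(n - 1), function(k) {
    # density estimated from the first k points, evaluated at point k + 1
    log(mean(dnorm(x[k + 1], mean = x[1:k], sd = h)))
  }, numeric(1))
  sum(delta)
}

set.seed(1)
x <- rnorm(200)                       # ordered "history" of observations
h_grid <- seq(0.1, 2, by = 0.05)      # candidate bandwidths
flp_values <- vapply(h_grid, function(h) flp_bandwidth(x, h), numeric(1))
h_flp <- h_grid[which.max(flp_values)]  # bandwidth maximizing the criterion
```

Because each point is predicted only from points that precede it, a very small bandwidth is penalized: it predicts new observations poorly even though it fits the past ones closely.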

In the ML step, the parameter vector $\boldsymbol{\theta}=(\mu, \kappa_0, c, p, \gamma, d, q, \beta_1, \dots, \beta_{ncov})$ is estimated by maximizing the sample log-likelihood given by:

$$\log L(\boldsymbol{\theta}; H_{t_n}) = \sum_{i=1}^{n} \log \lambda(x_i,y_i,t_i; \boldsymbol{\theta})- \int_{T_0}^{T_{max}} \int \int_{\Omega_{(x,y)}}\, \lambda(x,y,t;\boldsymbol{\theta})\,d x \, d y \,d t $$
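
For a purely temporal process the log-likelihood above has a closed form, because the Omori kernel can be integrated analytically. The following sketch illustrates the structure "sum of log-intensities minus integrated intensity" under stated assumptions: a constant background rate, a single covariate (magnitude above m0), p different from 1, and a hypothetical function name and parameter ordering. It is not the likelihood code of etasclass.

```r
# Log-likelihood of a purely temporal ETAS-type process on [t_start, t_end]:
# lambda(t) = mu + k0 * sum_{t_j < t} exp(beta * (m_j - m0)) * (t - t_j + c)^(-p)
# theta = c(mu, k0, c, p, beta); requires p != 1 and all t in [t_start, t_end].
etas_time_loglik <- function(theta, t, m, m0, t_start, t_end) {
  mu <- theta[1]; k0 <- theta[2]; c_par <- theta[3]
  p  <- theta[4]; beta <- theta[5]
  prod_j <- k0 * exp(beta * (m - m0))          # productivity of each event
  lambda <- vapply(seq_along(t), function(i) {
    past <- t < t[i]
    mu + sum(prod_j[past] * (t[i] - t[past] + c_par)^(-p))
  }, numeric(1))
  # Integral of the Omori kernel from t_j to t_end, in closed form.
  omori_int <- (c_par^(1 - p) - (t_end - t + c_par)^(1 - p)) / (p - 1)
  sum(log(lambda)) - (mu * (t_end - t_start) + sum(prod_j * omori_int))
}

set.seed(7)
t <- sort(runif(50, 0, 100))          # made-up event times
m <- 2.5 + rexp(50, rate = 2)         # made-up magnitudes
ll <- etas_time_loglik(c(0.5, 0.05, 0.5, 1.1, 0.7), t, m, m0 = 2.5,
                       t_start = 0, t_end = 100)
```

With k0 = 0 the expression reduces to the log-likelihood of a homogeneous Poisson process, n log(mu) - mu (t_end - t_start), which is a convenient sanity check.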

References

Adelfio, G. and Chiodi, M. (2013). Mixed estimation technique in semi-parametric space-time point processes for earthquake description. Proceedings of the 28th International Workshop on Statistical Modelling, 8-13 July 2013, Palermo (Muggeo V.M.R., Capursi V., Boscaino G., Lovison G., editors), Vol. 1, pp. 65-70.

Adelfio, G. and Chiodi, M. (2015). Alternated Estimation in Semi-Parametric Space-Time Branching-Type Point Processes with Application to Seismic Catalogs. Stochastic Environmental Research and Risk Assessment, 29(2), 443-450. doi:10.1007/s00477-014-0873-8.

Adelfio, G. and Chiodi, M. (2015). FLP Estimation of Semi-Parametric Models for Space-Time Point Processes and Diagnostic Tools. Spatial Statistics, 14(B), 119-132. doi:10.1016/j.spasta.2015.06.004.

Adelfio, G. and Chiodi, M. (2020). Including covariates in a space-time point process with application to seismicity. Statistical Methods and Applications. doi:10.1007/s10260-020-00543-5.

Chiodi, M. and Adelfio, G. (2011). Forward Likelihood-based predictive approach for space-time processes. Environmetrics, 22(6), 749-757. doi:10.1002/env.1121.

Chiodi, M. and Adelfio, G. (2017). Mixed Non-Parametric and Parametric Estimation Techniques in R Package etasFLP for Earthquakes' Description. Journal of Statistical Software, 76(3), 1-29. doi:10.18637/jss.v076.i03.

Zhuang, J., Ogata, Y. and Vere-Jones, D. (2002). Stochastic declustering of space-time earthquake occurrences. Journal of the American Statistical Association, 97, 369-379. doi:10.1198/016214502760046925.