Ogata (1988) introduced the epidemic type aftershock sequence
  (ETAS) model based on Gutenberg-Richter law and modified Omori law.
  In its space-time representation (Ogata, 1998), the ETAS model is a
  temporal marked point process model, and a special case of marked
  Hawkes process, with conditional intensity function
    $$
      \lambda(t, x, y | H_t) = \mu(x,y) + \sum_{t_i < t} k(m_i)g(t - t_i)f(x - x_i, y - y_i|m_i)
    $$
  where
      - \(H_t\): 
- is the observational history up to time t, but not
            including t; that is
            $$H_t=\{(t_i, x_i, y_i, m_i): t_i < t\}$$ 
      
- \(\mu(x,y)\): 
- is the background intensity. Currently it
            is assumed to take the semi-parametric form
            $$\mu(x,y)=\mu u(x,y)$$
            where \(\mu\) is an unknown constant and \(u(x,y)\)
            is an unknown function. 
      
- \(k(m)\): 
- is the expected number of events triggered
            from an event of magnitude \(m\) given by
           $$k(m) = A\exp(\alpha(m - m_0))$$ 
      
- \(g(t)\): 
- is the p.d.f of the occurrence
            times of the triggered events, taking the form
           $$g(t) =  \frac{p-1}{c}(1 + \frac{t}{c})^{-p}$$ 
      
- \(f(x,y|m)\): 
- is the p.d.f of
            the locations of the triggered events, considered to be
            either the long tail inverse power density (- mver = 1)
           $$ f(x, y|m) = \frac{q-1}{\pi \sigma(m))}
             (1 + \frac{x^2 + y^2}{\sigma(m)})^{-q} $$
           or the light tail Gaussian density (- mver = 2, only can be used if- cxxcode = TRUE)
           $$ f(x,y|m)= \frac{1}{2\pi \sigma(m)}\exp(-\frac{x^2 + y^2}{2\sigma(m)}) $$
           with
           $$ \sigma(m) = D\exp(\gamma(m - m_0)) $$
 
  
The ETAS models classify seismicity into two components, background
  seismicity \(\mu(x, y)\) and clustering seismicity
  \(\lambda(t, x, y|H_t) - \mu(x, y)\), where
  each earthquake event, whether it is a background event or generated by
  another event, produces its own offspring according to the branching rules
  controlled by \(k(m)\), \(g(m)\) and \(f(x, y|m)\).
Background seismicity rate \(u(x, y)\) and the model parameters
  $$\theta=(\mu, A, c, \alpha, p, D, q, \gamma)$$
  are estimated simultaneously using an iterative approach proposed in Zhuang et al. (2002).
  First, for an initial \(u_0(x, y)\), the parameter vector
  \(\theta\)  is estimated by maximizing the log-likelihood function
  $$l(\theta)=\sum_{i} \lambda(t_i, x_i, y_i|H_{t_i}) - \int \lambda(t, x, y|H_t) dx dy dt.$$
  Then the procedure calculates the probability of being a background
  event for each event in the catalog by
  $$ \phi_i = \frac{\mu(x_i, y_i)}{\lambda(t_i, x_i, y_i|H_{t_i})}. $$
  Using these probabilities and kernel smoothing method with Gaussian kernel
  and appropriate choice of bandwidth (determined by bwd or nnp
  and bwm arguments), the background rate \(u_0(x, y)\)
  is updated. These steps are repeated until the estimates converge
  (stabilize).
The no.itr argument specifies the maximum number of iterations
  in the iterative simultaneous estimation and declustering algorithm.
  The estimates often converge in less than ten iterations. The relative
  iteration convergence tolerance and the optimization
  convergence tolerance are, respectively, determined by
  rel.tol and eps arguments.
  The progress of the computations can be traced by setting
  the verbose and plot.it arguments to be TRUE.
If cxxcode = TRUE, then the internal function etasfit
  uses the C++ code implemented using the Rcpp package,
  which allows multi-thread parallel computing on multi-core processors
  with OpenMP.
  The argument nthreads in this case determines
  the number of threads in the parallel region of the code.
  If nthreads = 1 (the default case), then a serial version of the
  C++ code carries out the computations.
This version of the ETAS model assumes that the earthquake catalog
  is complete and the data are stationary in time. If the catalog
  is incomplete or there is non-stationarity (e.g. increasing or cyclic
  trend)  in the time of events, then the results of this function are
  not reliable.