# ppm

##### Fit Point Process Model to Data

Fits a point process model to an observed point pattern

- Keywords
- spatial

##### Usage

```
ppm(Q, trend=~1, interaction=NULL, covariates=NULL,
correction="border", rbord=0, use.gam=FALSE, method="mpl")
```

##### Arguments

- Q
- A data point pattern (of class
`"ppp"`

) to which the model will be fitted, or preferably a quadrature scheme (of class`"quad"`

) containing this pattern. - trend
- An Rformula object specifying the spatial trend to be fitted.
The default formula,
`~1`

, indicates the model is stationary and no trend is to be fitted. - interaction
- An object of class
`"interact"`

describing the point process interaction structure, or`NULL`

indicating that a Poisson process (stationary or nonstationary) should be fitted. - covariates
- The values of any spatial covariates (other than the Cartesian coordinates) required by the model. Either a data frame, or a list of images. See Details.
- correction
- The name of the edge correction to be used. The default
is
`"border"`

indicating the border correction. Other possibilities may include`"Ripley"`

,`"isotropic"`

,`"translate"`

and`"none"`

- rbord
- If
`correction = "border"`

this argument specifies the distance by which the window should be eroded for the border correction. - use.gam
- Logical flag; if
`TRUE`

then computations are performed using`gam`

instead of`glm`

. - method
- The method used to fit the model. Currently the only option
is
`"mpl"`

indicating the method of Maximum PseudoLikelihood.

##### Details

This function fits a point process model
to an observed point pattern.
The model may include
spatial trend, interpoint interaction, and dependence on covariates.
Model-fitting is currently performed by
the method of maximum pseudolikelihood (Besag, 1975).
Other options will be added in future versions.
The algorithm is an implementation of the method of
Baddeley and Turner (2000), which approximates the pseudolikelihood
by a special type of quadrature sum generalising the Berman-Turner (1992)
approximation.
The argument `Q`

should be either a point pattern
or a quadrature scheme. If it is a point pattern, it is converted
into a quadrature scheme.

A quadrature scheme is an object of class `"quad"`

(see `quad.object`

)
which specifies both the data point pattern and the dummy points
for the quadrature scheme, as well as the quadrature weights
associated with these points.
If `Q`

is simply a point pattern
(of class `"ppp"`

, see `ppp.object`

)
then it is interpreted as specifying the
data points only; a set of dummy points specified
by `default.dummy()`

is added, and the default weighting rule is
invoked to compute the quadrature weights.
The usage of `ppm()`

is closely analogous to the Splus/Rfunctions
`glm`

and `gam`

.
The analogy is:
**glm** **ppm**
`formula`

`trend`

`family`

`interaction`

}
The point process model to be fitted is specified by the
arguments `trend`

and `interaction`

which are respectively analogous to
the `formula`

and `family`

arguments of glm().
Systematic effects (spatial trend and/or dependence on
spatial covariates) are specified by the argument
`trend`

. This is an Splus/Rformula object, which may be expressed
in terms of the Cartesian coordinates `x`

, `y`

,
the marks `marks`

,
or the variables in `covariates`

(if supplied), or both.
It specifies the **logarithm** of the first order potential
of the process.
The formula should not use any names beginning with `.mpl`

as these are reserved for internal use.
If `trend`

is absent or equal to the default, `~1`

, then
the model to be fitted is stationary (or at least, its first order
potential is constant).
Stochastic interactions between random points of the point process
are defined by the argument `interaction`

. This is an object of
class `"interact"`

which is initialised in a very similar way to the
usage of family objects in `glm`

and `gam`

.
See the examples below.
If `interaction`

is missing or `NULL`

, then the model to be fitted
has no interpoint interactions, that is, it is a Poisson process
(stationary or nonstationary according to `trend`

). In this case
the method of maximum pseudolikelihood
coincides with maximum likelihood.

To fit a model involving spatial covariates
other than the Cartesian coordinates $x$ and $y$,
the values of the covariates should be supplied in the
argument `covariates`

.
Note that it is not sufficient to have observed
the covariate only at the points of the data point pattern;
the covariate must also have been observed at other
locations in the window.

The argument `covariates`

, if supplied, should be either
a data frame or a list of images.
If it is a data frame, it must have
as many rows as there are points in `Q`

.
The $i$th row of `covariates`

should contain the values of
spatial variables which have been observed
at the $i$th point of `Q`

. In this case
the argument `Q`

must be a quadrature scheme,
not merely a point pattern.

Alternatively, `covariates`

may be a list of images,
each image giving the values of a spatial covariate at
a fine grid of locations.
The argument `covariates`

should then be a named list,
with the names corresponding to
the names of covariates in the model formula `trend`

.
Each entry in the list must be an image object (of class `"im"`

,
see `im.object`

).
The software will look up
the pixel values of each image at the quadrature points.
In this case the argument `Q`

may be either a quadrature
scheme or a point pattern.

The variable names `x`

, `y`

and `marks`

are reserved for the Cartesian
coordinates and the mark values,
and these should not be used for variables in `covariates`

.
The argument `correction`

is the name of an edge correction method.
The default `correction="border"`

specifies the border correction,
in which the quadrature window (the domain of integration of the
pseudolikelihood) is obtained by trimming off a margin of width `rbord`

from the observation window of the data pattern.
Not all edge corrections are implemented (or implementable)
for arbitrary windows.
Other options depend on the argument `interaction`

, but these generally
include `"periodic"`

(the periodic or toroidal edge correction
in which opposite edges of a rectangular window are identified)
and `"translate"`

(the translation correction, see Baddeley 1998 and Baddeley and Turner
2000).
For pairwise interaction there is also Ripley's isotropic correction,
identified by `"isotropic"`

or `"Ripley"`

.
The fitted point process model returned by this function can be printed
(by the print method `print.ppm`

)
to inspect the fitted parameter values.
If a nonparametric spatial trend was fitted, this can be extracted using
the predict method `predict.ppm`

.

This algorithm approximates the log pseudolikelihood
by a sum over a finite set of quadrature points.
Finer quadrature schemes (i.e. those with more
quadrature points) generally yield a better approximation, at the
expense of higher computational load.
Complete control over the quadrature scheme is possible.
See `quadscheme`

for an overview.

Note that the method of maximum pseudolikelihood is believed to be inefficient and biased for point processes with strong interpoint interactions. In such cases, it is advisable to use iterative maximum likelihood methods such as Monte Carlo Maximum Likelihood (Geyer, 1999) provided the appropriate simulation algorithm exists. The maximum pseudolikelihood parameter estimate often serves as a good initial starting point for these iterative methods. Maximum pseudolikelihood may also be used profitably for model selection in the initial phases of modelling.

The warning message ``Algorithm did not converge in: glm.fit( ...''
is sometimes obtained. It means that the glm fitting algorithm
did not converge in a reasonable number of iterations. In most cases
this can be fixed by taking a finer quadrature scheme.
See `quadscheme`

.

##### Value

- An object of class
`"ppm"`

describing a fitted point process model. The fitted parameters can be obtained just by printing this object. Fitted spatial trends can be extracted using the`predict`

method for this object (see`predict.ppm`

).See

`ppm.object`

for details of the format of this object.

##### Warnings

See the comments above about the possible inefficiency
and bias of the maximum pseudolikelihood estimator.
The accuracy of the Berman-Turner-Baddeley approximation to
the pseudolikelihood depends on the number of dummy points used
in the quadrature scheme. The number of dummy points should
at least equal the number of data points.
The parameter values of the fitted model
do not necessarily determine a valid point process.
Some of the point process models are only defined when the parameter
values lie in a certain subset. For example the Strauss process only
exists when the interaction parameter $\gamma$
is less than or equal to $1$,
corresponding to a value of `ppm()$theta[2]`

less than or equal to `0`

.
The current version of `ppm`

maximises the pseudolikelihood
without constraining the parameters, and does not apply any checks for
sanity after fitting the model.
The `trend`

formula should not use any variable names
beginning with the prefixes `.mpl`

or `Interaction`

as these names are reserved
for internal use. The data frame `covariates`

should have as many rows
as there are points in `Q`

. It should not contain
variables called `x`

, `y`

or `marks`

as these names are reserved for the Cartesian coordinates
and the marks.
If the model formula involves one of the functions
`poly()`

, `bs()`

or `ns()`

(e.g. applied to spatial coordinates `x`

and `y`

),
the fitted coefficients can be misleading.
The resulting fit is not to the raw spatial variates
(`x`

, `x^2`

, `x*y`

, etc.)
but to a transformation of these variates. The transformation is implemented
by `poly()`

in order to achieve better numerical stability.
However the
resulting coefficients are appropriate for use with the transformed
variates, not with the raw variates.
This affects the interpretation of the constant
term in the fitted model, `logbeta`

.
Conventionally, $\beta$ is the background intensity, i.e. the
value taken by the conditional intensity function when all predictors
(including spatial or ``trend'' predictors) are set equal to $0$.
However the coefficient actually produced is the value that the
log conditional intensity takes when all the predictors,
including the *transformed*
spatial predictors, are set equal to `0`

, which is not the same thing.

Worse still, the result of `predict.ppm`

can be
completely wrong if the trend formula contains one of the
functions `poly()`

, `bs()`

or `ns()`

. This is a weakness of the underlying
function `predict.glm`

.

If you wish to fit a polynomial trend,
we offer an alternative to `poly()`

,
namely `polynom()`

, which avoids the
difficulty induced by transformations. It is completely analogous
to `poly`

except that it does not orthonormalise.
The resulting coefficient estimates then have
their natural interpretation and can be predicted correctly.
Numerical stability may be compromised.

Values of the maximised pseudolikelihood are not comparable
if they have been obtained with different values of `rbord`

.

##### References

Baddeley, A. and Turner, R.
Practical maximum pseudolikelihood for spatial point patterns.
*Australian and New Zealand Journal of Statistics*
**42** (2000) 283--322.
Berman, M. and Turner, T.R.
Approximating point process likelihoods with GLIM.
*Applied Statistics* **41** (1992) 31--38.
Besag, J.
Statistical analysis of non-lattice data.
*The Statistician* **24** (1975) 179-195.
Diggle, P.J., Fiksel, T., Grabarnik, P., Ogata, Y., Stoyan, D. and
Tanemura, M.
On parameter estimation for pairwise interaction processes.
*International Statistical Review* **62** (1994) 99-117.
Jensen, J.L. and Moeller, M.
Pseudolikelihood for exponential family models of spatial point processes.
*Annals of Applied Probability* **1** (1991) 445--461.
Jensen, J.L. and Kuensch, H.R.
On asymptotic normality of pseudo likelihood
estimates for pairwise interaction processes,
*Annals of the Institute of Statistical Mathematics*
**46** (1994) 475-486.

##### See Also

`ppp`

,
`quadscheme`

,
`ppm.object`

,
`Poisson`

,
`Strauss`

,
`StraussHard`

,
`MultiStrauss`

,
`MultiStraussHard`

,
`Softcore`

,
`DiggleGratton`

,
`Pairwise`

,
`PairPiece`

,
`Geyer`

,
`LennardJones`

,
`Saturated`

,
`OrdThresh`

,
`Ord`

##### Examples

```
data(nztrees)
ppm(nztrees)
# fit the stationary Poisson process
# to point pattern 'nztrees'
Q <- quadscheme(nztrees)
ppm(Q)
# equivalent.
ppm(nztrees, ~ x)
# fit the nonstationary Poisson process
# with intensity function lambda(x,y) = exp(a + bx)
# where x,y are the Cartesian coordinates
# and a,b are parameters to be estimated
ppm(nztrees, ~ polynom(x,2))
# fit the nonstationary Poisson process
# with intensity function lambda(x,y) = exp(a + bx + cx^2)
library(splines)
ppm(nztrees, ~ bs(x,df=3))
# WARNING: do not use predict.ppm() on this result
# Fits the nonstationary Poisson process
# with intensity function lambda(x,y) = exp(B(x))
# where B is a B-spline with df = 3
ppm(nztrees, ~1, Strauss(r=10), rbord=10)
# Fit the stationary Strauss process with interaction range r=10
# using the border method with margin rbord=10
ppm(nztrees, ~ x, Strauss(13), correction="periodic")
# Fit the nonstationary Strauss process with interaction range r=13
# and exp(first order potential) = activity = beta(x,y) = exp(a+bx)
# using the periodic correction.
# COVARIATES
#
X <- rpoispp(42)
weirdfunction <- function(x,y){ 10 * x^2 + runif(length(x))}
Zimage <- as.im(weirdfunction, unit.square())
#
# (a) covariate values in pixel image
ppm(X, ~ y + Z, covariates=list(Z=Zimage))
#
# (b) covariate values in data frame
Q <- quadscheme(X)
xQ <- x.quad(Q)
yQ <- y.quad(Q)
Zvalues <- weirdfunction(xQ,yQ)
ppm(Q, ~ y + Z, covariates=data.frame(Z=Zvalues))
# Note Q not X
## MULTITYPE POINT PROCESSES ###
data(lansing)
# Multitype point pattern --- trees marked by species
<testonly># equivalent functionality - smaller dataset
data(betacells)</testonly>
# fit stationary marked Poisson process
# with different intensity for each species
ppm(lansing, ~ marks, Poisson())
<testonly>ppm(betacells, ~ marks, Poisson())</testonly>
# fit nonstationary marked Poisson process
# with different log-cubic trend for each species
ppm(lansing, ~ marks * polynom(x,y,3), Poisson())
<testonly>ppm(betacells, ~ marks * polynom(x,y,2), Poisson())</testonly>
```

*Documentation reproduced from package spatstat, version 1.6-2, License: GPL version 2 or newer*