"ppm"(Q, trend=~1, interaction=Poisson(), ..., covariates=data, data=NULL, covfunargs = list(), subset, correction="border", rbord=reach(interaction), use.gam=FALSE, method="mpl", forcefit=FALSE, emend=project, project=FALSE, prior.mean = NULL, prior.var = NULL, nd = NULL, eps = NULL, gcontrol=list(), nsim=100, nrmh=1e5, start=NULL, control=list(nrep=nrmh), verb=TRUE, callstring=NULL)
"ppm"(Q, trend=~1, interaction=Poisson(), ..., covariates=data, data=NULL, covfunargs = list(), subset, correction="border", rbord=reach(interaction), use.gam=FALSE, method="mpl", forcefit=FALSE, emend=project, project=FALSE, prior.mean = NULL, prior.var = NULL, nd = NULL, eps = NULL, gcontrol=list(), nsim=100, nrmh=1e5, start=NULL, control=list(nrep=nrmh), verb=TRUE, callstring=NULL)
"ppp"
)
to which the model will be fitted,
or a quadrature scheme (of class "quad"
)
containing this pattern.
~1
, indicates the model is stationary
and no trend is to be fitted.
"interact"
describing the point process interaction
structure, or NULL
indicating that a Poisson process (stationary
or nonstationary) should be fitted.
x
and y
or the names of entries in data
)
defining a subset of the spatial domain,
to which the model-fitting should be restricted.
"border"
indicating the border correction.
Other possibilities may include "Ripley"
, "isotropic"
,
"periodic"
, "translate"
and "none"
, depending on the
interaction
.
correction = "border"
this argument specifies the distance by which
the window should be eroded for the border correction.
TRUE
then computations are performed
using gam
instead of glm
.
"mpl"
for the method of Maximum PseudoLikelihood,
"logi"
for the Logistic Likelihood method,
"VBlogi"
for the Variational Bayes Logistic Likelihood method,
and "ho"
for the Huang-Ogata approximate maximum likelihood
method.
forcefit=FALSE
, some trivial models will be
fitted by a shortcut. If forcefit=TRUE
,
the generic fitting method will always be used.
project
is an older name for
emend
.)
Logical value. Setting emend=TRUE
will ensure that the
fitted model is always a valid point process by
applying emend.ppm
.
method="VBlogi"
). See Details.
method="VBlogi"
). See Details.
nd * nd
or nd[1] * nd[2]
)
used to evaluate the integral in the pseudolikelihood.
Incompatible with eps
.
nd
.
glm.control
(or passed to gam.control
if use.gam=TRUE
)
controlling the model-fitting algorithm.
method="ho"
)
method="ho"
)
rmh
controlling the behaviour
of the Metropolis-Hastings algorithm (for method="ho"
)
method="ho"
)
"ppm"
describing a fitted point process
model.See ppm.object
for details of the format of this object
and methods available for manipulating it.
ppm
has parameters that determine the strength and
range of interaction or dependence between points.
These parameters are of two types:
ppm
command is only designed to estimate regular
parameters of the interaction.
It requires the values of any irregular parameters of the interaction
to be fixed. For example, to fit a Strauss process model to the cells
dataset, you could type ppm(cells, ~1, Strauss(r=0.07))
.
Note that the value of the irregular parameter r
must be given.
The result of this command will be a fitted model in which the
regular parameter $gamma$ has been estimated. To determine the irregular parameters, there are several
practical techniques, but no general statistical theory available.
One useful technique is maximum profile pseudolikelihood, which
is implemented in the command profilepl
.ppm()$theta[2]
less than or equal to 0
. By default (if emend=FALSE
) the algorithm
maximises the pseudolikelihood
without constraining the parameters, and does not apply any checks for
sanity after fitting the model.
This is because the fitted parameter value
could be useful information for data analysis.
To constrain the parameters to ensure that the model is a valid
point process, set emend=TRUE
. See also the functions
valid.ppm
and emend.ppm
. The trend
formula should not use any variable names
beginning with the prefixes .mpl
or Interaction
as these names are reserved
for internal use. The data frame covariates
should have as many rows
as there are points in Q
. It should not contain
variables called x
, y
or marks
as these names are reserved for the Cartesian coordinates
and the marks. If the model formula involves one of the functions
poly()
, bs()
or ns()
(e.g. applied to spatial coordinates x
and y
),
the fitted coefficients can be misleading.
The resulting fit is not to the raw spatial variates
(x
, x^2
, x*y
, etc.)
but to a transformation of these variates. The transformation is implemented
by poly()
in order to achieve better numerical stability.
However the
resulting coefficients are appropriate for use with the transformed
variates, not with the raw variates.
This affects the interpretation of the constant
term in the fitted model, logbeta
.
Conventionally, $beta$ is the background intensity, i.e. the
value taken by the conditional intensity function when all predictors
(including spatial or ``trend'' predictors) are set equal to $0$.
However the coefficient actually produced is the value that the
log conditional intensity takes when all the predictors,
including the transformed
spatial predictors, are set equal to 0
, which is not the same thing. Worse still, the result of predict.ppm
can be
completely wrong if the trend formula contains one of the
functions poly()
, bs()
or ns()
. This is a weakness of the underlying
function predict.glm
. If you wish to fit a polynomial trend,
we offer an alternative to poly()
,
namely polynom()
, which avoids the
difficulty induced by transformations. It is completely analogous
to poly
except that it does not orthonormalise.
The resulting coefficient estimates then have
their natural interpretation and can be predicted correctly.
Numerical stability may be compromised. Values of the maximised pseudolikelihood are not comparable
if they have been obtained with different values of rbord
.ppm
, described in many older documents.
This old syntax is still supported. However, if you are learning about
ppm
for the first time, we recommend you use the
new syntax described in the help file for ppm
.
This function fits a point process model
to an observed point pattern.
The model may include
spatial trend, interpoint interaction, and dependence on covariates.
Q
is a point pattern dataset
(an object of class "ppp"
) to which we wish to fit a model. The syntax of ppm()
is closely analogous to the R functions
glm
and gam
.
The analogy is:
glm |
ppm |
formula |
trend |
trend
and interaction
which are respectively analogous to
the formula
and family
arguments of glm().
Systematic effects (spatial trend and/or dependence on
spatial covariates) are specified by the argument
trend
. This is an R formula object, which may be expressed
in terms of the Cartesian coordinates x
, y
,
the marks marks
,
or the variables in covariates
(if supplied), or both.
It specifies the logarithm of the first order potential
of the process.
The formula should not use any names beginning with .mpl
as these are reserved for internal use.
If trend
is absent or equal to the default, ~1
, then
the model to be fitted is stationary (or at least, its first order
potential is constant).
The symbol .
in the trend expression stands for
all the covariates supplied in the argument data
.
For example the formula ~ .
indicates an additive
model with a main effect for each covariate in data
.
Stochastic interactions between random points of the point process
are defined by the argument interaction
. This is an object of
class "interact"
which is initialised in a very similar way to the
usage of family objects in glm
and gam
.
The models currently available are:
\GibbsInteractionsList.
See the examples below.
It is also possible to combine several interactions
using Hybrid
.
If interaction
is missing or NULL
,
then the model to be fitted
has no interpoint interactions, that is, it is a Poisson process
(stationary or nonstationary according to trend
). In this case
the methods of maximum pseudolikelihood and maximum logistic likelihood
coincide with maximum likelihood. The fitted point process model returned by this function can be printed
(by the print method print.ppm
)
to inspect the fitted parameter values.
If a nonparametric spatial trend was fitted, this can be extracted using
the predict method predict.ppm
.
covariates
.
Note that it is not sufficient to have observed
the covariate only at the points of the data point pattern;
the covariate must also have been observed at other
locations in the window. Typically the argument covariates
is a list,
with names corresponding to variables in the trend
formula.
Each entry in the list is either
"im"
, see im.object
.
(x,y)
to obtain the value of the spatial
covariate. It should be a function(x, y)
or function(x, y, ...)
in the R language.
The first two arguments of the function should be the
Cartesian coordinates $x$ and $y$. The function may have
additional arguments; if the function does not have default
values for these additional arguments, then the user must
supply values for them, in covfunargs
.
See the Examples.
TRUE
inside the window and FALSE
outside
it. This should be an object of class "owin"
.
"tess"
.
The software will look up the values of each covariate at the required locations (quadrature points).
Note that, for covariate functions, only the name of the function appears in the trend formula. A covariate function is treated as if it were a single variable. The function arguments do not appear in the trend formula. See the Examples.
If covariates
is a list,
the list entries should have names corresponding to
the names of covariates in the model formula trend
.
The variable names x
, y
and marks
are reserved for the Cartesian
coordinates and the mark values,
and these should not be used for variables in covariates
.
If covariates
is a data frame, Q
must be a
quadrature scheme (see under Quadrature Schemes below).
Then covariates
must have
as many rows as there are points in Q
.
The $i$th row of covariates
should contain the values of
spatial variables which have been observed
at the $i$th point of Q
.
Q
may be a `quadrature scheme'.
This was originally just a technicality but it has turned out
to have practical uses, as we explain below.Quadrature schemes are required for our implementation of the method of maximum pseudolikelihood. The definition of the pseudolikelihood involves an integral over the spatial window containing the data. In practice this integral must be approximated by a finite sum over a set of quadrature points. We use the technique of Baddeley and Turner (2000), a generalisation of the Berman-Turner (1992) device. In this technique the quadrature points for the numerical approximation include all the data points (points of the observed point pattern) as well as additional `dummy' points.
Quadrature schemes are also required for
the method of maximum logistic likelihood, which
combines the data points with additional `dummy' points.
A quadrature scheme is an object of class "quad"
(see quad.object
)
which specifies both the data point pattern and the dummy points
for the quadrature scheme, as well as the quadrature weights
associated with these points.
If Q
is simply a point pattern
(of class "ppp"
, see ppp.object
)
then it is interpreted as specifying the
data points only; a set of dummy points specified
by default.dummy()
is added,
and the default weighting rule is
invoked to compute the quadrature weights.
Finer quadrature schemes (i.e. those with more dummy
points) generally yield a better approximation, at the
expense of higher computational load.
An easy way to fit models using a finer quadrature scheme
is to let Q
be the original point pattern data,
and use the argument nd
to determine the number of dummy points in the quadrature scheme.
Complete control over the quadrature scheme is possible.
See quadscheme
for an overview.
Use quadscheme(X, D, method="dirichlet")
to compute
quadrature weights based on the Dirichlet tessellation,
or quadscheme(X, D, method="grid")
to compute
quadrature weights by counting points in grid squares,
where X
and D
are the patterns of data points
and dummy points respectively.
Alternatively use pixelquad
to make a quadrature
scheme with a dummy point at every pixel in a pixel image.
A practical advantage of quadrature schemes arises when we want to fit
a model involving covariates (e.g. soil pH). Suppose we have only been
able to observe the covariates at a small number of locations.
Suppose cov.dat
is a data frame containing the values of
the covariates at the data points (i.e.\ cov.dat[i,]
contains the observations for the i
th data point)
and cov.dum
is another data frame (with the same columns as
cov.dat
) containing the covariate values at another
set of points whose locations are given by the point pattern Y
.
Then setting Q = quadscheme(X,Y)
combines the data points
and dummy points into a quadrature scheme, and
covariates = rbind(cov.dat, cov.dum)
combines the covariate
data frames. We can then fit the model by calling
ppm(Q, ..., covariates)
.
method='mpl'
specify a moderately large number of dummy points,
striking a compromise between speed and accuracy.
method='logi'
specify a relatively small number of dummy points,
so that this method is the fastest, in practice.
prior.mean
and variance-covariance
matrix prior.var
.
Note that method='logi'
, method='VBlogi'
and
method='ho'
involve randomisation, so that the results are
subject to random variation.
method="ho"
then the model will be fitted using
the Huang-Ogata (1999) approximate maximum likelihood method.
First the model is fitted by maximum pseudolikelihood as
described above, yielding an initial estimate of the parameter
vector $theta0$.
From this initial model, nsim
simulated
realisations are generated. The score and Fisher information of
the model at $theta=theta0$
are estimated from the simulated realisations. Then one step
of the Fisher scoring algorithm is taken, yielding an updated
estimate $theta1$. The corresponding model is
returned. Simulated realisations are generated using rmh
.
The iterative behaviour of the Metropolis-Hastings algorithm
is controlled by the arguments start
and control
which are passed to rmh
.
As a shortcut, the argument
nrmh
determines the number of Metropolis-Hastings
iterations run to produce one simulated realisation (if
control
is absent). Also
if start
is absent or equal to NULL
, it defaults to
list(n.start=N)
where N
is the number of points
in the data point pattern.
correction
is the name of an edge correction
method.
The default correction="border"
specifies the border correction,
in which the quadrature window (the domain of integration of the
pseudolikelihood) is obtained by trimming off a margin of width
rbord
from the observation window of the data pattern.
Not all edge corrections are implemented (or implementable)
for arbitrary windows.
Other options depend on the argument interaction
, but these
generally include correction="periodic"
(the periodic or toroidal edge
correction in which opposite edges of a rectangular window are
identified) and correction="translate"
(the translation correction,
see Baddeley 1998 and Baddeley and Turner 2000).
For pairwise interaction models
there is also Ripley's isotropic correction,
identified by correction="isotropic"
or "Ripley"
.
Baddeley, A. and Turner, R. Practical maximum pseudolikelihood for spatial point patterns. Australian and New Zealand Journal of Statistics 42 (2000) 283--322. Berman, M. and Turner, T.R. Approximating point process likelihoods with GLIM. Applied Statistics 41 (1992) 31--38. Besag, J. Statistical analysis of non-lattice data. The Statistician 24 (1975) 179-195. Diggle, P.J., Fiksel, T., Grabarnik, P., Ogata, Y., Stoyan, D. and Tanemura, M. On parameter estimation for pairwise interaction processes. International Statistical Review 62 (1994) 99-117.
Huang, F. and Ogata, Y. Improvements of the maximum pseudo-likelihood estimators in various spatial statistical models. Journal of Computational and Graphical Statistics 8 (1999) 510-530. Jensen, J.L. and Moeller, M. Pseudolikelihood for exponential family models of spatial point processes. Annals of Applied Probability 1 (1991) 445--461. Jensen, J.L. and Kuensch, H.R. On asymptotic normality of pseudo likelihood estimates for pairwise interaction processes, Annals of the Institute of Statistical Mathematics 46 (1994) 475-486.
Rajala T. (2014) A note on Bayesian logistic regression for spatial exponential family Gibbs point processes, Preprint on ArXiv.org. http://arxiv.org/abs/1411.0539
ppm.object
for details of how to
print, plot and manipulate a fitted model. ppp
and quadscheme
for constructing data.
Interactions:
\GibbsInteractionsList.
See profilepl
for advice on
fitting nuisance parameters in the interaction,
and ippm
for irregular parameters in the trend.
See valid.ppm
and emend.ppm
for
ensuring the fitted model is a valid point process.
# fit the stationary Poisson process
# to point pattern 'nztrees'
ppm(nztrees)
ppm(nztrees ~ 1)
## Not run:
# Q <- quadscheme(nztrees)
# ppm(Q)
# # equivalent.
# ## End(Not run)
## Not run:
# ppm(nztrees, nd=128)
# ## End(Not run)
fit1 <- ppm(nztrees, ~ x)
# fit the nonstationary Poisson process
# with intensity function lambda(x,y) = exp(a + bx)
# where x,y are the Cartesian coordinates
# and a,b are parameters to be estimated
fit1
coef(fit1)
coef(summary(fit1))
## Not run:
# ppm(nztrees, ~ polynom(x,2))
# ## End(Not run)
# fit the nonstationary Poisson process
# with intensity function lambda(x,y) = exp(a + bx + cx^2)
## Not run:
# library(splines)
# ppm(nztrees, ~ bs(x,df=3))
# ## End(Not run)
# WARNING: do not use predict.ppm() on this result
# Fits the nonstationary Poisson process
# with intensity function lambda(x,y) = exp(B(x))
# where B is a B-spline with df = 3
## Not run:
# ppm(nztrees, ~1, Strauss(r=10), rbord=10)
# ## End(Not run)
# Fit the stationary Strauss process with interaction range r=10
# using the border method with margin rbord=10
## Not run:
# ppm(nztrees, ~ x, Strauss(13), correction="periodic")
# ## End(Not run)
# Fit the nonstationary Strauss process with interaction range r=13
# and exp(first order potential) = activity = beta(x,y) = exp(a+bx)
# using the periodic correction.
# Compare Maximum Pseudolikelihood, Huang-Ogata and VB fits:
## Not run: ppm(swedishpines, ~1, Strauss(9))
## Not run: ppm(swedishpines, ~1, Strauss(9), method="ho")
ppm(swedishpines, ~1, Strauss(9), method="VBlogi")
# COVARIATES
#
X <- rpoispp(42)
weirdfunction <- function(x,y){ 10 * x^2 + 5 * sin(10 * y) }
#
# (a) covariate values as function
ppm(X, ~ y + Z, covariates=list(Z=weirdfunction))
#
# (b) covariate values in pixel image
Zimage <- as.im(weirdfunction, unit.square())
ppm(X, ~ y + Z, covariates=list(Z=Zimage))
#
# (c) covariate values in data frame
Q <- quadscheme(X)
xQ <- x.quad(Q)
yQ <- y.quad(Q)
Zvalues <- weirdfunction(xQ,yQ)
ppm(Q, ~ y + Z, covariates=data.frame(Z=Zvalues))
# Note Q not X
# COVARIATE FUNCTION WITH EXTRA ARGUMENTS
#
f <- function(x,y,a){ y - a }
ppm(X, ~x + f, covariates=list(f=f), covfunargs=list(a=1/2))
# COVARIATE: inside/outside window
b <- owin(c(0.1, 0.6), c(0.1, 0.9))
ppm(X, ~w, covariates=list(w=b))
## MULTITYPE POINT PROCESSES ###
# fit stationary marked Poisson process
# with different intensity for each species
## Not run: ppm(lansing, ~ marks, Poisson())
# fit nonstationary marked Poisson process
# with different log-cubic trend for each species
## Not run: ppm(lansing, ~ marks * polynom(x,y,3), Poisson())
Run the code above in your browser using DataLab