diagnose.ppm: Diagnostic Plots for Fitted Point Process Model

Description

Given a point process model fitted to a point pattern, produce diagnostic plots based on residuals.

Usage

diagnose.ppm(object, ..., type="raw", which="all", sigma=NULL,  rbord=reach(object), cumulative=TRUE, plot.it=TRUE, rv = NULL, compute.sd=is.poisson(object), compute.cts=TRUE, envelope=FALSE, nsim=39, nrank=1, typename, check=TRUE, repair=TRUE, oldstyle=FALSE, splineargs=list(spar=0.5))
  "plot"(x, ..., which,  plot.neg=c("image", "discrete", "contour", "imagecontour"), plot.smooth=c("imagecontour", "image", "contour", "persp"), plot.sd, spacing=0.1, outer=3, srange=NULL, monochrome=FALSE, main=NULL)

Arguments

object

The fitted point process model (an object of class "ppm") for which diagnostics should be produced. This object is usually obtained from ppm.

type

String indicating the type of residuals or weights to be used. Current options are "eem" for the Stoyan-Grabarnik exponential energy weights, "raw" for the raw residuals, "inverse" for the inverse-lambda residuals, and "pearson" for the Pearson residuals. A partial match is adequate.

which

Character string or vector indicating the choice(s) of plots to be generated. Options are "all", "marks", "smooth", "x", "y" and "sum". Multiple choices may be given but must be matched exactly. See Details.

sigma

Bandwidth for kernel smoother in "smooth" option.

rbord

Width of border to avoid edge effects. The diagnostic calculations will be confined to those points of the data pattern which are at least rbord units away from the edge of the window. (An infinite value of rbord will be ignored.)

cumulative

Logical flag indicating whether the lurking variable plots for the $x$ and $y$ coordinates will be the plots of cumulative sums of marks (cumulative=TRUE) or the plots of marginal integrals of the smoothed residual field (cumulative=FALSE).

plot.it

Logical value indicating whether plots should be shown. If plot.it=FALSE, the computed diagnostic quantities are returned without plotting them.

plot.neg

String indicating how the density part of the residual measure should be plotted.

plot.smooth

String indicating how the smoothed residual field should be plotted.

compute.sd,plot.sd

Logical values indicating whether error bounds should be computed and added to the "x" and "y" plots. The default is TRUE for Poisson models and FALSE for non-Poisson models. See Details.

envelope,nsim,nrank

Arguments passed to lurking in order to plot simulation envelopes for the lurking variable plots.

Usually absent. Advanced use only. If this argument is present, the values of the residuals will not be calculated from the fitted model object but will instead be taken directly from rv.

spacing

The spacing between plot panels (when a four-panel plot is generated) expressed as a fraction of the width of the window of the point pattern.

outer

The distance from the outermost line of text to the nearest plot panel, expressed as a multiple of the spacing between plot panels.

srange

Vector of length 2 that will be taken as giving the range of values of the smoothed residual field, when generating an image plot of this field. This is useful if you want to generate diagnostic plots for two different fitted models using the same colour map.

monochrome

Flag indicating whether images should be displayed in greyscale (suitable for publication) or in colour (suitable for the screen). The default is to display in colour.

check

Logical value indicating whether to check the internal format of object. If there is any possibility that this object has been restored from a dump file, or has otherwise lost track of the environment where it was originally computed, set check=TRUE.

repair

Logical value indicating whether to repair the internal format of object, if it is found to be damaged.

oldstyle

Logical flag indicating whether error bounds should be plotted using the approximation given in the original paper (oldstyle=TRUE), or using the correct asymptotic formula (oldstyle=FALSE).

splineargs

Argument passed to lurking to control the smoothing in the lurking variable plot.

The value returned from a previous call to diagnose.ppm. An object of class "diagppm".

typename

String to be used as the name of the residuals.

main

Main title for the plot.

...

Extra arguments, controlling either the resolution of the smoothed image (passed from diagnose.ppm to density.ppp) or the appearance of the plots (passed from diagnose.ppm to plot.diagppm and from plot.diagppm to plot.default).

compute.cts

Advanced use only.

Value

An object of class "diagppm" which contains the coordinates needed to reproduce the selected plots. This object can be plotted using plot.diagppm and printed using print.diagppm.

Replicated Data

Note that if object is a model that was obtained by first fitting a model to replicated point pattern data using mppm and then using subfits to extract a model for one of the individual point patterns, then the variance calculations are only implemented for the innovation variance (oldstyle=TRUE) and this is the default in such cases.

Details

The function diagnose.ppm generates several diagnostic plots for a fitted point process model. The plots display the residuals from the fitted model (Baddeley et al, 2005) or alternatively the `exponential energy marks' (Stoyan and Grabarnik, 1991). These plots can be used to assess goodness-of-fit, to identify outliers in the data, and to reveal departures from the fitted model. See also the companion function qqplot.ppm.

The argument object must be a fitted point process model (object of class "ppm") typically produced by the maximum pseudolikelihood fitting algorithm ppm).

The argument type selects the type of residual or weight that will be computed. Current options are:

"eem":: exponential energy marks (Stoyan and Grabarnik, 1991) computed by eem. These are positive weights attached to the data points (i.e. the points of the point pattern dataset to which the model was fitted). If the fitted model is correct, then the sum of these weights for all data points in a spatial region $B$ has expected value equal to the area of $B$. See eem for further explanation.

"raw", "inverse" or "pearson":

point process residuals (Baddeley et al, 2005) computed by the function residuals.ppm. These are residuals attached both to the data points and to some other points in the window of observation (namely, to the dummy points of the quadrature scheme used to fit the model). If the fitted model is correct, then the sum of the residuals in a spatial region $B$ has mean zero. The options are

"raw": the raw residuals;
"inverse": the `inverse-lambda' residuals, a counterpart of the exponential energy weights;
"pearson": the Pearson residuals.

See residuals.ppm for further explanation.

The argument which selects the type of plot that is produced. Options are:

"marks":: plot the residual measure. For the exponential energy weights (type="eem") this displays circles centred at the points of the data pattern, with radii proportional to the exponential energy weights. For the residuals (type="raw", type="inverse" or type="pearson") this again displays circles centred at the points of the data pattern with radii proportional to the (positive) residuals, while the plotting of the negative residuals depends on the argument plot.neg. If plot.neg="image" then the negative part of the residual measure, which is a density, is plotted as a colour image. If plot.neg="discrete" then the discretised negative residuals (obtained by approximately integrating the negative density using the quadrature scheme of the fitted model) are plotted as squares centred at the dummy points with side lengths proportional to the (negative) residuals. [To control the size of the circles and squares, use the argument maxsize.]

"smooth":

plot a kernel-smoothed version of the residual measure. Each data or dummy point is taken to have a `mass' equal to its residual or exponential energy weight. (Note that residuals can be negative). This point mass is then replaced by a bivariate isotropic Gaussian density with standard deviation sigma. The value of the smoothed residual field at any point in the window is the sum of these weighted densities. If the fitted model is correct, this smoothed field should be flat, and its height should be close to 0 (for the residuals) or 1 (for the exponential energy weights). The field is plotted either as an image, contour plot or perspective view of a surface, according to the argument plot.smooth. The range of values of the smoothed field is printed if the option which="sum" is also selected.

"x":

produce a `lurking variable' plot for the $x$ coordinate. This is a plot of $h(x)$ against $x$ (solid lines) and of $E(h(x))$ against $x$ (dashed lines), where $h(x)$ is defined below, and $E(h(x))$ denotes the expectation of $h(x)$ assuming the fitted model is true.

if cumulative=TRUE then $h(x)$ is the cumulative sum of the weights or residuals for all points which have $X$ coordinate less than or equal to $x$. For the residuals $E(h(x)) = 0$, and for the exponential energy weights $E(h(x)) = $ area of the subset of the window to the left of the line $X=x$.
if cumulative=FALSE then $h(x)$ is the marginal integral of the smoothed residual field (see the case which="smooth" described above) on the $x$ axis. This is approximately the derivative of the plot for cumulative=TRUE. The value of $h(x)$ is computed by summing the values of the smoothed residual field over all pixels with the given $x$ coordinate. For the residuals $E(h(x)) = 0$, and for the exponential energy weights $E(h(x)) = $ length of the intersection between the observation window and the line $X=x$.

If plot.sd = TRUE, then superimposed on the lurking variable plot are the pointwise two-standard-deviation error limits for $h(x)$ calculated for the inhomogeneous Poisson process. The default is plot.sd = TRUE for Poisson models and plot.sd = FALSE for non-Poisson models.

"y":

produce a similar lurking variable plot for the $y$ coordinate.

"sum":

print the sum of the weights or residuals for all points in the window (clipped by a margin rbord if required) and the area of the same window. If the fitted model is correct the sum of the exponential energy weights should equal the area of the window, while the sum of the residuals should equal zero. Also print the range of values of the smoothed field displayed in the "smooth" case.

"all":

All four of the diagnostic plots listed above are plotted together in a two-by-two display. Top left panel is "marks" plot. Bottom right panel is "smooth" plot. Bottom left panel is "x" plot. Top right panel is "y" plot, rotated 90 degrees.

The argument rbord ensures there are no edge effects in the computation of the residuals. The diagnostic calculations will be confined to those points of the data pattern which are at least rbord units away from the edge of the window. The value of rbord should be greater than or equal to the range of interaction permitted in the model.

By default, the two-standard-deviation limits are calculated from the exact formula for the asymptotic variance of the residuals under the asymptotic normal approximation, equation (37) of Baddeley et al (2006). However, for compatibility with the original paper of Baddeley et al (2005), if oldstyle=TRUE, the two-standard-deviation limits are calculated using the innovation variance, an over-estimate of the true variance of the residuals. (However, see the section about Replicated Data).

The argument rv would normally be used only by experts. It enables the user to substitute arbitrary values for the residuals or marks, overriding the usual calculations. If rv is present, then instead of calculating the residuals from the fitted model, the algorithm takes the residuals from the object rv, and plots them in the manner appropriate to the type of residual or mark selected by type. If type ="eem" then rv should be similar to the return value of eem, namely, a numeric vector of length equal to the number of points in the original data point pattern. Otherwise, rv should be similar to the return value of residuals.ppm, that is, it should be an object of class "msr" (see msr) representing a signed measure.

The return value of diagnose.ppm is an object of class "diagppm". The plot method for this class is documented here. There is also a print method. See the Examples.

In plot.diagppm, if a four-panel diagnostic plot is produced (the default), then the extra arguments xlab, ylab, rlab determine the text labels for the $x$ and $y$ coordinates and the residuals, respectively. The undocumented arguments col.neg and col.smooth control the colour maps used in the top left and bottom right panels respectively. See also the companion functions qqplot.ppm, which produces a Q-Q plot of the residuals, and lurking, which produces lurking variable plots for any spatial covariate.

References

Baddeley, A., Turner, R., \Moller, J. and Hazelton, M. (2005) Residual analysis for spatial point processes. Journal of the Royal Statistical Society, Series B 67, 617--666.

Baddeley, A., \Moller, J. and Pakes, A.G. (2008) Properties of residuals for spatial point processes. Annals of the Institute of Statistical Mathematics 60, 627--649. Stoyan, D. and Grabarnik, P. (1991) Second-order characteristics for stochastic structures connected with Gibbs point processes. Mathematische Nachrichten, 151:95--100.

Examples

Run this code

    fit <- ppm(cells ~x, Strauss(r=0.15))
    diagnose.ppm(fit)
    ## Not run: 
#     diagnose.ppm(fit, type="pearson")
#     ## End(Not run)

    diagnose.ppm(fit, which="marks")

    diagnose.ppm(fit, type="raw", plot.neg="discrete")

    diagnose.ppm(fit, type="pearson", which="smooth")

    # save the diagnostics and plot them later
    u <- diagnose.ppm(fit, rbord=0.15, plot.it=FALSE)
    ## Not run: 
#     plot(u)
#     plot(u, which="marks")
#     ## End(Not run)

Run the code above in your browser using DataLab