pcf.ppp: Pair Correlation Function of Point Pattern

Description

Estimates the pair correlation function of a point pattern using kernel methods.

Usage

# S3 method for ppp
pcf(X, ..., r = NULL,
                    adaptive=FALSE,
                    kernel="epanechnikov", bw=NULL, h=NULL,
                    bw.args=list(), stoyan=0.15, adjust=1, 
                    correction=c("translate", "Ripley"),
                    divisor = c("r", "d", "a", "t"),
                    zerocor=c("weighted", "reflection", "convolution",
                             "bdrykern", "JonesFoster", "none"),
                    gref = NULL,
                    tau = 0,
                    fast = TRUE, 
                    var.approx = FALSE,
                    domain=NULL,
                    ratio=FALSE, close=NULL)

Value

A function value table (object of class "fv"). Essentially a data frame containing the variables

r: the vector of values of the argument $r$ at which the pair correlation function $g(r)$ has been estimated
theo: vector of values equal to 1, the theoretical value of $g(r)$ for the Poisson process
trans: vector of values of $g(r)$ estimated by translation correction
iso: vector of values of $g(r)$ estimated by Ripley isotropic correction
v: vector of approximate values of the variance of the estimate of $g(r)$

as required.

If ratio=TRUE then the return value also has two attributes called "numerator" and "denominator"

which are "fv" objects containing the numerators and denominators of each estimate of $g(r)$.

The return value also has an attribute "bw" giving the smoothing bandwidth that was used, and an attribute "info"

containing details of the algorithm parameters.

Arguments

X: A point pattern (object of class "ppp").
...: Arguments passed to density.default or to densityBC controlling the kernel smoothing of the pair correlation estimate.
r: Optional. Vector of values for the argument $r$ at which $g(r)$ should be evaluated. There is a sensible default.
adaptive: Logical value specifying whether to use adaptive kernel smoothing (adaptive=TRUE) or fixed-bandwidth kernel smoothing (adaptive=FALSE, the default).
kernel: Choice of smoothing kernel, passed to density.default.
bw: Bandwidth for smoothing kernel. Either a single numeric value giving the standard deviation of the kernel, or a character string specifying a bandwidth selection rule, or a function that computes the selected bandwidth. See Details.
h: Kernel halfwidth $h$ (incompatible with argument bw). A numerical value. The parameter h is defined as the half-width of the support of the kernel, except for the Gaussian kernel where h is the standard deviation.
bw.args: Optional. List of additional arguments to be passed to bw when bw is a function. Alternatively, bw may be a function that should be applied to X to produce a list of additional arguments.
stoyan: Coefficient for Stoyan's bandwidth selection rule; see Details.
adjust: Numerical adjustment factor for the bandwidth. The bandwidth actually used is adjust * bw. This makes it easy to specify choices like ‘half the selected bandwidth’.
correction: Edge correction. A character vector specifying the choice (or choices) of edge correction. See Details.
divisor: Choice of divisor in the estimation formula: either "r" (the default) or "d", or the new alternatives "a" or "t". See Details.
zerocor: String (partially matched) specifying a correction for the boundary effect bias at $r=0$. Possible values are "none", "weighted", "convolution", "reflection", "bdrykern" and "JonesFoster". See Details, or help file for densityBC.
gref: Optional. A pair correlation function that will be used as the reference for the transformation to uniformity, when divisor="t". Either a function in the R language giving the pair correlation function, or a fitted model (object of class "kppm", "dppm", "ppm" or "slrm") or a theoretical point process model (object of class "zclustermodel" or "detpointprocfamily") for which the pair correlation function can be computed.
tau: Optional shrinkage coefficient. A single numeric value.
fast: Logical value specifying whether to compute the kernel smoothing using a Fast Fourier Transform algorithm (fast=TRUE) or an exact analytic kernel sum (fast=FALSE).
var.approx: Logical value indicating whether to compute an analytic approximation to the variance of the estimated pair correlation.
domain: Optional. Calculations will be restricted to this subset of the window. See Details.
ratio: Logical. If TRUE, the numerator and denominator of each edge-corrected estimate will also be saved, for use in analysing replicated point patterns.
close: Advanced use only. Precomputed data. See section on Advanced Use.

Advanced Use

To perform the same computation using several different bandwidths bw, it is efficient to use the argument close. This should be the result of closepairs(X, rmax) for a suitably large value of rmax, namely rmax >= max(r) + 3 * bw.

Author

Adrian Baddeley Adrian.Baddeley@curtin.edu.au, Rolf Turner rolfturner@posteo.net, Ege Rubak rubak@math.aau.dk, Martin Hazelton Martin.Hazelton@otago.ac.nz and Tilman Davies Tilman.Davies@otago.ac.nz.

Details

The pair correlation function $g(r)$ is a summary of the dependence between points in a spatial point process. The best intuitive interpretation is the following: the probability $p(r)$ of finding two points at locations $x$ and $y$ separated by a distance $r$ is equal to $$ p(r) = \lambda^2 g(r) \,{\rm d}x \, {\rm d}y $$ where $\lambda$ is the intensity of the point process. For a completely random (uniform Poisson) process, $p(r) = \lambda^2 \,{\rm d}x \, {\rm d}y$ so $g(r) = 1$. Formally, the pair correlation function of a stationary point process is defined by $$ g(r) = \frac{K'(r)}{2\pi r} $$ where $K'(r)$ is the derivative of $K(r)$, the reduced second moment function (aka ``Ripley's $K$ function'') of the point process. See Kest for information about $K(r)$.

For a stationary Poisson process, the pair correlation function is identically equal to 1. Values $g(r) < 1$ suggest inhibition between points; values greater than 1 suggest clustering.

This routine computes an estimate of $g(r)$ by kernel smoothing.

If divisor="r" (the default), then the standard kernel estimator (Stoyan and Stoyan, 1994, pages 284--285) is used. By default, the recommendations of Stoyan and Stoyan (1994) are followed exactly.
If divisor="d" then a modified estimator is used (Guan, 2007): the contribution from an interpoint distance $d_{ij}$ to the estimate of $g(r)$ is divided by $d_{ij}$ instead of dividing by $r$. This usually improves the bias of the estimator when $r$ is close to zero.
If divisor="a" then improved method of Baddeley, Davies and Hazelton (2025) is used. The distances $d_{ij}$ are first converted to disc areas $a_{ij}=\pi d_{ij}^2$, and smoothing is performed on the area scale, then the result is back-transformed to the original scale.
If divisor="t" then the distances $d_{ij}$ are transformed to uniformity using the reference pair correlation function gref as described in Baddeley, Davies and Hazelton (2025).
If divisor is a function in the R language, then it will be applied to the point pattern X and should return one of the strings "r", "d", "a" or "t" listed above. This option makes it possible to specify a rule which decides which estimator to use, based on the data.

There is also a choice of spatial edge corrections (which are needed to avoid bias due to edge effects associated with the boundary of the spatial window):

If correction="translate" or correction="translation" then the translation correction is used. For divisor="r" the translation-corrected estimate is given in equation (15.15), page 284 of Stoyan and Stoyan (1994).
If correction="Ripley" or correction="isotropic" then Ripley's isotropic edge correction is used. For divisor="r" the isotropic-corrected estimate is given in equation (15.18), page 285 of Stoyan and Stoyan (1994).
If correction="none" then no edge correction is used, that is, an uncorrected estimate is computed.

Multiple corrections can be selected. The default is correction=c("translate", "Ripley").

Alternatively correction="all" selects all options; correction="best" selects the option which has the best statistical performance; correction="good" selects the option which is the best compromise between statistical performance and speed of computation.

Argument zerocor determines the correction to the one-dimensional kernel-smoothed estimate on the real number line, to correct bias close to the boundary $r=0$. The argument zerocor is passed to densityBC. Options include:

zerocor="none": no correction.
zerocor="convolution": the convolution, uniform or renormalization kernel.
zerocor="weighted": the cut-and-normalization method.
zerocor="reflection": the reflection method.
zerocor="bdrykern": the linear boundary kernel.
zerocor="JonesFoster": the Jones-Foster modification of the linear boundary kernel.

The choice of smoothing kernel is controlled by the argument kernel which is passed to density.default. The default is the Epanechnikov kernel, recommended by Stoyan and Stoyan (1994, page 285).

The bandwidth of the smoothing kernel can be controlled by the argument bw. Bandwidth is defined as the standard deviation of the kernel; see the documentation for density.default. For the Epanechnikov kernel with half-width h, the argument bw is equivalent to $h/\sqrt{5}$.

Stoyan and Stoyan (1994, page 285) recommend using the Epanechnikov kernel with support $[-h,h]$ chosen by the rule of thumn $h = c/\sqrt{\lambda}$, where $\lambda$ is the (estimated) intensity of the point process, and $c$ is a constant in the range from 0.1 to 0.2. See equation (15.16). If bw is missing or NULL, then this rule of thumb will be applied. The argument stoyan determines the value of $c$. The smoothing bandwidth that was used in the calculation is returned as an attribute of the final result.

The argument bw can be

missing or null. In this case, the default value for bw is "stoyan" when adaptive=FALSE and "bw.abram" when adaptive=TRUE.
a single numeric value giving the bandwidth.
a character string specifying a bandwidth selection rule. String names of rules applicable when adaptive=FALSE include "stoyan", "fiksel" and any rules recognised by density.default. String names applicable when adaptive=TRUE include "bw.abram" and "bw.pow".
a function that computes the selected bandwidth.
- If adaptive=FALSE, the function bw will be applied to the point pattern X to determine the bandwidth. Examples include bw.pcf and bw.stoyan. The function bw should accept the point pattern X as its first argument. Additional arguments to bw may be specified in the list bw.args. If bw recognises any of the arguments kernel, correction, divisor, zerocor and adaptive, then these arguments will be passed to bw as well. The function bw should return a single numeric value.
- If adaptive=TRUE, the function bw will be applied to the vector of pairwise distances between data points (or the transformed distances if divisor="a" or divisor="t"). Examples include bw.abram.default and bw.pow. The function bw should accept the vector of pairwise distances as its first argument. Additional arguments to bw may be specified in the list bw.args.

Note that if bw.args is a function, it will be applied to the point pattern X to determine the list of arguments (whether adaptive is TRUE or FALSE).

The argument r is the vector of values for the distance $r$ at which $g(r)$ should be evaluated. There is a sensible default. If it is specified, r must be a vector of increasing numbers starting from r[1] = 0, and max(r) must not exceed half the diameter of the window.

If the argument domain is given, estimation will be restricted to this region. That is, the estimate of $g(r)$ will be based on pairs of points in which the first point lies inside domain and the second point is unrestricted. The argument domain should be a window (object of class "owin") or something acceptable to as.owin. It must be a subset of the window of the point pattern X.

To compute a confidence band for the true value of the pair correlation function, use lohboot.

If var.approx = TRUE, the variance of the estimate of the pair correlation will also be calculated using an analytic approximation (Illian et al, 2008, page 234) which is valid for stationary point processes which are not too clustered. This calculation is not yet implemented when the argument domain is given.

If fast=TRUE, the calculation uses the Fast Fourier Transform to the maximum extent possible for the chosen boundary correction. If fast=FALSE (the default), the entire calculation uses analytic formulas written in C code.

References

Baddeley, A., Davies, T.M. and Hazelton, M.L. (2025) An improved estimator of the pair correlation function of a spatial point process. Biometrika, to appear.

Guan, Y. (2007) A least-squares cross-validation bandwidth selection approach in pair correlation function estimation. Statistics and Probability Letters 77 (18) 1722--1729.

Illian, J., Penttinen, A., Stoyan, H. and Stoyan, D. (2008) Statistical Analysis and Modelling of Spatial Point Patterns. Wiley.

Stoyan, D. and Stoyan, H. (1994) Fractals, random shapes and point fields: methods of geometrical statistics. John Wiley and Sons.

Examples

Run this code

  pr <- pcf(redwood, divisor="r")
  plot(pr, main="pair correlation function for redwoods")

  # compare estimates
  pd <- pcf(redwood, divisor="d")
  pa <- pcf(redwood, divisor="a")

  plot(pr, cbind(iso, theo) ~ r, col=c("red", "black"),
       ylim.covers=0, main="Estimates of PCF",
       lwd=c(2,1), lty=c(1,3), legend=FALSE)
  plot(pd, iso ~ r, col="blue", lwd=2, add=TRUE)
  plot(pa, iso ~ r, col="green", lwd=2, add=TRUE)
  legend("center", col=c("red", "blue", "green"), lty=1, lwd=2,
         legend=c("divisor=r","divisor=d", "divisor=a"))

Run the code above in your browser using DataLab