pointsPerLag: Points and point pairs per lag distance class

Description

Functions to count the number of points or point pairs per lag distance class. Functions to compute the deviation of the observed distribution of counts from a pre-specified distribution. Functions to compute the minimum number of points or point pairs observed over all lag distance classes.

Usage

pointsPerLag(points, lags, lags.type = "equidistant", lags.base = 2,
               cutoff = NULL)
pairsPerLag(points, lags, lags.type = "equidistant", lags.base = 2, 
              cutoff = NULL)
objPoints(points, lags, lags.type = "equidistant", lags.base = 2, cutoff = NULL,
          criterion = "minimum", pre.distri)
objPairs(points, lags, lags.type = "equidistant", lags.base = 2, cutoff = NULL,
         criterion = "minimum", pre.distri)

Arguments

points

Data frame or matrix containing the projected coordinates of a set of points.

lags

Integer value defining the number of lag distance classes. Alternatively, a vector of numeric values defining the lower and upper limits of each lag distance class.

lags.type

Character value defining the type of lag distance classes. Available options are "equidistant", for equidistant lag distance classes, and "exponential", for exponentially spaced lag distance classes. Defaults to lags.type = "equidistant".

lags.base

Numeric value defining the base used to create exponentially spaced lag distance classes. Defaults to lags.base = 2. See Details for more information.

cutoff

Numeric value defining the maximum distance up to which lag distance classes are created. Used only when the limits of the lag distance classes are not given explicitly, that is, when lags is a single integer value.

criterion

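Character value defining the measure that should be returned to describe the energy state of the current system configuration. Available options are "minimum" and "distribution". The first returns the minimum number of points or point pairs observed over all lag distance classes; the second returns the sum of the differences between the pre-specified and the observed distribution of counts. Defaults to criterion = "minimum".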

pre.distri

A vector of numeric values used to pre-specify the distribution of points or point pairs with which the observed counts of points or point pairs per lag distance class are compared. Used only when criterion = "distribution". Defaults to a uniform distribution.

Value

pairsPerLag and pointsPerLag return a data.frame with three columns: a) the lower and b) upper limits of each lag distance class, and c) the number of points or point pairs per lag distance class.
objPairs and objPoints return a numeric value that depends on the choice of criterion. If criterion = "distribution", they return the sum of the differences between the pre-specified and the observed distribution of counts of points or point pairs per lag distance class. If criterion = "minimum", they return the inverse of the minimum count of points or point pairs over all lag distance classes plus one, multiplied by a constant (10000), i.e. $10000 / (x + 1)$, where $x$ is the minimum count.
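The counting behind pointsPerLag and pairsPerLag can be sketched with base R's dist. The helper countPerLag is hypothetical, not the package's code, and it assumes two conventions that the text does not spell out: a pair falls in a class when its distance is greater than the lower limit and at most the upper limit, and a point is counted in a class when it takes part in at least one pair there. For illustration it returns both counts side by side.

```r
# Hypothetical helper, not the package's internal code.
# Assumptions: a pair belongs to class k when breaks[k] < d <= breaks[k + 1],
# and a point is counted in class k if it takes part in at least one such pair.
countPerLag <- function(points, breaks) {
  d <- as.matrix(dist(points))  # Euclidean distances (coordinates must be projected)
  n_lags <- length(breaks) - 1
  pairs <- integer(n_lags)
  pts <- integer(n_lags)
  for (k in seq_len(n_lags)) {
    in_lag <- d > breaks[k] & d <= breaks[k + 1]
    in_lag[lower.tri(in_lag, diag = TRUE)] <- FALSE  # keep each pair only once
    pairs[k] <- sum(in_lag)
    # a point counts if its row or its column holds at least one pair in this class
    pts[k] <- sum(apply(in_lag, 1, any) | apply(in_lag, 2, any))
  }
  data.frame(lower = breaks[-length(breaks)], upper = breaks[-1],
             points = pts, pairs = pairs)
}

xy <- data.frame(x = c(0, 1, 3), y = c(0, 0, 0))
countPerLag(xy, breaks = c(0, 1.5, 3))
```

For three points on a line at x = 0, 1 and 3, the pair distances are 1, 2 and 3, so the class (0, 1.5] holds one pair involving two points, while (1.5, 3] holds two pairs involving all three points.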

Details

Distances

Euclidean distances between points are calculated using the function dist. This computation requires the coordinates to be projected. The user is responsible for making sure that this requirement is met.

Distribution

Using the default uniform distribution of point pairs within objPairs means that the number of point pairs per lag distance class is equal to $n \times (n - 1) / (2 \times lag)$, where $n$ is the total number of points in points and $lag$ is the number of lag distance classes.

Using the default uniform distribution of points within objPoints means that the number of points per lag distance class is equal to the total number of points in points. This is the same as expecting that each point contributes to every lag distance class.
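The two default uniform distributions can be written out directly. The helper uniformPreDistri is hypothetical; it only reproduces the formulas stated in this section and is not a function of the package.

```r
# Hypothetical helper reproducing the default uniform distributions described above.
uniformPreDistri <- function(n, n_lags, type = c("pairs", "points")) {
  type <- match.arg(type)
  if (type == "pairs") {
    # n * (n - 1) / 2 point pairs spread evenly over n_lags classes
    rep(n * (n - 1) / (2 * n_lags), n_lags)
  } else {
    # every point is expected to contribute to every lag distance class
    rep(n, n_lags)
  }
}

uniformPreDistri(100, 5, "pairs")   # 990 in each of the 5 classes
uniformPreDistri(100, 5, "points")  # 100 in each of the 5 classes
```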

Distributions other than the default ones can easily be implemented by changing the arguments lags, lags.type, lags.base and pre.distri.

Type of lags

Two types of lag distance classes can be created by default. The first (lags.type = "equidistant") consists of evenly spaced lags. They are created by simply dividing the distance interval from zero to cutoff by the required number of lags.
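The equidistant rule amounts to a single call to seq. The helper equidistantLags is a hypothetical illustration of the rule as described, not the package's internal code.

```r
# Hypothetical helper: limits of evenly spaced lag distance classes.
equidistantLags <- function(n_lags, cutoff) {
  # n_lags classes need n_lags + 1 limits, from zero up to the cutoff
  seq(0, cutoff, length.out = n_lags + 1)
}

equidistantLags(4, 100)  # 0 25 50 75 100
```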

The second type (lags.type = "exponential") of lag distance classes is defined by exponential spacings. The spacings are defined by the base $b$ of the exponential expression $b^n$, where $n$ is the required number of lags. The base is defined using argument lags.base. For example, the default lags.base = 2 creates lags that are sequentially defined as half of the immediately preceding larger lag. If cutoff = 100 and lags = 4, the upper limits of the lag distance classes will be

> 100 / (2 ^ c(1:4))
[1] 50.00 25.00 12.50  6.25

Criteria

The functions objPairs and objPoints were designed to be used in spatial simulated annealing to optimize spatial sample configurations. Both of them have two criteria implemented. The first is called using criterion = "distribution" and is used to minimize the sum of differences between a pre-specified distribution and the observed distribution of points or point pairs per lag distance class.

Consider that we aim at having the following distribution of points per lag distance class:

desired <- c(10, 10, 10, 10, 10)

and that the observed distribution of points per lag distance class is the following:

observed <- c(1, 2, 5, 10, 10)

The objective at each iteration of the optimization will be to match the two distributions. This criterion is of the same type as the one proposed by Warrick and Myers (1987).
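The comparison for the example above can be sketched as follows. The helper objDistribution is hypothetical. The text calls this criterion a sum of differences, while the nadir example in these Details squares them; this sketch assumes squared differences, so a perfect match gives zero and larger mismatches weigh more.

```r
# Hypothetical helper; assumes squared differences, as in the nadir
# computation given under "Utopia and nadir points".
objDistribution <- function(observed, desired) {
  sum((desired - observed)^2)
}

desired <- c(10, 10, 10, 10, 10)
observed <- c(1, 2, 5, 10, 10)
objDistribution(observed, desired)  # 81 + 64 + 25 + 0 + 0 = 170
```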

The second criterion is called using criterion = "minimum". It corresponds to maximizing the minimum number of points or point pairs observed over all lag distance classes. Suppose we observe the following distribution of points per lag distance class in the first iteration:

observed <- c(1, 2, 5, 10, 10)

The objective in the next iteration will be to increase the number of points in the first lag distance class ($n = 1$). Suppose we then obtain the following distribution:

resulting <- c(5, 2, 5, 10, 10)

Now the objective will be to increase the number of points in the second lag distance class ($n = 2$). The optimization continues until it is no longer possible to increase the number of points in any of the lag distance classes, that is, when:

distribution <- c(10, 10, 10, 10, 10)
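The walk-through can be traced numerically by turning the minimum count into a value to be minimized. The helper objMinimum is hypothetical and assumes the transformation $10000 / (x + 1)$, which is how the Details read once the nadir value of 10000 at $x = 0$ is taken into account.

```r
# Hypothetical helper: the "minimum" criterion turned into a minimization
# objective, assuming the transformation 10000 / (x + 1).
objMinimum <- function(counts, a = 10000) {
  a / (min(counts) + 1)
}

objMinimum(c(1, 2, 5, 10, 10))     # 10000 / 2  = 5000
objMinimum(c(5, 2, 5, 10, 10))     # 10000 / 3  ~ 3333.33
objMinimum(c(10, 10, 10, 10, 10))  # 10000 / 11 ~ 909.09
```

Each step of the walk-through raises the smallest count, so the returned value decreases monotonically, which is what a minimizing simulated annealing run expects.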

This shows that the result of using criterion = "minimum" is similar to that of using criterion = "distribution". However, the resulting sample patterns can be significantly different. The running time of the optimization algorithm can be slightly longer when using criterion = "distribution", but since it is a more sensitive criterion, convergence may be attained with a smaller number of iterations. This also depends on the other parameters passed to the optimization algorithm.

It is important to note that using the first criterion ("distribution") in simulated annealing corresponds to a minimization problem, whereas using the second criterion ("minimum") corresponds to a maximization problem. We resolve this inconsistency by substituting the criterion that has to be maximized with its inverse. For convenience, we multiply the resulting value by a constant and add one to the denominator, i.e. $10000 / (x + 1)$, where $x$ is the criterion value; adding one keeps the value finite when $x = 0$. This procedure allows us to define both problems as minimization problems.

Utopia and nadir points

Knowledge of the utopia and nadir points can help in the construction of multi-objective optimization problems.

When criterion = "distribution", the utopia point ($f^{\circ}_{i}$) is exactly zero ($f^{\circ}_{i} = 0$). When criterion = "minimum", the utopia point tends to zero ($f^{\circ}_{i} \rightarrow 0$). It can be calculated using the equation $10000 / (n + 1)$, where $n$ is the number of points (objPoints), or the number of point pairs divided by the number of lag distance classes (objPairs).
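The utopia point for criterion = "minimum" follows from plugging the best attainable minimum count into the same transformation. The helper utopiaMinimum is hypothetical and assumes that this best count is $n$ for objPoints and $n(n - 1) / 2$ divided by the number of lag distance classes for objPairs, as stated above.

```r
# Hypothetical helper: utopia point for criterion = "minimum", assuming the
# best attainable minimum count is n (objPoints) or n * (n - 1) / 2 / n_lags (objPairs).
utopiaMinimum <- function(n, n_lags = 1, pairs = FALSE, a = 10000) {
  best <- if (pairs) n * (n - 1) / 2 / n_lags else n
  a / (best + 1)
}

utopiaMinimum(50)                            # objPoints: 10000 / 51  ~ 196.08
utopiaMinimum(50, n_lags = 5, pairs = TRUE)  # objPairs:  10000 / 246 ~ 40.65
```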

The nadir point ($f^{max}_{i}$) depends on a series of elements. For instance, when criterion = "distribution", if the desired distribution of points or point pairs per lag distance class is pre.distribution <- c(10, 10, 10, 10, 10), the worst-case scenario would be to have all points or point pairs in a single lag distance class, that is, obs.distribution <- c(0, 50, 0, 0, 0). In this case, the nadir point is equal to the sum of the squared differences between the two distributions:

> sum((c(10, 10, 10, 10, 10) - c(0, 50, 0, 0, 0)) ^ 2)
[1] 2000

When criterion = "minimum", the nadir point is equal to $f^{max}_{i} = 10000 / (0 + 1) = 10000$.

References

Bresler, E.; Green, R. E. Soil parameters and sampling scheme for characterizing soil hydraulic properties of a watershed. Honolulu: University of Hawaii at Manoa, p. 42, 1982.

Marler, R. T.; Arora, J. S. Function-transformation methods for multi-objective optimization. Engineering Optimization. v. 37, p. 551-570, 2005.

Russo, D. Design of an optimal sampling network for estimating the variogram. Soil Science Society of America Journal. v. 48, p. 708-716, 1984.

Truong, P. N.; Heuvelink, G. B. M.; Gosling, J. P. Web-based tool for expert elicitation of the variogram. Computers and Geosciences. v. 51, p. 390-399, 2013.

Warrick, A. W.; Myers, D. E. Optimization of sampling locations for variogram calculations. Water Resources Research. v. 23, p. 496-500, 1987.

Examples


library(sp)            # provides the meuse data set
data(meuse)
meuse <- meuse[, 1:2]  # keep only the projected x and y coordinates
tmp <- pairsPerLag(meuse, lags = 6, lags.type = "exponential", cutoff = 1000)
