Smooth [1] or estimate the baseline [2] of a surface measured on an arbitrary grid by minimising a sum of penalties. Combinations of multiple difference orders and two different methods of preventing negative values in the output are supported.
This is not a public interface. Subject to change without further notice. Please do not call from outside albatross.
whittaker2(x, y, z, lambda, d, p, logscale, nonneg)
diffmat(x, y, d)
vandermonde(x0)
whittaker2: A matrix of the same shape as z, with the values smoothed, interpolated, or baseline estimated.

diffmat: A difference matrix \(\mathbf D\) that, when multiplied by \(\mathrm{vec}(\mathbf{Z})\), returns a vector of estimated derivatives of \(\mathbf Z\) of the given order by \(x\) and \(y\) at every available point (see below).

vandermonde: A vector \(\mathbf c\) of length \(n = \) length(x0) such that \( \mathbf{c}^\top f(\mathbf{x_0}) \approx f^{(n-1)}(x) \), where \(x\) is the middle point of x0 for odd \(n\) or the average of the two middle points for even \(n\).
x: Grid values along the rows of z.

y: Grid values along the columns of z.

z: Matrix containing the surface values to smooth, with NAs standing for points to interpolate.

lambda: A vector of smoothness penalties, one for every difference order. Must be of the same length as d.

d: For whittaker2, a vector of difference orders corresponding to the elements of lambda; for diffmat, a single integer difference order.

p: If not missing, use the asymmetric penalty method [2] to estimate the baseline: the differences \(\hat z - z\) are penalised with weight \(p\) where \(\hat z < z\) and \(1 - p\) otherwise. Typically, values around \(10^{-3}\) are used. A sketch of the resulting reweighting loop follows this list.

logscale: If not NA, rescale z between logscale and \(1\) and take its logarithm before smoothing, then undo the transformation afterwards. This is supposed to prevent negative numbers from appearing in the interpolated surface, but may distort its shape in the process.

nonneg: If not \(0\), for every resulting negative value in the interpolated surface, add a penalty of nonneg pulling it towards \(0\), then repeat the process until no new penalties have to be introduced.

x0: A vector specifying the grid where the function to be differentiated is measured. Must be sorted.
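To illustrate the role of p, here is a minimal one-dimensional sketch of that reweighting loop around the penalised least-squares fit described in the Details below; the function name asls_baseline and the parameter defaults are made up, the grid is assumed uniform, and whittaker2 itself applies the same idea to the unfolded surface:

library(Matrix)

# A sketch of baseline estimation by asymmetric penalties: points
# above the current estimate get the small weight p, pulling the
# smooth curve towards the lower envelope of the data.
asls_baseline <- function(z, lambda = 1e4, p = 1e-3, d = 2, maxit = 20) {
  n <- length(z)
  D <- Matrix(diff(diag(n), differences = d), sparse = TRUE)
  P <- lambda * crossprod(D)
  w <- rep(1, n)
  for (i in seq_len(maxit)) {
    zhat <- as.numeric(solve(Diagonal(x = w) + P, w * z))
    w.new <- ifelse(zhat < z, p, 1 - p) # weight p where zhat < z
    if (identical(w.new, w)) break      # weights stopped changing
    w <- w.new
  }
  zhat
}

# two peaks on a decaying baseline
x <- seq(0, 10, length.out = 500)
baseline <- asls_baseline(exp(-x / 5) + dnorm(x, 3, 0.1) + dnorm(x, 7, 0.2))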
Whittaker smoothing works by minimising a sum of penalties [1]. Taking weights into account, we get:
$$ \min_{\mathbf{\hat z}} \: (\mathbf{\hat z} - \mathbf{z})^\top \mathrm{diag}(\mathbf w) (\mathbf{\hat z} - \mathbf{z}) + \lambda | \mathbf D \mathbf{\hat z} |^2 $$
By taking the derivatives with respect to \(\mathbf{\hat z}\) and equating them to \(0\), we get the normal equation:
$$ ( \mathrm{diag}(\mathbf w) + \lambda \mathbf{D}^\top \mathbf{D} ) \mathbf{\hat z} = \mathrm{diag}(\mathbf w) \mathbf z $$
The equation is then solved using solve.
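For illustration, here is a minimal one-dimensional version of this solve; the name whittaker1 is made up, the grid is assumed uniform, and zero weights are used to make the smoother interpolate missing points:

library(Matrix)

# Sparse d-th order difference matrix, then the normal equation.
whittaker1 <- function(z, lambda, d = 2) {
  n <- length(z)
  w <- as.numeric(!is.na(z)) # zero weight at missing points
  z[is.na(z)] <- 0           # any finite value works where w == 0
  D <- Matrix(diff(diag(n), differences = d), sparse = TRUE)
  as.numeric(solve(Diagonal(x = w) + lambda * crossprod(D), w * z))
}

z <- sin(seq(0, 2 * pi, length.out = 100))
z[30:40] <- NA # a gap to interpolate
zhat <- whittaker1(z, lambda = 1)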
The equations above deal with vectors, not matrices. We can't just smooth the matrix row by row (or column by column): that wouldn't be any better than feemscatter(method = 'pchip'). One way to generalise the approach to matrices is presented in [3], but it only works for uniform grids. Instead, we unfold the z matrix into a vector, then strategically position the coefficients in the difference matrix so that the resulting vector contains both row-wise and column-wise differences.
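For example, assuming uniform grids and letting base R produce the difference coefficients, the same matrix could be assembled from Kronecker products (the real code positions coefficients from diffmat instead, which handles nonuniform grids, and keeps everything sparse):

n <- 5; m <- 4; d <- 2    # dim(z) == c(n, m)
Dx <- diff(diag(n), differences = d)
Dy <- diff(diag(m), differences = d)
D <- rbind(
  kronecker(diag(m), Dx), # row-wise differences of vec(Z)
  kronecker(Dy, diag(n))  # column-wise differences of vec(Z)
)

# verify against differencing the matrix directly
Z <- matrix(rnorm(n * m), n, m)
stopifnot(all.equal(
  as.vector(D %*% as.vector(Z)),
  c(apply(Z, 2, diff, differences = d),
    t(apply(Z, 1, diff, differences = d)))
))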
How can we differentiate a function tabulated on a fixed, potentially nonuniform grid before we even know its values? By using its Taylor series.

The first derivative is special because it's possible to use central differences and get a second-order accurate result, even on a nonuniform grid, by carefully choosing the points where the derivative is estimated. Let \(x + \frac{h}{2}\) and \(x - \frac{h}{2}\) be a pair of adjacent points from the grid. Here are the Taylor series expansions of \(f\) around \(x\), with the Lagrange form of the remainder:
$$ f \left(x + \frac{h}{2}\right) = f(x) + \frac{h}{2} f'(x) + \frac{h^2}{8} f''(x) + \frac{h^3}{48} f'''(\zeta) $$
$$ f \left(x - \frac{h}{2}\right) = f(x) - \frac{h}{2} f'(x) + \frac{h^2}{8} f''(x) - \frac{h^3}{48} f'''(\eta) $$
$$ f \left(x + \frac{h}{2}\right) - f \left(x - \frac{h}{2}\right) = h f'(x) + \frac{h^3}{48} \left(f'''(\zeta) + f'''(\eta)\right) $$
$$ f'(x) = \frac{ f\left(x + \frac{h}{2}\right) - f\left(x - \frac{h}{2}\right) }{h} - \frac{h^2}{48} \left(f'''(\zeta) + f'''(\eta)\right) $$
$$ |\delta f'(x)| \le \max_{ \xi \in [x - \frac{h}{2}, x + \frac{h}{2}] } \frac{h^2}{24} \left| f'''(\xi) \right| $$
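A quick numerical check of this bound on a deliberately nonuniform grid (the test function is an arbitrary choice):

x <- sort(runif(100, 0, 2 * pi))
fp <- diff(sin(x)) / diff(x)       # (f(x + h/2) - f(x - h/2)) / h
mid <- (x[-1] + x[-length(x)]) / 2 # the points where the estimate holds
max(abs(fp - cos(mid)))            # within max(h^2/24 |f'''|) of zero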
Suppose the three grid points \(\xi_1 = x_1 - \frac{h_1}{2}\), \( \xi_2 = x_1 + \frac{h_1}{2} = x_2 - \frac{h_2}{2} \), \(\xi_3 = x_2 + \frac{h_2}{2}\) are adjacent on the grid, and we know the \(f'\) estimates at \(x_1\) and \(x_2\).

On the one hand, the Taylor series expansion of \(f'\) around \(\frac{x_1 + x_2}{2}\) gives:
$$ f'(x_2) = f'\left(\frac{x_1 + x_2}{2}\right) + f''\left(\frac{x_1 + x_2}{2}\right)\frac{x_2 - x_1}{2} + f'''\left(\frac{x_1 + x_2}{2}\right)\frac{(x_2 - x_1)^2}{8} + f''''(\zeta)\frac{(x_2 - x_1)^3}{48} $$ $$ f'(x_1) = f'\left(\frac{x_1 + x_2}{2}\right) - f''\left(\frac{x_1 + x_2}{2}\right)\frac{x_2 - x_1}{2} + f'''\left(\frac{x_1 + x_2}{2}\right)\frac{(x_2 - x_1)^2}{8} - f''''(\eta)\frac{(x_2 - x_1)^3}{48} $$ $$ f''\left(\frac{x_1 + x_2}{2}\right) = \frac{f'(x_2) - f'(x_1)}{x_2 - x_1} - \frac{(x_2 - x_1)^2}{48}(f''''(\zeta) + f''''(\eta)) $$ $$ |\delta f''(x)| \le \max_{ \xi \in [x_1, x_2] } \frac{(x_2 - x_1)^2}{24} f''''(\xi) $$
On the other hand, if we substitute the estimates of \(f'(x)\) from above, we get:
$$ f''\left(\frac{x_1 + x_2}{2}\right) = \frac{ h_1 f(\xi_3) - h_1 f(\xi_2) - h_2 f(\xi_2) + h_2 f(\xi_1) }{h_1 h_2 (x_2 - x_1)} - \frac{ h_2^2 f'''(\zeta_2) - h_1^2 f'''(\zeta_1) }{24(x_2 - x_1)} - \frac{(x_2 - x_1)^2}{24} f''''(\eta) $$
Unless \(h_1 = h_2\), the leftover errors of the first derivative estimates do not cancel, and the result is only first-order accurate. This is why we can't just keep taking central differences and expect \((n+1)\)-th order accurate results.
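A numerical check of the three-point formula, with the grid and the function chosen arbitrarily:

xi <- c(0, 0.3, 1) # second derivative of exp() from three uneven points
h1 <- xi[2] - xi[1]; h2 <- xi[3] - xi[2]
x1 <- xi[1] + h1 / 2; x2 <- xi[2] + h2 / 2
est <- (h1 * exp(xi[3]) - (h1 + h2) * exp(xi[2]) + h2 * exp(xi[1])) /
  (h1 * h2 * (x2 - x1))
est - exp((x1 + x2) / 2) # noticeable first-order error since h1 != h2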
What are the general methods of finding the coefficients for the \(\mathbf D\) matrix? Start with a system of Taylor series expansions for every grid point:
$$ f(x_i) = \sum_{k=0}^{n-1} f^{(k)}(x) \frac{(x_i - x)^k}{k!} + f^{(n)}(\xi) \frac{(x_i - x)^{n}}{n!} \; \forall i = 1 \dots n $$
We can solve this system for coefficients \(c_i\) giving the desired \(l\)-th derivative estimate with highest accuracy \(p\) possible [4]:
$$ \sum_{i = 1}^n c_i f(x_i) = f^{(l)}(x) + o(h^p) $$
Substituting the approximations for \(f(x_i)\) into the equation, we get the following condition for the multiplier in front of each \(f^{(k)}(x)\):
$$ \frac{1}{k!} \sum_{i = 1}^n c_i (x_i - x)^k = \mathbf{1}_{k = l} \; \forall k = 0 \dots n-1 $$
In the matrix form, this becomes a Vandermonde system:
$$ V_{k,i} = \frac{(x_i - x)^k}{k!} $$ $$ b_k = \mathbf{1}_{k = l} $$ $$ \mathbf c = \mathbf{V}^{-1} \mathbf b $$
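A sketch of this solution follows; the helper fdcoef is made up for illustration, while the package's vandermonde() fixes \(l = n - 1\) and the \(x\) described in the Value section:

# Weights for estimating the l-th derivative at x from the values of a
# function on the grid x0: solve the Vandermonde system V c = b.
fdcoef <- function(x0, l, x) {
  n <- length(x0)
  V <- outer(0:(n - 1), x0 - x, function(k, dx) dx^k / factorial(k))
  solve(V, as.numeric(0:(n - 1) == l))
}

# third derivative of sin() from four unevenly spaced points
x0 <- c(0.1, 0.4, 0.5, 0.9)
cc <- fdcoef(x0, 3, mean(x0[2:3]))
sum(cc * sin(x0)) # compare with -cos(0.45), approximately -0.9004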
Unfortunately, this system becomes ill-conditioned for “large” numbers of points. (Experiments show noticeably growing \(c_i\) even for the third derivative from \(10\) points, and no solution at all for \(32\) points on a uniform grid.) Fornberg [5] suggests a more numerically stable procedure, but even it breaks down for \(1000\) points.
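The growth is easy to reproduce with the fdcoef sketch above:

# largest coefficient for the third derivative on a uniform grid
sapply(c(4, 6, 8, 10), function(n)
  max(abs(fdcoef(seq(0, 1, length.out = n), 3, 0.5))))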
It is important to note that the performance of the method depends on the matrix \(\mathbf D\) being sparse. While the methods described above could give more accurate results, they do so at the cost of assigning nonzero weights to many points, and the weights grow as the number of points increases. Therefore, since difference orders above \(3\) are used very rarely, and in the interest of simplicity and performance, we minimise the number of coefficients and their magnitudes by solving the Vandermonde system for the minimally accurate derivative estimates, taking exactly \(k + 1\) points for the \(k\)-th derivative.
What is the error of such an estimate? Substituting the Lagrange form of the remainder into \( \mathbf{c}^\top f(\mathbf x) \), we get:
$$ \sum_{i = 1}^n c_i f(x_i) = f^{(n-1)}(x) + \sum_{i = 1}^n c_i f^{(n)}(\xi_i) \frac{(x_i - x)^n}{n!}, \; \xi_i \in [ x_i, x ] $$
Our choice of \(x\) (the middle point for odd \(n\), the average of the two middle points for even \(n\)) lets us shave off one term from the sum above for odd \(n\) and get second-order accurate results for \(n = 2\), but otherwise the method is first-order accurate.
[1] Eilers, P. H. C. (2003). A Perfect Smoother. Analytical Chemistry, 75(14), 3631-3636. doi: 10.1021/ac034173t

[2] Eilers, P. H. C. (2004). Parametric Time Warping. Analytical Chemistry, 76(2), 404-411. doi: 10.1021/ac034800e

[3] Garcia, D. (2010). Robust smoothing of gridded data in one and higher dimensions with missing values. Computational Statistics & Data Analysis, 54(4), 1167-1178. doi: 10.1016/j.csda.2009.09.020

[4] LeVeque, R. J. (2007). Finite Difference Approximations. In: Finite Difference Methods for Ordinary and Partial Differential Equations, chapter 1, pp. 3-11. Society for Industrial and Applied Mathematics (SIAM). https://faculty.washington.edu/rjl/fdmbook/

[5] Fornberg, B. (1988). Generation of finite difference formulas on arbitrarily spaced grids. Mathematics of Computation, 51(184), 699-706. doi: 10.1090/S0025-5718-1988-0935077-0
feemscatter
library(albatross)
data(feems)
# cut the scattering signal out of a sample EEM, leaving NAs behind
z <- feemscatter(feems$a, rep(25, 4), 'omit')
str(albatross:::whittaker2(
attr(z, 'emission'), attr(z, 'excitation'), z,
c(1, 1e-3), 1:2, logscale = NA, nonneg = 1
))