soilhypfit-package: The soilhypfit Package

Description

This is a summary of the features and functionality of soilhypfit, a package in R for parameteric modelling of soil water retention and hydraulic conductivity data.

Arguments

Author

Andreas Papritz papritz@retired.ethz.ch.

Details

Estimation approach

The main function, fit_wrc_hcc, estimates parameters of models for soil water retention and hydraulic conductivity by maximum likelihood (ml, default), maximum posterior density (mpd) estimation (Stewart et al., 1992) or nonlinear weighted least squares (wls) from data on volumetric soil water content, $\boldsymbol{\theta}^\mathrm{T} =(\theta_1, \theta_2, ..., \theta_{n_\theta})$, and/or soil hydraulic conductivity, $\boldsymbol{K}^\mathrm{T} =(K_1, K_2, ..., K_{n_K})$, both measured at given capillary pressure head, $\boldsymbol{h}^\mathrm{T} =(h_1, h_2, ...)$.

For mpd and ml estimation, the models for the measurements are $$ \theta_i = \theta(h_i;\boldsymbol{\mu}, \boldsymbol{\nu}) + \varepsilon_{\theta,i}, \ \ i=1, 2, ..., n_\theta, $$ $$ \log(K_j) = \log(K(h_j;\boldsymbol{\mu}, \boldsymbol{\nu})) + \varepsilon_{K,j}, \ \ j=1, 2, ..., n_K, $$ where $ \theta(h_i; \boldsymbol{\mu}, \boldsymbol{\nu}) $ and $ K(h_j; \boldsymbol{\mu}, \boldsymbol{\nu}) $ denote modelled water content and conductivity, $\boldsymbol{\mu}^\mathrm{} $ and $\boldsymbol{\nu}^\mathrm{} $ are the conditionally linear and nonlinear model parameters (see below and Bates and Watts, 1988, sec. 3.3.5), and $\varepsilon_{\theta,i}$ and $\varepsilon_{K,j}$ are independent, normally distributed errors with zero means and variances $\sigma^2_\theta$ and $\sigma^2_K$, respectively.

Let $$ Q_{\theta}(\boldsymbol{\mu}, \boldsymbol{\nu}; \boldsymbol{\theta}, \boldsymbol{h}) =\left( \boldsymbol{\theta} - \boldsymbol{\theta}( \boldsymbol{h}; \boldsymbol{\mu}, \boldsymbol{\nu} ) \right)^\mathrm{T} \boldsymbol{W}_\theta \left( \boldsymbol{\theta} - \boldsymbol{\theta}( \boldsymbol{h}; \boldsymbol{\mu}, \boldsymbol{\nu} ) \right), $$ and $$ Q_{K}(\boldsymbol{\mu}, \boldsymbol{\nu}; \boldsymbol{K}, \boldsymbol{h}) = \left( \log(\boldsymbol{K}) - \log(\boldsymbol{K}( \boldsymbol{h}; \boldsymbol{\mu}, \boldsymbol{\nu} )) \right)^\mathrm{T} \boldsymbol{W}_K \left( \log(\boldsymbol{K}) - \log(\boldsymbol{K}( \boldsymbol{h}; \boldsymbol{\mu}, \boldsymbol{\nu} )) \right), $$ denote the (possibly weighted) residual sums of squares between measurements and modelled values. $\boldsymbol{\theta}( \boldsymbol{h}; \boldsymbol{\mu}, \boldsymbol{\nu} ) ) $ and $\boldsymbol{K}( \boldsymbol{h}; \boldsymbol{\mu}, \boldsymbol{\nu} ) $ are vectors with modelled values of water content and hydraulic conductivity, and $\boldsymbol{W}_\theta$ and $\boldsymbol{W}_K$ are optional diagonal matrices with weights $w_{\theta,i}$ and $w_{K,j}$, respectively. The weights are products of case weights $w^\prime_{\theta,i}$ and $w^\prime_{K,j}$ and variable weights $w_\theta$, $w_K$, hence $ w_{\theta,i} = w_\theta\, w^\prime_{\theta,i} $ and $ w_{K,j} = w_K\, w^\prime_{K,j} $.

The objective function for ml and mpd estimation is equal to (Stewart et al., 1992, eqs 14 and 15, respectively) $$ Q(\boldsymbol{\mu}, \boldsymbol{\nu}; \boldsymbol{\theta}, \boldsymbol{K}, \boldsymbol{h} ) = \frac{\kappa_\theta}{2} \log( Q_{\theta}(\boldsymbol{\mu}, \boldsymbol{\nu}; \boldsymbol{\theta}, \boldsymbol{h})) + \frac{\kappa_K}{2} \log( Q_{K}(\boldsymbol{\mu}, \boldsymbol{\nu}; \boldsymbol{K}, \boldsymbol{h})), $$ where $\kappa_v = n_v + 2$ for mpd and $\kappa_v = n_v$ for ml estimation, $v \in (\theta, K)$, and weights $w_{\theta,i} = w_{K,j} = 1$. Note that $ Q(\boldsymbol{\mu}, \boldsymbol{\nu}; \boldsymbol{\theta}, \boldsymbol{K}, \boldsymbol{h}) $ is equal to the negative logarithm of the concentrated loglikelihood or the concentrated posterior density, obtained by replacing $\sigma^2_\theta$ and $\sigma^2_K$ by their conditional maximum likelihood or maximum density estimates $ \widehat{\sigma}^2_\theta(\boldsymbol{\mu}), \boldsymbol{\nu}) $ and $ \widehat{\sigma}^2_K(\boldsymbol{\mu}) $ respectively (Stewart et al., 1992, p. 642).

For wls the objective function is equal to $$ Q(\boldsymbol{\mu}, \boldsymbol{\nu}; \boldsymbol{\theta}, \boldsymbol{K}, \boldsymbol{h} ) = Q_{\theta}(\boldsymbol{\mu}, \boldsymbol{\nu}; \boldsymbol{\theta}, \boldsymbol{h}) + Q_{K}(\boldsymbol{\mu}, \boldsymbol{\nu}; \boldsymbol{K}, \boldsymbol{h}). $$ If either water content or conductivity data are not available, then the respective terms are omitted from $Q(\boldsymbol{\mu}, \boldsymbol{\nu}; \boldsymbol{\theta}, \boldsymbol{K}, \boldsymbol{h} ) $.

The function fit_wrc_hcc does not attempt to estimate the parameters by minimising
$ Q(\boldsymbol{\mu}, \boldsymbol{\nu}; \boldsymbol{\theta}, \boldsymbol{K}, \boldsymbol{h}) $ directly with respect to $\boldsymbol{\mu}^\mathrm{} $ and $\boldsymbol{\nu}^\mathrm{} $. Rather, it exploits the fact that for given nonlinear parameters $\boldsymbol{\nu}^\mathrm{} $, the conditionally linear parameters $ \boldsymbol{\mu}^T = (\theta_r, \theta_s, \log(K_0)) $ can be estimated straightforwardly by minimising the conditional residual sums of squares

$$ Q_\theta^*(\theta_r, \theta_s; \boldsymbol{\theta}, \boldsymbol{h}, \boldsymbol{\nu}) =\left( \boldsymbol{\theta} - [ \boldsymbol{1}, \boldsymbol{S}( \boldsymbol{h}; \boldsymbol{\nu} )] \, \left[ \begin{array}{c} \theta_r \\ \theta_s - \theta_r \end{array} \right] \right)^\mathrm{T} \boldsymbol{W}_\theta \left( \boldsymbol{\theta} - [ \boldsymbol{1}, \boldsymbol{S}( \boldsymbol{h}; \boldsymbol{\nu} )] \, \left[ \begin{array}{c} \theta_r \\ \theta_s - \theta_r \end{array} \right] \right) $$ with respect to $\theta_r$ and $\theta_s-\theta_r$ and/or $$ Q_K^*(K_0; \boldsymbol{K}, \boldsymbol{h}, \boldsymbol{\nu})= \left( \log(\boldsymbol{K}) - \log(K_0 \, \boldsymbol{k}( \boldsymbol{h}; \boldsymbol{\nu} )) \right)^\mathrm{T} \boldsymbol{W}_K \left( \log(\boldsymbol{K}) - \log(K_0 \, \boldsymbol{k}( \boldsymbol{h}; \boldsymbol{\nu} )) \right), $$ with respect to $\log(K_0)$, where $\boldsymbol{1}^\mathrm{} $ is a vector of ones, $\boldsymbol{S}( \boldsymbol{h}; \boldsymbol{\nu} )^\mathrm{T} = ( S(h_1; \boldsymbol{\nu}), ..., S(h_{n_\theta}; \boldsymbol{\nu}) ) $ and $\boldsymbol{k}( \boldsymbol{h}; \boldsymbol{\nu} )^\mathrm{T} = ( k(h_1; \boldsymbol{\nu}), ..., k(h_{n_K}; \boldsymbol{\nu}) ) $ are vectors of modelled water saturation and modelled relative conductivity values, $\theta_r$ and $\theta_s$ are the residual and saturated water content, and $K_0$ is the saturated hydraulic conductivity.

Unconstrained conditional estimates, say $ \widehat{\theta}_r(\boldsymbol{\nu}) $, $ \widehat{\theta}_s(\boldsymbol{\nu}) - \widehat{\theta}_r(\boldsymbol{\nu}) $ and $ \widehat{\log(K_0)}(\boldsymbol{\nu}) $ can be easily obtained from the normal equations of the respective (weighted) least squares problems, and quadratic programming yields conditional (weighted) least squares estimates that honour the inequality constraints $0 \le \theta_r \le \theta_s \le 1$.

Let $ \widehat{\boldsymbol{\mu}}(\boldsymbol{\nu})^\mathrm{T} = ( \widehat{\theta}_r(\boldsymbol{\nu}), \widehat{\theta}_s(\boldsymbol{\nu}), \widehat{\log(K_0)}(\boldsymbol{\nu}) ) $ be the conditional estimates of the linear parameters obtained by minimising $ Q_\theta^*(\theta_r, \theta_s; \boldsymbol{\theta}, \boldsymbol{h}, \boldsymbol{\nu}) $, and $ Q_K^*(K_0; \boldsymbol{K}, \boldsymbol{h}, \boldsymbol{\nu}) $, respectively. fit_wrc_hcc then estimates the nonlinear parameters by minimising $ Q( \widehat{\boldsymbol{\mu}}(\boldsymbol{\nu}), \boldsymbol{\nu}; \boldsymbol{\theta}, \boldsymbol{K}, \boldsymbol{h}) $ with respect to $\boldsymbol{\nu}^\mathrm{} $ by a nonlinear optimisation algorithm.

For mpd and ml estimation the variances of the model errors are estimated by (Stewart et al., 1992, eq. 16) $$ \widehat{\sigma}_\theta^2 = \frac{Q_{\theta}( \widehat{\boldsymbol{\mu}}(\widehat{\boldsymbol{\nu}}), \widehat{\boldsymbol{\nu}}; \boldsymbol{\theta}, \boldsymbol{h})}{\kappa_\theta}, $$ and $$ \widehat{\sigma}_K^2 = \frac{Q_{K}( \widehat{\boldsymbol{\mu}}(\widehat{\boldsymbol{\nu}}), \widehat{\boldsymbol{\nu}}; \boldsymbol{K}, \boldsymbol{h})}{\kappa_K}. $$

Furthermore, for mpd and ml estimation, the covariance matrix of the estimated nonlinear parameters may be approximated by the inverse Hessian matrix of $ Q(\widehat{\boldsymbol{\mu}}(\boldsymbol{\nu}), \boldsymbol{\nu}; \boldsymbol{\theta}, \boldsymbol{K}, \boldsymbol{h}) $ at the solution $ \widehat{\boldsymbol{\nu}} $ (Stewart and Sørensen, 1981), i.e.

$$ \mathrm{Cov}[ \widehat{\boldsymbol{\nu}}, \widehat{\boldsymbol{\nu}}^T] \approx \boldsymbol{A}^{-1}, $$ where $$ [\boldsymbol{A}]_{kl} = \frac{\partial^2}{\partial\nu_k\, \partial\nu_l} \left. Q(\widehat{\boldsymbol{\mu}}(\boldsymbol{\nu}), \boldsymbol{\nu}; \boldsymbol{\theta}, \boldsymbol{K}, \boldsymbol{h})\right|_{\nu=\hat{\nu}}. $$

Details on parameter estimation

Models for water retention curves and hydraulic conductivity functions

Currently, fit_wrc_hcc allows to estimate the parameters of the simplified form of the Van Genuchten-Mualem (VGM) model (Van Genuchten, 1980) with the restriction $m = 1 - \frac{1}{n}$, see wc_model and hc_model. This model has the following parameters:

$\boldsymbol{\mu}^\mathrm{T} = (\theta_r, \theta_s, K_0)$ (see above) and
$\boldsymbol{\nu}^\mathrm{T} = (\alpha, n, \tau)$ where $\alpha$ is the inverse air entry pressure, $n$ the shape and $\tau$ the tortuosity parameter.

Any of these parameters can be either estimated from data or kept fixed at the specified initial values (see arguments param and fit_param of fit_wrc_hcc).

Imposing physical constraints on the estimated parameters

Parameters of models for the water retention curve and the hydraulic conductivity function may vary only within certain bounds (see wc_model, hc_model and param_boundf for allowed ranges). fit_wrc_hcc either estimates transformed parameters that vary over the whole real line and can therefore be estimated without constraints (see param_transf), or it uses algorithms (quadratic programming for estimating $\boldsymbol{\mu}^\mathrm{} $, nonlinear optimisation algorithms with box constraints for estimating $\boldsymbol{\nu}^\mathrm{} $) that restrict estimates to permissible ranges, see Details section of control_fit_wrc_hcc.

In addition, for natural soils, the parameters of the VGM model cannot vary independently from each other over the allowed ranges. Sets of fitted parameters should always result in soil hydraulic quantities that are physically meaningful. One of these quantities is the characteristic length $L_\mathrm{c}$ of stage-I evaporation from a soil (Lehmann et al., 2008). $L_\mathrm{c}$ can be related to the parameters of the VGM model, see Lehmann et al. (2008, 2020) and evaporative-length.

Using several soil hydrological databases, Lehmann et al. (2020) analysed the mutual dependence of VGM parameters and proposed regression equations to relate the inverse air entry pressure $\alpha$ and the saturated hydraulic $K_0$ to the shape parameter $n$, which characterises the width of the pore size distribution of a soil. Using these relations, Lehmann et al. (2020) then computed the expected value (``target'') $L_\mathrm{t}$ of $L_\mathrm{c}$ for given $n$ and tortuosity parameter $\tau$, see evaporative-length. fit_wrc_hcc allows to constrain estimates of the nonlinear parameters $\boldsymbol{\nu}^\mathrm{} $ by defining permissible lower and upper bounds for the ratio $L_\mathrm{c}/L_\mathrm{t}$, see arguments ratio_lc_lt_bound of fit_wrc_hcc and settings of control_fit_wrc_hcc.

Choice of optimisation algorithm for estimating the nonlinear parameters

To estimate $\boldsymbol{\nu}^\mathrm{} $, fit_wrc_hcc minimises $ Q( \widehat{\boldsymbol{\mu}}(\boldsymbol{\nu}), \boldsymbol{\nu}; \boldsymbol{\theta}, \boldsymbol{K}, \boldsymbol{h}) $ either by a nonlinear optimisation algorithm available in the library NLopt (Johnson, see nloptr) or by the Shuffled Complex Evolution (SCE) optimisation algorithm (Duan et al., 1994, see SCEoptim). The choice of the algorithm is controlled by the argument settings of the function control_fit_wrc_hcc:

global optimisation without constraints for the ratio $L_\mathrm{c}/L_\mathrm{t}$
(settings = "uglobal" or settings = "sce"),
global optimisation with inequality constraints for the ratio $L_\mathrm{c}/L_\mathrm{t}$
(settings = "cglobal"),
local optimisation without constraints for the ratio $L_\mathrm{c}/L_\mathrm{t}$
(settings = "ulocal"),
local optimisation with inequality constraints for the ratio $L_\mathrm{c}/L_\mathrm{t}$
(settings = "clocal").

The settings argument also sets reasonable default values for the termination (= convergence) criteria for the various algorithms, see NLopt documentation, section Termination conditions. The NLopt documentation contains a very useful discussion of (constrained) optimisation problems in general, global vs. local optimisation and gradient-based vs. derivative-free algorithms.

Note that besides the settings argument of control_fit_wrc_hcc, the arguments nloptr and sce along with the functions control_nloptr and control_sce allow to fully control the nonlinear optimisation algorithms, see control_fit_wrc_hcc for details.

Computing initial values of parameters

For local optimisation algorithms ``good'' initial values of $\boldsymbol{\nu}^\mathrm{} $ are indispensable for successful estimation. fit_wrc_hcc allows to compute initial values of $\alpha$ and $n$ for the Van Genuchten model from water retention data by the following procedure:

Smooth the water retention data, $ (\theta_i, y_i= \log(h_i)), i=1,2, ... n_\theta, $ by an additive model.
Determine the saturation, $S^*$, and the logarithm of capillary pressure head, $y^* = \log(h^*)$, at the inflection point of the additive model fit.
Find the root, say $\widehat{m}$, of $S^* = (1 + 1/m)^{-m}$. One obtains the right-hand side of this equation by solving $\frac{\partial^2}{\partial y^2} \left[ S_\mathrm{VG}(\exp(y); \boldsymbol{\nu}) \right] = 0 $ for $y$ and plugging the result into the expression for $ S_\mathrm{VG}(\exp(y); \boldsymbol{\nu}), $ see wc_model.
Compute $ \widehat{n} = 1 / (1 - \widehat{m}) $ and $ \widehat{\alpha} = 1 / \exp(y^*) \,(1/\widehat{m})^{1-\widehat{m}}. $ The second expression is again a result of solving $\frac{\partial^2}{\partial y^2} \left[ S_\mathrm{VG}(\exp(y); \boldsymbol{\nu}) \right] = 0. $

Initial values for local optimisation algorithms can of course also be obtained by first estimating the parameters by a global algorithm. These estimates can be ``refined'' in a second step by a local unconstrained algorithm, possibly followed by a third call of fit_wrc_hcc to constrain the estimated parameters by the ratio $L_\mathrm{c}/L_\mathrm{t}$. The method coef.fit_wrc_hcc can be used to extract the estimated parameters from an object of class fit_wrc_hcc and to pass them as initial values to fit_wrc_hcc, see fit_wrc_hcc for examples.

References

Bates, D. M., and Watts, D. G. (1988) Nonlinear Regression Analysis and its Applications. John Wiley & Sons, New York tools:::Rd_expr_doi("10.1002/9780470316757").

Duan, Q., Sorooshian, S., and Gupta, V. K. (1994) Optimal use of the SCE-UA global optimisation method for calibrating watershed models, Journal of Hydrology 158, 265--284, tools:::Rd_expr_doi("10.1016/0022-1694(94)90057-4").

Johnson, S.G. The NLopt nonlinear-optimisation package. https://github.com/stevengj/nlopt.

Lehmann, P., Assouline, S., Or, D. (2008) Characteristic lengths affecting evaporative drying of porous media. Physical Review E, 77, 056309, tools:::Rd_expr_doi("10.1103/PhysRevE.77.056309").

Lehmann, P., Bickel, S., Wei, Z., Or, D. (2020) Physical Constraints for Improved Soil Hydraulic Parameter Estimation by Pedotransfer Functions. Water Resources Research 56, e2019WR025963, tools:::Rd_expr_doi("10.1029/2019WR025963").

Stewart, W.E., Caracotsios, M. Sørensen, J.P. 1992. Parameter estimation from multiresponse data. AIChE Journal, 38, 641--650, tools:::Rd_expr_doi("10.1002/aic.690380502").

Stewart, W.E. and Sørensen, J.P. (1981) Bayesian estimation of common parameters from multiresponse data with missing observations. Technometrics, 23, 131--141,
tools:::Rd_expr_doi("10.1080/00401706.1981.10486255").

Van Genuchten, M. Th. (1980) A closed-form equation for predicting the hydraulic conductivity of unsaturated soils. Soil Science Society of America Journal, 44, 892--898, tools:::Rd_expr_doi("10.2136/sssaj1980.03615995004400050002x").