This is a summary of the features and functionality of soilhypfit, a package in R for parameteric modelling of soil water retention and hydraulic conductivity data.
Andreas Papritz papritz@retired.ethz.ch.
The main function, fit_wrc_hcc, estimates parameters of
models for soil water retention and hydraulic conductivity by
maximum likelihood (ml, default), maximum posterior
density (mpd) estimation (Stewart et al., 1992) or
nonlinear weighted least squares (wls) from data on volumetric
soil water content, \(\boldsymbol{\theta}^\mathrm{T} =(\theta_1, \theta_2,
..., \theta_{n_\theta})\), and/or soil hydraulic conductivity,
\(\boldsymbol{K}^\mathrm{T} =(K_1, K_2, ..., K_{n_K})\), both measured at given
capillary pressure head, \(\boldsymbol{h}^\mathrm{T} =(h_1, h_2, ...)\).
For mpd and ml estimation, the models for the measurements are $$ \theta_i = \theta(h_i;\boldsymbol{\mu}, \boldsymbol{\nu}) + \varepsilon_{\theta,i}, \ \ i=1, 2, ..., n_\theta, $$ $$ \log(K_j) = \log(K(h_j;\boldsymbol{\mu}, \boldsymbol{\nu})) + \varepsilon_{K,j}, \ \ j=1, 2, ..., n_K, $$ where \( \theta(h_i; \boldsymbol{\mu}, \boldsymbol{\nu}) \) and \( K(h_j; \boldsymbol{\mu}, \boldsymbol{\nu}) \) denote modelled water content and conductivity, \(\boldsymbol{\mu}^\mathrm{} \) and \(\boldsymbol{\nu}^\mathrm{} \) are the conditionally linear and nonlinear model parameters (see below and Bates and Watts, 1988, sec. 3.3.5), and \(\varepsilon_{\theta,i}\) and \(\varepsilon_{K,j}\) are independent, normally distributed errors with zero means and variances \(\sigma^2_\theta\) and \(\sigma^2_K\), respectively.
Let $$ Q_{\theta}(\boldsymbol{\mu}, \boldsymbol{\nu}; \boldsymbol{\theta}, \boldsymbol{h}) =\left( \boldsymbol{\theta} - \boldsymbol{\theta}( \boldsymbol{h}; \boldsymbol{\mu}, \boldsymbol{\nu} ) \right)^\mathrm{T} \boldsymbol{W}_\theta \left( \boldsymbol{\theta} - \boldsymbol{\theta}( \boldsymbol{h}; \boldsymbol{\mu}, \boldsymbol{\nu} ) \right), $$ and $$ Q_{K}(\boldsymbol{\mu}, \boldsymbol{\nu}; \boldsymbol{K}, \boldsymbol{h}) = \left( \log(\boldsymbol{K}) - \log(\boldsymbol{K}( \boldsymbol{h}; \boldsymbol{\mu}, \boldsymbol{\nu} )) \right)^\mathrm{T} \boldsymbol{W}_K \left( \log(\boldsymbol{K}) - \log(\boldsymbol{K}( \boldsymbol{h}; \boldsymbol{\mu}, \boldsymbol{\nu} )) \right), $$ denote the (possibly weighted) residual sums of squares between measurements and modelled values. \(\boldsymbol{\theta}( \boldsymbol{h}; \boldsymbol{\mu}, \boldsymbol{\nu} ) ) \) and \(\boldsymbol{K}( \boldsymbol{h}; \boldsymbol{\mu}, \boldsymbol{\nu} ) \) are vectors with modelled values of water content and hydraulic conductivity, and \(\boldsymbol{W}_\theta\) and \(\boldsymbol{W}_K\) are optional diagonal matrices with weights \(w_{\theta,i}\) and \(w_{K,j}\), respectively. The weights are products of case weights \(w^\prime_{\theta,i}\) and \(w^\prime_{K,j}\) and variable weights \(w_\theta\), \(w_K\), hence \( w_{\theta,i} = w_\theta\, w^\prime_{\theta,i} \) and \( w_{K,j} = w_K\, w^\prime_{K,j} \).
The objective function for ml and mpd estimation is equal to (Stewart et al., 1992, eqs 14 and 15, respectively) $$ Q(\boldsymbol{\mu}, \boldsymbol{\nu}; \boldsymbol{\theta}, \boldsymbol{K}, \boldsymbol{h} ) = \frac{\kappa_\theta}{2} \log( Q_{\theta}(\boldsymbol{\mu}, \boldsymbol{\nu}; \boldsymbol{\theta}, \boldsymbol{h})) + \frac{\kappa_K}{2} \log( Q_{K}(\boldsymbol{\mu}, \boldsymbol{\nu}; \boldsymbol{K}, \boldsymbol{h})), $$ where \(\kappa_v = n_v + 2\) for mpd and \(\kappa_v = n_v\) for ml estimation, \(v \in (\theta, K)\), and weights \(w_{\theta,i} = w_{K,j} = 1\). Note that \( Q(\boldsymbol{\mu}, \boldsymbol{\nu}; \boldsymbol{\theta}, \boldsymbol{K}, \boldsymbol{h}) \) is equal to the negative logarithm of the concentrated loglikelihood or the concentrated posterior density, obtained by replacing \(\sigma^2_\theta\) and \(\sigma^2_K\) by their conditional maximum likelihood or maximum density estimates \( \widehat{\sigma}^2_\theta(\boldsymbol{\mu}), \boldsymbol{\nu}) \) and \( \widehat{\sigma}^2_K(\boldsymbol{\mu}) \) respectively (Stewart et al., 1992, p. 642).
For wls the objective function is equal to $$ Q(\boldsymbol{\mu}, \boldsymbol{\nu}; \boldsymbol{\theta}, \boldsymbol{K}, \boldsymbol{h} ) = Q_{\theta}(\boldsymbol{\mu}, \boldsymbol{\nu}; \boldsymbol{\theta}, \boldsymbol{h}) + Q_{K}(\boldsymbol{\mu}, \boldsymbol{\nu}; \boldsymbol{K}, \boldsymbol{h}). $$ If either water content or conductivity data are not available, then the respective terms are omitted from \(Q(\boldsymbol{\mu}, \boldsymbol{\nu}; \boldsymbol{\theta}, \boldsymbol{K}, \boldsymbol{h} ) \).
The function fit_wrc_hcc does not attempt to estimate the parameters
by minimising
\(
Q(\boldsymbol{\mu}, \boldsymbol{\nu};
\boldsymbol{\theta}, \boldsymbol{K},
\boldsymbol{h})
\) directly with respect to
\(\boldsymbol{\mu}^\mathrm{} \) and \(\boldsymbol{\nu}^\mathrm{} \).
Rather, it exploits the fact that for given nonlinear parameters
\(\boldsymbol{\nu}^\mathrm{} \), the conditionally linear parameters
\(
\boldsymbol{\mu}^T = (\theta_r, \theta_s, \log(K_0))
\)
can be estimated straightforwardly
by minimising the conditional residual sums of squares
$$ Q_\theta^*(\theta_r, \theta_s; \boldsymbol{\theta}, \boldsymbol{h}, \boldsymbol{\nu}) =\left( \boldsymbol{\theta} - [ \boldsymbol{1}, \boldsymbol{S}( \boldsymbol{h}; \boldsymbol{\nu} )] \, \left[ \begin{array}{c} \theta_r \\ \theta_s - \theta_r \end{array} \right] \right)^\mathrm{T} \boldsymbol{W}_\theta \left( \boldsymbol{\theta} - [ \boldsymbol{1}, \boldsymbol{S}( \boldsymbol{h}; \boldsymbol{\nu} )] \, \left[ \begin{array}{c} \theta_r \\ \theta_s - \theta_r \end{array} \right] \right) $$ with respect to \(\theta_r\) and \(\theta_s-\theta_r\) and/or $$ Q_K^*(K_0; \boldsymbol{K}, \boldsymbol{h}, \boldsymbol{\nu})= \left( \log(\boldsymbol{K}) - \log(K_0 \, \boldsymbol{k}( \boldsymbol{h}; \boldsymbol{\nu} )) \right)^\mathrm{T} \boldsymbol{W}_K \left( \log(\boldsymbol{K}) - \log(K_0 \, \boldsymbol{k}( \boldsymbol{h}; \boldsymbol{\nu} )) \right), $$ with respect to \(\log(K_0)\), where \(\boldsymbol{1}^\mathrm{} \) is a vector of ones, \(\boldsymbol{S}( \boldsymbol{h}; \boldsymbol{\nu} )^\mathrm{T} = ( S(h_1; \boldsymbol{\nu}), ..., S(h_{n_\theta}; \boldsymbol{\nu}) ) \) and \(\boldsymbol{k}( \boldsymbol{h}; \boldsymbol{\nu} )^\mathrm{T} = ( k(h_1; \boldsymbol{\nu}), ..., k(h_{n_K}; \boldsymbol{\nu}) ) \) are vectors of modelled water saturation and modelled relative conductivity values, \(\theta_r\) and \(\theta_s\) are the residual and saturated water content, and \(K_0\) is the saturated hydraulic conductivity.
Unconstrained conditional estimates, say \( \widehat{\theta}_r(\boldsymbol{\nu}) \), \( \widehat{\theta}_s(\boldsymbol{\nu}) - \widehat{\theta}_r(\boldsymbol{\nu}) \) and \( \widehat{\log(K_0)}(\boldsymbol{\nu}) \) can be easily obtained from the normal equations of the respective (weighted) least squares problems, and quadratic programming yields conditional (weighted) least squares estimates that honour the inequality constraints \(0 \le \theta_r \le \theta_s \le 1\).
Let \(
\widehat{\boldsymbol{\mu}}(\boldsymbol{\nu})^\mathrm{T} =
(
\widehat{\theta}_r(\boldsymbol{\nu}),
\widehat{\theta}_s(\boldsymbol{\nu}),
\widehat{\log(K_0)}(\boldsymbol{\nu})
)
\)
be the conditional estimates of the linear parameters
obtained by minimising
\(
Q_\theta^*(\theta_r, \theta_s;
\boldsymbol{\theta},
\boldsymbol{h}, \boldsymbol{\nu})
\),
and
\(
Q_K^*(K_0;
\boldsymbol{K},
\boldsymbol{h}, \boldsymbol{\nu})
\), respectively.
fit_wrc_hcc then estimates the
nonlinear parameters by minimising
\(
Q(
\widehat{\boldsymbol{\mu}}(\boldsymbol{\nu}),
\boldsymbol{\nu};
\boldsymbol{\theta}, \boldsymbol{K},
\boldsymbol{h})
\) with respect to \(\boldsymbol{\nu}^\mathrm{} \)
by a nonlinear optimisation algorithm.
For mpd and ml estimation the variances of the model errors are estimated by (Stewart et al., 1992, eq. 16) $$ \widehat{\sigma}_\theta^2 = \frac{Q_{\theta}( \widehat{\boldsymbol{\mu}}(\widehat{\boldsymbol{\nu}}), \widehat{\boldsymbol{\nu}}; \boldsymbol{\theta}, \boldsymbol{h})}{\kappa_\theta}, $$ and $$ \widehat{\sigma}_K^2 = \frac{Q_{K}( \widehat{\boldsymbol{\mu}}(\widehat{\boldsymbol{\nu}}), \widehat{\boldsymbol{\nu}}; \boldsymbol{K}, \boldsymbol{h})}{\kappa_K}. $$
Furthermore, for mpd and ml estimation, the covariance matrix of the estimated nonlinear parameters may be approximated by the inverse Hessian matrix of \( Q(\widehat{\boldsymbol{\mu}}(\boldsymbol{\nu}), \boldsymbol{\nu}; \boldsymbol{\theta}, \boldsymbol{K}, \boldsymbol{h}) \) at the solution \( \widehat{\boldsymbol{\nu}} \) (Stewart and Sørensen, 1981), i.e.
$$ \mathrm{Cov}[ \widehat{\boldsymbol{\nu}}, \widehat{\boldsymbol{\nu}}^T] \approx \boldsymbol{A}^{-1}, $$ where $$ [\boldsymbol{A}]_{kl} = \frac{\partial^2}{\partial\nu_k\, \partial\nu_l} \left. Q(\widehat{\boldsymbol{\mu}}(\boldsymbol{\nu}), \boldsymbol{\nu}; \boldsymbol{\theta}, \boldsymbol{K}, \boldsymbol{h})\right|_{\nu=\hat{\nu}}. $$
Currently, fit_wrc_hcc allows to estimate the parameters of the
simplified form of the Van Genuchten-Mualem (VGM) model (Van
Genuchten, 1980) with the restriction \(m = 1 - \frac{1}{n}\), see wc_model and hc_model.
This model has the following parameters:
\(\boldsymbol{\mu}^\mathrm{T} = (\theta_r, \theta_s, K_0)\) (see above) and
\(\boldsymbol{\nu}^\mathrm{T} = (\alpha, n, \tau)\) where \(\alpha\) is the inverse air entry pressure, \(n\) the shape and \(\tau\) the tortuosity parameter.
Any of these parameters can be either estimated from data or kept
fixed at the specified initial values (see arguments param and
fit_param of fit_wrc_hcc).
Parameters of models for the water retention curve and the hydraulic
conductivity function may vary only within certain bounds (see
wc_model, hc_model and
param_boundf for allowed ranges).
fit_wrc_hcc either estimates transformed
parameters that vary over the whole real line and can therefore be
estimated without constraints (see param_transf), or it
uses algorithms (quadratic programming for estimating
\(\boldsymbol{\mu}^\mathrm{} \), nonlinear optimisation algorithms with box
constraints for estimating \(\boldsymbol{\nu}^\mathrm{} \)) that restrict estimates
to permissible ranges, see Details section of
control_fit_wrc_hcc.
In addition, for natural soils, the parameters of the VGM model
cannot vary independently from each other over the allowed ranges.
Sets of fitted parameters should always result in soil hydraulic
quantities that are physically meaningful. One of these quantities
is the characteristic length \(L_\mathrm{c}\) of
stage-I evaporation from a soil (Lehmann et al., 2008).
\(L_\mathrm{c}\) can be related to the parameters of the VGM
model, see Lehmann et al. (2008, 2020) and
evaporative-length.
Using several soil hydrological databases, Lehmann et al. (2020)
analysed the mutual dependence of VGM parameters and proposed
regression equations to relate the inverse air entry pressure
\(\alpha\) and the saturated hydraulic \(K_0\) to the shape
parameter \(n\), which characterises the width of the pore size
distribution of a soil. Using these relations, Lehmann et al.
(2020) then computed the expected value (``target'')
\(L_\mathrm{t}\) of \(L_\mathrm{c}\) for given \(n\)
and tortuosity parameter \(\tau\), see
evaporative-length. fit_wrc_hcc allows
to constrain estimates of the nonlinear parameters \(\boldsymbol{\nu}^\mathrm{} \)
by defining permissible lower and upper bounds for the ratio
\(L_\mathrm{c}/L_\mathrm{t}\), see arguments
ratio_lc_lt_bound of fit_wrc_hcc and
settings of control_fit_wrc_hcc.
To estimate \(\boldsymbol{\nu}^\mathrm{} \), fit_wrc_hcc minimises
\( Q(
\widehat{\boldsymbol{\mu}}(\boldsymbol{\nu}),
\boldsymbol{\nu};
\boldsymbol{\theta}, \boldsymbol{K},
\boldsymbol{h}) \)
either by a nonlinear optimisation algorithm available in the library
NLopt (Johnson, see nloptr) or by the Shuffled
Complex Evolution (SCE) optimisation algorithm (Duan et al., 1994,
see SCEoptim). The choice of the algorithm
is controlled by the argument settings of the function
control_fit_wrc_hcc:
global optimisation without constraints for the
ratio \(L_\mathrm{c}/L_\mathrm{t}\)
(settings =
"uglobal" or settings = "sce"),
global optimisation with inequality constraints
for the ratio \(L_\mathrm{c}/L_\mathrm{t}\)
(settings = "cglobal"),
local optimisation without constraints for the
ratio \(L_\mathrm{c}/L_\mathrm{t}\)
(settings =
"ulocal"),
local optimisation with inequality constraints
for the ratio \(L_\mathrm{c}/L_\mathrm{t}\)
(settings = "clocal").
The settings argument also sets reasonable default values
for the termination (= convergence) criteria for the various
algorithms, see
NLopt
documentation, section Termination conditions. The NLopt
documentation contains a very useful discussion of
(constrained)
optimisation problems in general,
global
vs. local optimisation and
gradient-based
vs. derivative-free algorithms.
Note that besides the settings argument of
control_fit_wrc_hcc, the arguments nloptr and
sce along with the functions control_nloptr and
control_sce allow to fully control the nonlinear
optimisation algorithms, see control_fit_wrc_hcc for
details.
For local optimisation algorithms ``good'' initial values of
\(\boldsymbol{\nu}^\mathrm{} \) are indispensable for successful estimation.
fit_wrc_hcc allows to compute initial values of
\(\alpha\) and \(n\) for the Van Genuchten model from water
retention data by the following procedure:
Smooth the water retention data, \( (\theta_i, y_i= \log(h_i)), i=1,2, ... n_\theta, \) by an additive model.
Determine the saturation, \(S^*\), and the logarithm of capillary pressure head, \(y^* = \log(h^*)\), at the inflection point of the additive model fit.
Find the root, say \(\widehat{m}\), of
\(S^* = (1 + 1/m)^{-m}\).
One obtains the right-hand side of this equation by solving
\(\frac{\partial^2}{\partial y^2}
\left[
S_\mathrm{VG}(\exp(y); \boldsymbol{\nu})
\right] = 0
\)
for \(y\) and plugging the result into the expression for
\(
S_\mathrm{VG}(\exp(y); \boldsymbol{\nu}),
\)
see wc_model.
Compute \( \widehat{n} = 1 / (1 - \widehat{m}) \) and \( \widehat{\alpha} = 1 / \exp(y^*) \,(1/\widehat{m})^{1-\widehat{m}}. \) The second expression is again a result of solving \(\frac{\partial^2}{\partial y^2} \left[ S_\mathrm{VG}(\exp(y); \boldsymbol{\nu}) \right] = 0. \)
Initial values for local optimisation algorithms can of course also
be obtained by first estimating the parameters by a global
algorithm. These estimates can be ``refined'' in a second
step by a local unconstrained algorithm, possibly
followed by a third call of fit_wrc_hcc to constrain
the estimated parameters by the ratio
\(L_\mathrm{c}/L_\mathrm{t}\). The method
coef.fit_wrc_hcc can be used to extract the estimated
parameters from an object of class fit_wrc_hcc and to pass
them as initial values to fit_wrc_hcc, see
fit_wrc_hcc for examples.
Bates, D. M., and Watts, D. G. (1988) Nonlinear Regression Analysis and its Applications. John Wiley & Sons, New York tools:::Rd_expr_doi("10.1002/9780470316757").
Duan, Q., Sorooshian, S., and Gupta, V. K. (1994) Optimal use of the SCE-UA global optimisation method for calibrating watershed models, Journal of Hydrology 158, 265--284, tools:::Rd_expr_doi("10.1016/0022-1694(94)90057-4").
Johnson, S.G. The NLopt nonlinear-optimisation package. https://github.com/stevengj/nlopt.
Lehmann, P., Assouline, S., Or, D. (2008) Characteristic lengths affecting evaporative drying of porous media. Physical Review E, 77, 056309, tools:::Rd_expr_doi("10.1103/PhysRevE.77.056309").
Lehmann, P., Bickel, S., Wei, Z., Or, D. (2020) Physical Constraints for Improved Soil Hydraulic Parameter Estimation by Pedotransfer Functions. Water Resources Research 56, e2019WR025963, tools:::Rd_expr_doi("10.1029/2019WR025963").
Stewart, W.E., Caracotsios, M. Sørensen, J.P. 1992. Parameter estimation from multiresponse data. AIChE Journal, 38, 641--650, tools:::Rd_expr_doi("10.1002/aic.690380502").
Stewart, W.E. and Sørensen, J.P. (1981)
Bayesian estimation of common
parameters from multiresponse data with missing observations.
Technometrics, 23, 131--141,
tools:::Rd_expr_doi("10.1080/00401706.1981.10486255").
Van Genuchten, M. Th. (1980) A closed-form equation for predicting the hydraulic conductivity of unsaturated soils. Soil Science Society of America Journal, 44, 892--898, tools:::Rd_expr_doi("10.2136/sssaj1980.03615995004400050002x").
fit_wrc_hcc for (constrained) estimation of parameters of
models for soil water retention and hydraulic conductivity data;
control_fit_wrc_hcc for options to control
fit_wrc_hcc;
soilhypfitmethods for common S3 methods for class
fit_wrc_hcc;
vcov for computing (co-)variances of the estimated
nonlinear parameters;
prfloglik_sample for profile loglikelihood
computations;
wc_model and hc_model for currently
implemented models for soil water retention curves and hydraulic
conductivity functions;
evaporative-length for physically constraining parameter
estimates of soil hydraulic material functions.