norm_p12_cp: Normal Distribution with Predictors on both Mean and Standard Deviation, with Parameter Uncertainty

Description

The fitdistcp package contains functions that generate predictive distributions for various statistical models, with and without parameter uncertainty. Parameter uncertainty is included by using Bayesian prediction with a type of objective prior known as a calibrating prior. Calibrating priors are chosen to give predictions that give good reliability (i.e., are well calibrated), for any underlying true parameter values.

There are five functions for each model, each of which uses training data x. For model **** the five functions are as follows:

q****_cp returns predictive quantiles at the specified probabilities p, and various other diagnostics.
r****_cp returns n random deviates from the predictive distribution.
d****_cp returns the predictive density function at the specified values y
p****_cp returns the predictive distribution function at the specified values y
t****_cp returns n random deviates from the posterior distribution of the model parameters.

The q, r, d, p routines return two sets of results, one based on maximum likelihood, and the other based on a calibrating prior. The prior used depends on the model, and is given under Details below.

Where possible, the Bayesian prediction integral is solved analytically. Otherwise, DMGS asymptotic expansions are used. Optionally, a third set of results is returned that integrates the prediction integral by sampling the parameter posterior distribution using the RUST rejection sampling algorithm.

Usage

qnorm_p12_cp(
  x,
  t1,
  t2,
  t01 = NA,
  t02 = NA,
  n01 = NA,
  n02 = NA,
  p = seq(0.1, 0.9, 0.1),
  ics = c(0, 0, 0, 0),
  means = FALSE,
  waicscores = FALSE,
  logscores = FALSE,
  dmgs = TRUE,
  rust = FALSE,
  nrust = 1e+05,
  extramodels = FALSE,
  predictordata = TRUE,
  centering = TRUE,
  debug = FALSE
)
rnorm_p12_cp(
  n,
  x,
  t1,
  t2,
  n01 = NA,
  n02 = NA,
  t01 = NA,
  t02 = NA,
  ics = c(0, 0, 0, 0),
  rust = FALSE,
  mlcp = TRUE,
  debug = FALSE
)
dnorm_p12_cp(
  x,
  t1,
  t2,
  t01 = NA,
  t02 = NA,
  n01 = NA,
  n02 = NA,
  y = x,
  ics = c(0, 0, 0, 0),
  rust = FALSE,
  nrust = 1000,
  boot = FALSE,
  nboot = 10,
  centering = TRUE,
  rnonnegslopesonly = FALSE,
  debug = FALSE
)
pnorm_p12_cp(
  x,
  t1,
  t2,
  t01 = NA,
  t02 = NA,
  n01 = NA,
  n02 = NA,
  y = x,
  ics = c(0, 0, 0, 0),
  rust = FALSE,
  nrust = 1000,
  boot = FALSE,
  nboot = 10,
  centering = TRUE,
  rnonnegslopesonly = FALSE,
  debug = FALSE
)
tnorm_p12_cp(
  method,
  n,
  x,
  t1,
  t2,
  nonnegslopesonly = FALSE,
  ics = c(0, 0, 0, 0),
  debug = FALSE
)

Value

q**** returns a list containing at least the following:

ml_params: maximum likelihood estimates for the parameters.
ml_value: the value of the log-likelihood at the maximum.
standard_errors: estimates of the standard errors on the parameters, from the inverse observed information matrix.
ml_quantiles: quantiles calculated using maximum likelihood.
cp_quantiles: predictive quantiles calculated using a calibrating prior.
maic: the AIC score for the maximum likelihood model, times -1/2.
cp_method: a comment about the method used to generate the cp prediction.

For models with predictors, q**** additionally returns:

predictedparameter: the estimated value for parameter, as a function of the predictor.
adjustedx: the detrended values of x

r**** returns a list containing the following:

ml_params: maximum likelihood estimates for the parameters.
ml_deviates: random deviates calculated using maximum likelihood.
cp_deviates: predictive random deviates calculated using a calibrating prior.
cp_method: a comment about the method used to generate the cp prediction.

d**** returns a list containing the following:

ml_params: maximum likelihood estimates for the parameters.
ml_pdf: density function from maximum likelihood.
cp_pdf: predictive density function calculated using a calibrating prior (not available in EVT routines, for mathematical reasons, unless using RUST).
cp_method: a comment about the method used to generate the cp prediction.

p*** returns a list containing the following:

ml_params: maximum likelihood estimates for the parameters.
ml_cdf: distribution function from maximum likelihood.
cp_cdf: predictive distribution function calculated using a calibrating prior (not available in EVT routines, for mathematical reasons, unless using RUST).
cp_method: a comment about the method used to generate the cp prediction.

t*** returns a list containing the following:

theta_samples: random samples from the parameter posterior.

Arguments

x: a vector of training data values
t1: a vector of predictors for the mean, such that length(t1)=length(x)
t2: a vector of predictors for the sd, such that length(t2)=length(x)
t01: a single value of the predictor (specify either t01 or n01 but not both)
t02: a single value of the predictor (specify either t02 or n02 but not both)
n01: an index for the predictor (specify either t01 or n01 but not both)
n02: an index for the predictor (specify either t02 or n02 but not both)
p: a vector of probabilities at which to generate predictive quantiles
ics: initial conditions for the maximum likelihood search
means: logical that indicates whether to run additional calculations and return analytical estimates for the distribution means (longer runtime)
waicscores: logical that indicates whether to run additional calculations and return estimates for the WAIC1 and WAIC2 scores (longer runtime)
logscores: logical that indicates whether to run additional calculations and return leave-one-out estimates of the log-score (much longer runtime, non-EVT models only)
dmgs: logical that indicates whether DMGS calculations should be run or not (longer run time)
rust: logical that indicates whether RUST-based posterior sampling calculations should be run or not (longer run time)
nrust: the number of posterior samples used in the RUST calculations
extramodels: logical that indicates whether to run additional calculations and add three additional prediction models (longer runtime)
predictordata: logical that indicates whether predictordata should be calculated
centering: logical that indicates whether the predictor should be centered
debug: logical for turning on debug messages
n: the number of random samples required
mlcp: logical that indicates whether maxlik and parameter uncertainty calculations should be performed (turn off to speed up RUST)
y: a vector of values at which to calculate the density and distribution functions
boot: logical that indicates whether bootstrap-based posterior sampling calculations should be run or not (longer run time)
nboot: the number of posterior samples used in the bootstrap calculations
rnonnegslopesonly: logical that indicates whether to disallow non-negative slopes
method: character string that indicates whether to use rust method=rust or bootstrap method=boot
nonnegslopesonly: logical that indicates whether to disallow non-negative slopes

Details of the Model

The normal distribution with predictors on both parameters has probability density function $$f(x;\alpha,\beta,\gamma,\delta) =\frac{1}{\sqrt{2\pi}\sigma}e^{-(x-\mu(\alpha,\beta))^2/(2\sigma(\gamma,\delta)^2)}$$ where $x$ is the random variable, $\mu=\alpha+\beta t_1$ is the location parameter, modelled as a function of parameters $\alpha,\beta$ and predictor $t_1$, where $t_1$ is typically the ensemble mean, and $\sigma=\exp(\gamma+\delta \log (t_2))$ is the scale parameter, modelled as a function of parameters $\gamma,\delta$ and predictor $t_2$, where $t_2$ is typically the ensemble spread.

The calibrating prior is given by the right Haar prior, which is $$\pi(\alpha,\beta,\gamma,\delta) \propto \frac{1}{\sigma}$$ as given in the Jewson et al. (2025) reference given below.

Optional Return Values

q**** optionally returns the following:

If rust=TRUE:

ru_quantiles: predictive quantiles calculated using a calibrating prior, using posterior sampling with the RUST algorithm, based on inverting an empirical CDF based on nrust samples.

If waicscores=TRUE:

waic1: the WAIC1 score for the calibrating prior model.
waic2: the WAIC2 score for the calibrating prior model.

If logscores=TRUE:

ml_oos_logscore: the leave-one-out logscore for the maximum likelihood prediction (not available in EVT routines, for mathematical reasons)
cp_oos_logscore: the leave-one-out logscore for the parameter uncertainty model available in EVT routines, for mathematical reasons)

If means=TRUE:

ml_mean: analytic estimate of the mean of the MLE predictive distribution, where possible
cp_mean: analytic estimate of the mean of the calibrating prior predictive distribution, where mathematically possible. Can be compared with the mean estimated from random deviates.

r**** optionally returns the following:

If rust=TRUE:

ru_deviates: nrust predictive random deviatives calculated using a calibrating prior, using posterior sampling with RUST.

d**** optionally returns the following:

If rust=TRUE:

ru_pdf: predictive density calculated using a calibrating prior, using posterior sampling with RUST, averaging over nrust density functions.

p**** optionally returns the following:

If rust=TRUE:

ru_cdf: predictive probability calculated using a calibrating prior, using posterior sampling with RUST, averaging over nrust distribution functions.

Selecting these additional outputs increases runtime. They are optional so that runtime for the basic outputs is minimised. This facilitates repeated experiments that evaluate reliability over many thousands of repeats.

Details (homogeneous models)

This model is a homogeneous model, and the cp results are based on the right Haar prior. For homogeneous models (models with sharply transitive transformation groups), a Bayesian prediction based on the right Haar prior gives exact reliability, as shown by Severini et al. (2002), even when the true parameters are unknown. This means that probabilities in the prediction will correspond to frequencies of future outcomes in repeated trials (if the model is correct).

Maximum likelihood prediction does not give reliable predictions, even when the model is correct, because it does not account for parameter uncertainty. In particular, maximum likelihood predictions typically underestimate the tail in repeated trials.

The reliability of the maximum likelihood and the calibrating prior predictive quantiles produced by the q****_cp routines in fitdistcp can be quantified using repeated simulations with the routine reltest.

Details (DMGS integration)

For this model, the Bayesian prediction equation cannot be solved analytically, and is approximated using the DMGS asymptotic expansions given by Datta et al. (2000). This approximation seems to work well for medium and large sample sizes, but may not work well for small sample sizes (e.g., <20 data points). For small sample sizes, it may be preferable to use posterior sampling by setting rust=TRUE and looking at the ru outputs. The performance for any sample size, in terms of reliability, can be tested using reltest.

Author

Stephen Jewson stephen.jewson@gmail.com

References

If you use this package, we would be grateful if you would cite the following reference, which introduces this model.

Jewson S., Olivetti L., Messori G., Northop P., Sweeting T. (2025): An Objective Bayesian Method for Including Parameter Uncertainty in Ensemble Model Output Statistics; QJRMS (Quarterly Journal of the Royal Meteorological Society).

Examples

Run this code

#
# example 1
x=fitdistcp::d060norm_p1_example_data_v1_x
tt=fitdistcp::d060norm_p1_example_data_v1_t
p=c(1:9)/10
n0=10
q=qnorm_p12_cp(x,t1=tt,t2=tt,n01=n0,n02=n0,p=p,rust=TRUE,nrust=1000)
xmin=min(q$ml_quantiles,q$cp_quantiles);
xmax=max(q$ml_quantiles,q$cp_quantiles);
plot(q$ml_quantiles,p,xlab="quantile estimates",xlim=c(xmin,xmax),
	sub="(from qnorm_p12_cp)",
	main="Normal w/ p1: quantile estimates");
points(q$cp_quantiles,p,col="red",lwd=2)
points(q$ru_quantiles,p,col="blue")

Run the code above in your browser using DataLab

Description

Usage

Value

Arguments

Details of the Model

Optional Return Values

Details (homogeneous models)

Details (DMGS integration)

Author

References

See Also

Examples