A detection test for the Hauck-Donner effect on each regression coefficient of a VGLM regression or 2 x 2 table.

```
hdeff(object, ...)
hdeff.vglm(object, derivative = NULL, se.arg = FALSE, subset = NULL,
theta0 = 0, hstep = 0.005, fd.only = FALSE, ...)
hdeff.numeric(object, byrow = FALSE, ...)
hdeff.matrix(object, ...)
```

object

Usually a `vglm`

object.
Although only a limited number of family functions have
an analytical solution to
the HDE detection test
(`binomialff`

,
`borel.tanner`

,
`cumulative`

,
`erlang`

,
`felix`

,
`lindley`

,
`poissonff`

,
`topple`

,
`uninormal`

,
`zipoissonff`

,
and
`zipoisson`

;
hopefully some more will be implemented in the short future!)
the finite-differences (FDs) method can be applied to almost all
VGAM family functions to get a numerical solution.

Alternatively `object`

may represent a 2 x 2 table of
*positive* counts.
If so, then the first row corresponds to \(x2=0\) (baseline group)
and the second row \(x2=1\). The first column
corresponds to \(y=0\) (failure) and
the second column \(y=1\) (success).

Another alternative is that `object`

is a numerical vector
of length 4, representing a 2 x 2 table of *positive* counts.
If so then it is fed into `hdeff.matrix`

using
the argument `byrow`

, which matches `matrix`

.
See the examples below.

derivative

Numeric. Either 1 or 2.
Currently only a few models having one linear predictor are handled
analytically for `derivative = 2`

, e.g.,
`binomialff`

,
`poissonff`

.
However, the numerical method can return the first two
derivatives for almost all models.

se.arg

Logical. If `TRUE`

then the derivatives of the standard errors
are returned as well, because usually the derivatives of the
Wald statistics are of central interest.
Requires `derivative`

to be assigned the value 1 or 2
for this argument to operate.

subset

Logical or vector of indices,
to select the regression coefficients of interest.
The default is to select all coefficients.
Recycled if necessary if logical.
If numeric then they should comprise
elements from `1:length(coef(object))`

.
This argument can be useful for computing the derivatives
of a Cox regression (`coxph`

) fitted using
artificially created Poisson data; then there are
many coefficients that are effectively nuisance parameters.

theta0

Numeric. Vector recycled to the necessary length which is
the number of regression coefficients.
The null hypotheses for the regression coefficients are that
they equal those respective values, and the alternative
hypotheses are all two-sided.
It is not recommended that argument `subset`

be used
if a vector of values is assigned here because
`theta0[subset]`

is implied and might not work.

hstep

Positive numeric and recycled to length 2;
it is the so-called *step size* when using
finite-differences and is often called \(h\) in the calculus
literature,
e.g., \(f'(x)\) is approximately \((f(x+h) - f(x)) / h\).
For the 2nd-order partial derivatives, there are two step sizes
and hence this argument is recycled to length 2.
The default is to have the same values.
The 1st-order derivatives use the first value only.
It is recommended that a few values of this argument be tried
because values of the first and second derivatives can
vary accordingly.
If any values are too large then the derivatives may be inaccurate;
and if too small then the derivatives may be unstable and
subject to too much round-off/cancellation error
(in fact it may create an error or a `NA`

).

fd.only

Logical;
if `TRUE`

then finite-differences are used to estimate
the derivatives even if an analytical solution has been coded,
By default, finite-differences will be used
when an analytical solution has not been implemented.

It is possible that `NA`

s are returned.
If so, and if `fd.only = FALSE`

, then a warning
is issued and a recursive
call is made with `fd.only = TRUE`

---this is more
likely to return an answer without any `NA`

s.

byrow

Logical;
fed into `matrix`

if `object`

is a
vector of length 4 so that there are two choices in
the order of the elements.

…

currently unused but may be used in the future for further arguments passed into the other methods functions.

By default this function returns a labelled logical vector;
a `TRUE`

means the HDE is affirmative for that coefficient
(negative slope).
Hence ideally all values are `FALSE`

.
Any `TRUE`

values suggests that the MLE is
too near the boundary of the parameter space,
and that the p-value for that regression coefficient
is biased upwards.
When present
a highly significant variable might be deemed nonsignificant,
and thus the HDE can create havoc for variable selection.
If the HDE is present then more accurate
p-values can generally be obtained by conducting a
likelihood ratio test
(see `lrt.stat.vlm`

)
or Rao's score test
(see `score.stat.vlm`

);
indeed the default of
`wald.stat.vlm`

does not suffer from the HDE.

Setting `deriv = 1`

returns a numerical vector of first
derivatives of the Wald statistics.
Setting `deriv = 2`

returns a 2-column matrix of first
and second derivatives of the Wald statistics.
Then
setting `se.arg = TRUE`

returns an additional 1 or 2 columns.

Some 2nd derivatives are `NA`

if
only a partial analytic solution has been programmed in.

For those VGAM family functions whose HDE test has not yet
been implemented explicitly (the vast majority of them),
finite-difference approximations
to the derivatives will be used---see the arguments
`hstep`

and `fd.only`

for getting some control on them.

Almost all of statistical inference based on the likelihood assumes that the parameter estimates are located in the interior of the parameter space. The nonregular case of being located on the boundary is not considered very much and leads to very different results from the regular case. Practically, an important question is: how close is close to the boundary? One might answer this as: the parameter estimates are too close to the boundary when the Hauck-Donner effect (HDE) is present, whereby the Wald statistic becomes aberrant.

Hauck and Donner (1977) first observed an aberration of the Wald test
statistic not monotonically increasing as a function of increasing
distance between the parameter estimate and the null value. This
"disturbing" and "undesirable" underappreciated effect has since been
observed in other regression models by various authors. This function
computes the first, and possibly second, derivative of the Wald
statistic for each regression coefficient. A negative value of the
first derivative is indicative of the HDE being present.
More information can be obtained from `hdeffsev`

regarding HDE severity: there may be none,
faint, weak, moderate, strong and extreme amounts of HDE
present.

In general, most models have derivatives that are computed
numerically using finite-difference
approximations. The reason is that it takes a lot of work
to program in the analytical solution
(this includes a few very common models, such as
`poissonff`

and
`binomialff`

,
where the first two derivatives have been implemented).

Hauck, J. W. W. and A. Donner (1977).
Wald's test as applied to hypotheses in logit analysis.
*Journal of the American Statistical Association*,
**72**, 851--853.
Corrigenda: JASA, **75**, 482.

Yee, T. W. (2022)
On the Hauck-Donner effect in Wald tests:
Detection, tipping points and parameter space characterization,
*Journal of the American Statistical Association*, in press.

Yee, T. W. (2021).
Some new results concerning the Hauck-Donner effect.
*Manuscript in preparation*.

`summaryvglm`

,
`hdeffsev`

,
`vglm`

,
`lrt.stat`

,
`score.stat`

,
`wald.stat`

,
`confintvglm`

,
`profilevglm`

.

```
# NOT RUN {
pneumo <- transform(pneumo, let = log(exposure.time))
fit <- vglm(cbind(normal, mild, severe) ~ let, data = pneumo,
trace = TRUE, crit = "c", # Get some more accuracy
cumulative(reverse = TRUE, parallel = TRUE))
cumulative()@infos()$hadof # Analytical solution implemented
hdeff(fit)
hdeff(fit, deriv = 1) # Analytical solution
hdeff(fit, deriv = 2) # It is a partial analytical solution
hdeff(fit, deriv = 2, se.arg = TRUE,
fd.only = TRUE) # All derivatives solved numerically by FDs
# 2 x 2 table of counts
R0 <- 25; N0 <- 100 # Hauck Donner (1977) data set
mymat <- c(N0-R0, R0, 8, 92) # HDE present
(mymat <- matrix(mymat, 2, 2, byrow = TRUE))
hdeff(mymat)
hdeff(c(mymat)) # Input is a vector
hdeff(c(t(mymat)), byrow = TRUE) # Reordering of the data
# }
```

Run the code above in your browser using DataCamp Workspace