brglmFit: Fitting function for `glm` for reduced-bias estimation and inference

Description

brglmFit is a fitting method for glm that fits generalized linear models using implicit and explicit bias reduction methods (Kosmidis, 2014), and other penalized maximum likelihood methods. Currently supported methods include the mean bias-reducing adjusted scores approach in Firth (1993) and Kosmidis & Firth (2009), the median bias-reduction adjusted scores approach in Kenne Pagui et al. (2017), the correction of the asymptotic bias in Cordeiro & McCullagh (1991), the mixed bias-reduction adjusted scores approach in Kosmidis et al (2019), maximum penalized likelihood with powers of the Jeffreys prior as penalty, and maximum likelihood. Estimation is performed using a quasi Fisher scoring iteration (see vignette("iteration", "brglm2")), which, in the case of mean-bias reduction, resembles an iterative correction of the asymptotic bias of the Fisher scoring iterates.

Usage

brglmFit(
  x,
  y,
  weights = rep(1, nobs),
  start = NULL,
  etastart = NULL,
  mustart = NULL,
  offset = rep(0, nobs),
  family = gaussian(),
  control = list(),
  intercept = TRUE,
  fixed_totals = NULL,
  singular.ok = TRUE
)
brglm_fit(
  x,
  y,
  weights = rep(1, nobs),
  start = NULL,
  etastart = NULL,
  mustart = NULL,
  offset = rep(0, nobs),
  family = gaussian(),
  control = list(),
  intercept = TRUE,
  fixed_totals = NULL,
  singular.ok = TRUE
)

Arguments

x is a design matrix of dimension n * p.

y is a vector of observations of length n.

weights

an optional vector of ‘prior weights’ to be used in the fitting process. Should be NULL or a numeric vector.

start

starting values for the parameters in the linear predictor. If NULL (default) then the maximum likelihood estimates are calculated and used as starting values.

etastart

applied only when start is not NULL. Starting values for the linear predictor to be passed to glm.fit when computing starting values using maximum likelihood.

mustart

applied only when start is not NULL. Starting values for the vector of means to be passed to glm.fit when computing starting values using maximum likelihood.

offset

this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. One or more offset terms can be included in the formula instead or as well, and if more than one is specified their sum is used. See model.offset.

family

a description of the error distribution and link function to be used in the model. For glm this can be a character string naming a family function, a family function or the result of a call to a family function. For glm.fit only the third option is supported. (See family for details of family functions.)

control

a list of parameters controlling the fitting process. See brglmControl for details.

intercept

logical. Should an intercept be included in the null model?

fixed_totals

effective only when family is poisson. Either NULL (no effect) or a vector that indicates which counts must be treated as a group. See Details for more information and brmultinom.

singular.ok

logical. If FALSE, a singular model is an error.

Details

A detailed description of the supported adjustments and the quasi Fisher scoring iteration is given in the iteration vignette (see, vignette("iteration", "brglm2") or Kosmidis et al, 2019). A shorter description of the quasi Fisher scoring iteration is also given in one of the vignettes of the *enrichwith* R package (see, https://cran.r-project.org/package=enrichwith/vignettes/bias.html). Kosmidis and Firth (2010) describe a parallel quasi Newton-Raphson iteration with the same stationary point.

In the special case of generalized linear models for binomial, Poisson and multinomial responses, the adjusted score equations approach returns estimates with improved frequentist properties, that are also always finite, even in cases where the maximum likelihood estimates are infinite (e.g. complete and quasi-complete separation in multinomial regression). See, also, detect_separation and check_infinite_estimates for pre-fit and post-fit methods for the detection of infinite estimates in binomial response generalized linear models.

The type of score adjustment to be used is specified through the type argument (see brglmControl for details). The available options are

type = "AS_mixed": the mixed bias-reducing score adjustments in Kosmidis et al (2019) that result in mean bias reduction for the regression parameters and median bias reduction for the dispersion parameter, if any; default.
type = "AS_mean": the mean bias-reducing score adjustments in Firth, 1993 and Kosmidis & Firth, 2009. type = "AS_mixed" and type = "AS_mean" will return the same results when family is binomial or poisson, i.e. when the dispersion is fixed)
type = "AS_median": the median-bias reducing score adjustments in Kenne Pagui et al. (2017)
type = "MPL_Jeffreys": maximum penalized likelihood with powers of the Jeffreys prior as penalty.
type = "ML": maximum likelihood
type = "correction": asymptotic bias correction, as in Cordeiro & McCullagh (1991).

The null deviance is evaluated based on the fitted values using the method specified by the type argument (see brglmControl).

The family argument of the current version of brglmFit can accept any combination of family objects and link functions, including families with user-specified link functions, mis links, and power links, but excluding quasi, quasipoisson and quasibinomial families.

The description of method argument and the Fitting functions section in glm gives information on supplying fitting methods to glm.

fixed_totals to specify groups of observations for which the sum of the means of a Poisson model will be held fixed to the observed count for each group. This argument is used internally in brmultinom and bracl for baseline-category logit models and adjacent category logit models, respectively.

brglm_fit is an alias to brglmFit.

References

Kosmidis I and Firth D (2019). Jeffreys-prior penalty, finiteness and shrinkage in binomial-response generalized linear models. *arXiv e-prints*, arXiv:1812.01938 <URL: http://arxiv.org/abs/1812.01938>.

Kosmidis I, Kenne Pagui EC, Sartori N (2019). Mean and median bias reduction in generalized linear models. *arXiv e-prints*, arXiv:1804.04085. To appear in Statistics and Computing, <URL: https://arxiv.org/abs/1804.04085>.

Cordeiro GM & McCullagh, P (1991). Bias correction in generalized linear models. *Journal of the Royal Statistical Society. Series B (Methodological)*, **53**, 629-643

Firth D (1993). Bias reduction of maximum likelihood estimates, Biometrika. **80**, 27-38

Kenne Pagui EC, Salvan A, and Sartori N (2017). Median bias reduction of maximum likelihood estimates. *Biometrika*, **104**, 923<U+2013>938

Kosmidis I and Firth D (2009). Bias reduction in exponential family nonlinear models. *Biometrika*, **96**, 793-804

Kosmidis I and Firth D (2010). A generic algorithm for reducing bias in parametric estimation. *Electronic Journal of Statistics*, **4**, 1097-1112

Kosmidis I (2014). Bias in parametric estimation: reduction and useful side-effects. *WIRE Computational Statistics*, **6**, 185-196

Examples

Run this code

# NOT RUN {
## The lizards example from ?brglm::brglm
data("lizards")
# Fit the model using maximum likelihood
lizardsML <- glm(cbind(grahami, opalinus) ~ height + diameter +
                 light + time, family = binomial(logit), data = lizards,
                 method = "glm.fit")
# Mean bias-reduced fit:
lizardsBR_mean <- glm(cbind(grahami, opalinus) ~ height + diameter +
                      light + time, family = binomial(logit), data = lizards,
                      method = "brglmFit")
# Median bias-reduced fit:
lizardsBR_median <- glm(cbind(grahami, opalinus) ~ height + diameter +
                        light + time, family = binomial(logit), data = lizards,
                        method = "brglmFit", type = "AS_median")
summary(lizardsML)
summary(lizardsBR_median)
summary(lizardsBR_mean)

# Maximum penalized likelihood with Jeffreys prior penatly
lizards_Jeffreys <- glm(cbind(grahami, opalinus) ~ height + diameter +
                        light + time, family = binomial(logit), data = lizards,
                        method = "brglmFit", type = "MPL_Jeffreys")
# lizards_Jeffreys is the same fit as lizardsBR_mean (see Firth, 1993)
all.equal(coef(lizardsBR_mean), coef(lizards_Jeffreys))

# Maximum penalized likelihood with powers of the Jeffreys prior as
# penalty. See Kosmidis & Firth (2019) for the finiteness and
# shrinkage properties of the maximum penalized likelihood
# estimators in binomial response models
# }
# NOT RUN {
a <- seq(0, 20, 0.5)
coefs <- sapply(a, function(a) {
      out <- glm(cbind(grahami, opalinus) ~ height + diameter +
             light + time, family = binomial(logit), data = lizards,
             method = "brglmFit", type = "MPL_Jeffreys", a = a) 
      coef(out)
})
# Illustration of shrinkage as a grows
matplot(a, t(coefs), type = "l", col = 1, lty = 1)
abline(0, 0, col = "grey")
# }
# NOT RUN {
## Another example from
## King, Gary, James E. Alt, Nancy Elizabeth Burns and Michael Laver
## (1990).  "A Unified Model of Cabinet Dissolution in Parliamentary
## Democracies", _American Journal of Political Science_, **34**, 846-870

# }
# NOT RUN {
data("coalition", package = "brglm2")
# The maximum likelihood fit with log link
coalitionML <- glm(duration ~ fract + numst2, family = Gamma, data = coalition)
# The mean bias-reduced fit
coalitionBR_mean <- update(coalitionML, method = "brglmFit")
# The bias-corrected fit
coalitionBC <- update(coalitionML, method = "brglmFit", type = "correction")
# The median bias-corrected fit
coalitionBR_median <- update(coalitionML, method = "brglmFit", type = "AS_median")
# }
# NOT RUN {
# }
# NOT RUN {
## An example with offsets from Venables & Ripley (2002, p.189)
data("anorexia", package = "MASS")

anorexML <- glm(Postwt ~ Prewt + Treat + offset(Prewt),
                family = gaussian, data = anorexia)
anorexBC <- update(anorexML, method = "brglmFit", type = "correction")
anorexBR_mean <- update(anorexML, method = "brglmFit")
anorexBR_median <- update(anorexML, method = "brglmFit", type = "AS_median")

## All methods return the same estimates for the regression
## parameters because the maximum likelihood estimator is normally
## distributed around the `true` value under the model (hence, both
## mean and component-wise median unbiased). The Wald tests for
## anorexBC and anorexBR_mean differ from anorexML
## because the bias-reduced estimator of the dispersion is the
## unbiased, by degree of freedom adjustment (divide by n - p),
## estimator of the residual variance. The Wald tests from
## anorexBR_median are based on the median bias-reduced estimator
## of the dispersion that results from a different adjustment of the
## degrees of freedom (divide by n - p - 2/3)
summary(anorexML)
summary(anorexBC)
summary(anorexBR_mean)
summary(anorexBR_median)
# }
# NOT RUN {
## endometrial data from Heinze & Schemper (2002) (see ?endometrial)
data("endometrial", package = "brglm2")
endometrialML <- glm(HG ~ NV + PI + EH, data = endometrial,
                     family = binomial("probit"))
endometrialBR_mean <- update(endometrialML, method = "brglmFit",
                             type = "AS_mean")
endometrialBC <- update(endometrialML, method = "brglmFit",
                        type = "correction")
endometrialBR_median <- update(endometrialML, method = "brglmFit",
                               type = "AS_median")
summary(endometrialML)
summary(endometrialBC)
summary(endometrialBR_mean)
summary(endometrialBR_median)

# }