mbrglm (version 0.0.1)

mbrglm:

Description

Fits binomial-response GLMs using the median bias-reduction method proposed in Kenne Pagui et al. (2016, Section 3). The proposed method is obtained by modifying the score equation in such a way that the solution is an approximately median unbiased estimator for each parameter component. The median bias-reduction method enjoys several good properties with respect to the maximum likelihood. In particular, the resulting estimator is component-wise median unbiased with and error of order (\(\mathop{\rm O}(n^{-1})\)) and is equivariant under joint reparameterizations that transform each parameter component separately. It has the same asymptotic distribution as the maximum likelihood estimator. Moreover, the resulting estimates and their corresponding standard errors are always finite while the maximum likelihood estimates can be infinite in situations where complete or quasi separation occurs.

Usage

mbrglm(formula, family = binomial, data, weights, subset, na.action, start = NULL, 
 etastart, mustart, offset, model = TRUE, method = "mbrglm.fit", x = FALSE, 
  y = TRUE, contrasts = NULL, control.glm = glm.control(), 
  control.mbrglm = mbrglm.control(), ...)

mbrglm.fit(x, y, weights = rep(1, nobs), start = NULL, etastart = NULL, mustart = NULL, offset = rep(0, nobs), family = binomial(), control = glm.control(), control.mbrglm = mbrglm.control(), intercept = TRUE)

Arguments

formula
an object of class formula (or one that can be coerced to that class): a symbolic description of the model to be fitted.
family
a description of the error distribution and link function to be used in the model. For glm this can be a character string naming a family function, a family function or the result of a call to a family function. For mbrglm.fit only the third option is supported. (See family for details of family functions.) mbrglm currently supports only the "binomial" family with links "logit", "probit", "cloglog".
data
an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which glm is called.
weights
an optional vector of 'prior weights' to be used in the fitting process. Should be NULL or a numeric vector.
subset
an optional vector specifying a subset of observations to be used in the fitting process.
na.action
a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The 'factory-fresh' default is na.omit. Another possible value is NULL, no action. Value na.excludecan be useful.
start
starting values for the parameters in the linear predictor.
etastart
starting values for the linear predictor.
mustart
starting values for the vector of means.
offset
this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. One or more offset terms can be included in the formula instead or as well, and if more than one is specified their sum is used. See model.offset.
control
a list of parameters for controlling the fitting process. For glm.fit this is passed to glm.control.
intercept
logical. Should an intercept be included in the null model?
model
a logical value indicating whether model frame should be included as a component of the returned value.
method
the method to be used for fitting the model. The unique method is "mbrglm.fit", which uses the median modified score function to estimate the parameters.
x
For mbrglm: logical values indicating whether the model matrix used in the fitting process should be returned as components of the returned value.
y
For mbrglm: logical values indicating whether the response vector used in the fitting process should be returned as components of the returned value.
contrasts
an optional list. See the contrasts.arg of model.matrix.default.
control.glm
control.glm replaces the control argument in glm but essentially does the same job. It is a list of parameters to control glm.fit. See the documentation of glm.control1 for details.
control.mbrglm
a list of parameters for controlling the fitting process when method="mbrglm.fit". See documentation mbrglm.control for details.
...
additional arguments passed to or from other methods.

Value

mbrglm returns an object of class "mbrglm". A "mbrglm" object inherits first from "glm" and then from "lm" and is a list containing the following components:
coefficients
a named vector of coefficients.
residuals
Pearson's residual in the final iteration of the IWLS fit. Since cases with zero weights are omitted, their working residuals are NA.
fitted.values
the fitted mean values, obtained by transforming the linear predictors by the inverse of the link function.
rank
the numeric rank of the fitted linear model.
family
the family object used.
linear.predictors
the linear fit on link scale.
deviance
up to a constant, minus twice the maximized log-likelihood. Where sensible, the constant is chosen so that a saturated model has deviance zero.
null.deviance
The deviance for the null model, comparable with deviance. The null model will include the offset, and an intercept if there is one in the model. Note that this will be incorrect if the link function depends on the data other than through the fitted mean: specify a zero offset to force a correct calculation.
weights
the working weights, that is the weights in the final iteration of the IWLS fit.
prior.weights
the weights initially supplied, a vector of 1s if none were.
df.residual
the residual degrees of freedom.
df.null
the residual degrees of freedom for the null model.
y
if requested (the default) the y vector used. (It is a vector even for a binomial model.)
x
if requested, the model matrix.
converged
logical. Was the modified IWLS algorithm judged to have converged?
boundary
logical. Is the fitted value on the boundary of the attainable values?
ModifiedScores
the vector of the median modified scores for the parameters at the final iteration.
FisherInfo
the Fisher information matrix evaluated at the resulting estimates. Only available when method = "mbrglm.fit".
FisherInfoInvs
the inverse of Fisher information matrix evaluated at the resulting estimates.
nIter
the number of iterations that were required until convergence. Only available when method = "mbrglm.fit".
model
if requested (the default), the model frame.
formula
the formula supplied.
terms
the terms object used.
data
the data argument.
offset
the offset vector used.
control.mbrglm
the control.mbrglm argument that was passed to mbrglm. Only available when method = "mbrglm.fit".
contrasts
(where relevant) the contrasts used.

Details

mbrglm.fit is the workhorse function for fitting the model using the median bias-reduction method. The main iteration of mbrglm.fit consists to calculate the required quantities for the construction of the modified iterative re-weighted least square which involves the modification term of the score function in the adjusted dependent variable. Iteration is repeated until either the iteration limit has been reached or the Euclidean distance of the median modified scores is less than some specified positive constant (see the mbr.maxit and mbr.epsilon arguments in mbrglm.control).

References

Kenne Pagui, E. C., Salvan, A. and Sartori, N. (2016). Median bias reduction of maximum likelihood estimates. http://arxiv.org/abs/1604.04768.

See Also

brglm, brglm.fit, glm, glm.fit

Examples

Run this code
 ## First example
library(brglm)     
data(endo)
# Fit the GLM using maximum likelihood
endo.glm <- glm(HG~NV+PI+EH,family=binomial,data=endo)
## Mean bias-reduced fit
endo.brglm<-brglm(HG~NV+PI+EH,family=binomial,data=endo)
## Median bias-reduced fit
endo.mbrglm<-mbrglm(HG~NV+PI+EH,family=binomial,data=endo)
endo.glm
endo.brglm
endo.mbrglm

# Now other links
update(endo.mbrglm, family = binomial(probit))
update(endo.mbrglm, family = binomial(cloglog))

##------------------------
## paper by Andrey Gelman et al. 2008. Annals of applied Statistics.
## application to binomial
## example 4.2
##----------------------

# first way

x<-c(-0.86,-0.30,-0.05,0.73)
z.x<- (1/sqrt(4))*(x-mean(x))/sqrt(var(x))
weights<-rep(5,4)
z<-c(0,1,3,5)
y=z/weights
fit.glm<-glm(y~z.x,family=binomial,weights=weights)
fit.brglm<-brglm(y~z.x,family=binomial,weights=weights)
fit.mbrglm<-mbrglm(y~z.x,family=binomial,weights=weights)
fit.glm
fit.brglm
fit.mbrglm

# in alternative
fit.glm<-glm(cbind(z,weights-z)~z.x,family=binomial)
fit.brglm<-brglm(cbind(z,weights-z)~z.x,family=binomial)
fit.mbrglm<-mbrglm(cbind(z,weights-z)~z.x,family=binomial)
fit.glm
fit.brglm
fit.mbrglm

##----------------------------------------
# Rasch model: 100 subjects and 5 items
##----------------------------------------

I <- 5
S <-  100

## function to generate data
gendata.M <- function(gamma, alpha, beta)
{
  I <- length(alpha) 
  S <- length(gamma) 
  data.y <- matrix(0, nrow=S, ncol=I)
  for(i in 1:I)
  {
    mui <- plogis(alpha[i] + gamma * beta[i])
    data.y[,i] <- rbinom(S, size=1, prob=mui)
  }
  return(data.y)      
}

alphas <- c(0.0,  0.7,  1.6,  0.6, -0.5)
betas <- rep(1,I)
gammas <- rnorm(S)

y <- gendata.M(gammas,alphas,betas)

y.dat <- data.frame(y=y[1:(S*I)],subject=factor(rep(1:S,I)),item=factor(rep(1:I,each=S)))

## Not run: ------------------------------------
# fit.glm <- glm(y~subject-1+item,family=binomial,data=y.dat)
# fit.brglm <- brglm(y~subject-1+item,family=binomial,data=y.dat)
# fit.mbrglm <- mbrglm(y~subject-1+item,family=binomial,data=y.dat)
## ---------------------------------------------
summary(fit.glm)
summary(fit.brglm)
summary(fit.mbrglm)

Run the code above in your browser using DataLab