svymle: Maximum pseudolikelihood estimation in complex surveys

Description

Fits a user-specified likelihood parametrised by multiple linear predictors to data from a complex sample survey and computes the sandwich variance estimator of the coefficients. Note that this function maximises an estimated population likelihood, it is not the sample MLE.

Usage

svymle(loglike, gradient = NULL, design, formulas, start = NULL, control
= list(maxit=1000), na.action="na.fail", method=NULL, ...)
## S3 method for class 'svymle':
summary(object, stderr=c("robust", "model"),...)

Arguments

loglike

vectorised loglikelihood function

gradient

Derivative of loglike. Required for variance computation and helpful for fitting

design

a survey.design object

formulas

A list of formulas specifying the variable and linear predictors: see Details below

start

Starting values for parameters

control

control options for optim

na.action

Handling of NAs

method

"nlm" to use nlm, otherwise passed to optim

...

Arguments to loglike and gradient that are not to be optimised over.

object

svymle object

stderr

Choice of standard error estimator. The default is a standard sandwich estimator. See Details below.

Value

An object of class svymle

Details

Optimization is done by nlm by default or if method=="nlm". Otherwise optim is used and method specifies the method and control specifies control parameters.

The design object contains all the data and design information from the survey, so all the formulas refer to variables in this object. The formulas argument needs to specify the response variable and a linear predictor for each freely varying argument of loglike.

Consider for example the dnorm function, with arguments x, mean, sd and log, and suppose we want to estimate the mean of y as a linear function of a variable z, and to estimate a constant standard deviation. The log argument must be fixed at FALSE to get the loglikelihood. A formulas argument would be list(~y, mean=~z, sd=~1). Note that the data variable y must be the first argument to dnorm and the first formula and that all the other formulas are labelled. It is also permitted to have the data variable as the left-hand side of one of the formulas: eg list( mean=y~z, sd=~1).

The usual variance estimator for MLEs in a survey sample is a `sandwich' variance that requires the score vector and the information matrix. It requires only sampling assumptions to be valid (though some model assumptions are required for it to be useful). This is the stderr="robust" option, which is available only when the gradient argument was specified.

If the model is correctly specified and the sampling is at random conditional on variables in the model then standard errors based on just the information matrix will be approximately valid. In particular, for independent sampling where weights and strata depend on variables in the model the stderr="model" should work fairly well.

Examples

Run this code

data(api)

 dstrat<-svydesign(id=~1, strata=~stype, weight=~pw, fpc=~fpc, data=apistrat)

 ## fit with glm
 m0 <- svyglm(api00~api99+ell,family="gaussian",design=dstrat)
 ## fit as mle (without gradient)
 m1 <- svymle(loglike=dnorm,gradient=NULL, design=dstrat, formulas=list(mean=api00~api99+ell, sd=~1),start=list(c(80,1,0),c(20)), log=TRUE)
 ## with gradient
 gr<- function(x,mean,sd,log){
	 dm<-2*(x - mean)/(2*sd^2)
	 ds<-(x-mean)^2*(2*(2 * sd))/(2*sd^2)^2 - sqrt(2*pi)/(sd*sqrt(2*pi))
         cbind(dm,ds)
      }
 m2 <- svymle(loglike=dnorm,gradient=gr, design=dstrat, formulas=list(mean=api00~api99+ell, sd=~1),
               start=list(c(80,1,0),c(20)), log=TRUE, method="BFGS")

 summary(m0)
 summary(m1,stderr="model")
 summary(m2)

 ## More complicated censored data example
 ## showing that the response variable can be multivariate

 data(pbc, package="survival")
 biasmodel<-glm(I(trt>0)~age*edema,data=pbc)
 pbc$randprob<-fitted(biasmodel)
 dpbc<-svydesign(id=~1, prob=~randprob, strata=~edema, data=subset(pbc,trt>0))

 lcens<-function(x,mean,sd){
    ifelse(x[,2]==1,
           dnorm(log(x[,1]),mean,sd,log=TRUE),
           pnorm(log(x[,1]),mean,sd,log=TRUE,lower.tail=FALSE)
           )
 }

 gcens<- function(x,mean,sd){

        dz<- -dnorm(log(x[,1]),mean,sd)/pnorm(log(x[,1]),mean,sd,lower.tail=FALSE)

        dm<-ifelse(x[,2]==1,
                   2*(log(x[,1]) - mean)/(2*sd^2),
                   dz*-1/sd)
        ds<-ifelse(x[,2]==1,
                   (log(x[,1])-mean)^2*(2*(2 * sd))/(2*sd^2)^2 - sqrt(2*pi)/(sd*sqrt(2*pi)),
                   ds<- dz*-(log(x[,1])-mean)/(sd*sd))
        cbind(dm,ds)      
 }


svymle(loglike=lcens, gradient=gcens, design=dpbc,
      formulas=list(mean=I(cbind(time,status))~bili+protime+alb,
                    sd=~1),
         start=list(c(10,0,0,0),c(1)))