svyglm: Survey-weighted generalised linear models.

Description

Fit a generalised linear model to data from a complex survey design, with inverse-probability weighting and design-based standard errors.

Usage

# S3 method for survey.design
svyglm(formula, design, subset=NULL,
family=stats::gaussian(),start=NULL, rescale=TRUE, ..., deff=FALSE,
 influence=FALSE)
# S3 method for svyrep.design
svyglm(formula, design, subset=NULL,
family=stats::gaussian(),start=NULL, rescale=NULL, ..., rho=NULL,
return.replicates=FALSE, na.action,multicore=getOption("survey.multicore"))
# S3 method for svyglm
summary(object, correlation = FALSE, df.resid=NULL,
...)
# S3 method for svyglm
predict(object,newdata=NULL,total=NULL,
                         type=c("link","response","terms"),
                         se.fit=(type != "terms"),vcov=FALSE,...)
# S3 method for svrepglm
predict(object,newdata=NULL,total=NULL,
                         type=c("link","response","terms"),
                         se.fit=(type != "terms"),vcov=FALSE,
                         return.replicates=!is.null(object$replicates),...)

Arguments

formula

Model formula

design

Survey design from svydesign or svrepdesign. Must contain all variables in the formula

subset

Expression to select a subpopulation

family

family object for glm

start

Starting values for the coefficients (needed for some uncommon link/family combinations)

rescale

Rescaling of weights, to improve numerical stability. The default rescales weights to sum to the sample size. Use FALSE to not rescale weights. For replicate-weight designs, use TRUE to rescale weights to sum to 1, as was the case before version 3.34.

…

Other arguments passed to glm or summary.glm

rho

For replicate BRR designs, to specify the parameter for Fay's variance method, giving weights of rho and 2-rho

return.replicates

Return the replicates as the replicates component of the result? (for predict, only possible if they were computed in the svyglm fit)

deff

Estimate the design effects

influence

Return influence functions

object

A svyglm object

correlation

Include the correlation matrix of parameters?

na.action

Handling of NAs

multicore

Use the multicore package to distribute replicates across processors?

df.resid

Optional denominator degrees of freedom for Wald tests

newdata

new data frame for prediction

total

population size when predicting population total

type

linear predictor (link) or response

se.fit

if TRUE, return variances of predictions

vcov

if TRUE and se=TRUE return full variance-covariance matrix of predictions

Value

svyglm returns an object of class svyglm. The predict method returns an object of class svystat

Details

For binomial and Poisson families use family=quasibinomial() and family=quasipoisson() to avoid a warning about non-integer numbers of successes. The `quasi' versions of the family objects give the same point estimates and standard errors and do not give the warning.

If df.resid is not specified the df for the null model is computed by degf and the residual df computed by subtraction. This is recommended by Korn and Graubard and is correct for PSU-level covariates but is potentially very conservative for individual-level covariates. To get tests based on a Normal distribution use df.resid=Inf, and to use number of PSUs-number of strata, specify df.resid=degf(design).

Parallel processing with multicore=TRUE is helpful only for fairly large data sets and on computers with sufficient memory. It may be incompatible with GUIs, although the Mac Aqua GUI appears to be safe.

predict gives fitted values and sampling variability for specific new values of covariates. When newdata are the population mean it gives the regression estimator of the mean, and when newdata are the population totals and total is specified it gives the regression estimator of the population total. Regression estimators of mean and total can also be obtained with calibrate.

References

Lumley T, Scott A (2017) "Fitting Regression Models to Survey Data" Statistical Science 32: 265-278

Examples

Run this code

# NOT RUN {
  data(api)


  dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
  dclus2<-svydesign(id=~dnum+snum, weights=~pw, data=apiclus2)
  rstrat<-as.svrepdesign(dstrat)
  rclus2<-as.svrepdesign(dclus2)

  summary(svyglm(api00~ell+meals+mobility, design=dstrat))
  summary(svyglm(api00~ell+meals+mobility, design=dclus2))
  summary(svyglm(api00~ell+meals+mobility, design=rstrat))
  summary(svyglm(api00~ell+meals+mobility, design=rclus2))

  ## use quasibinomial, quasipoisson to avoid warning messages
  summary(svyglm(sch.wide~ell+meals+mobility, design=dstrat,
        family=quasibinomial()))


  ## Compare regression and ratio estimation of totals
  api.ratio <- svyratio(~api.stu,~enroll, design=dstrat)
  pop<-data.frame(enroll=sum(apipop$enroll, na.rm=TRUE))
  npop <- nrow(apipop)
  predict(api.ratio, pop$enroll)

  ## regression estimator is less efficient
  api.reg <- svyglm(api.stu~enroll, design=dstrat)
  predict(api.reg, newdata=pop, total=npop)
  ## same as calibration estimator
  svytotal(~api.stu, calibrate(dstrat, ~enroll, pop=c(npop, pop$enroll)))

  ## svyglm can also reproduce the ratio estimator
  api.reg2 <- svyglm(api.stu~enroll-1, design=dstrat,
                    family=quasi(link="identity",var="mu"))
  predict(api.reg2, newdata=pop, total=npop)

  ## higher efficiency by modelling variance better
  api.reg3 <- svyglm(api.stu~enroll-1, design=dstrat,
                    family=quasi(link="identity",var="mu^3"))
  predict(api.reg3, newdata=pop, total=npop)
  ## true value
  sum(apipop$api.stu)

 
# }

Run the code above in your browser using DataLab