Learn R Programming

survey (version 4.5)

svyglm: Survey-weighted generalised linear models.

Description

Fit a generalised linear model to data from a complex survey design, with inverse-probability weighting and design-based standard errors.

Usage

# S3 method for survey.design
svyglm(formula, design, subset=NULL,
    family=stats::gaussian(),start=NULL, rescale=TRUE, ..., deff=FALSE, influence=FALSE, 
    std.errors=c("linearized","Bell-McCaffrey","Bell-McCaffrey-2"),degf=FALSE)
# S3 method for svyrep.design
svyglm(formula, design, subset=NULL,
    family=stats::gaussian(),start=NULL, rescale=NULL, ..., rho=NULL,
    return.replicates=FALSE, na.action,multicore=getOption("survey.multicore"))
# S3 method for svyglm
summary(object, correlation = FALSE, df.resid=NULL, ...)
# S3 method for svyglm
predict(object,newdata=NULL,total=NULL,
                         type=c("link","response","terms"),
                         se.fit=(type != "terms"),vcov=FALSE,...)
# S3 method for svrepglm
predict(object,newdata=NULL,total=NULL,
                         type=c("link","response","terms"),
                         se.fit=(type != "terms"),vcov=FALSE,
                         return.replicates=!is.null(object$replicates),...)

Arguments

Value

svyglm returns an object of class svyglm. The

predict method returns an object of class svystat if

se.fit is TRUE, otherwise just a numeric vector

Details

For binomial and Poisson families use family=quasibinomial() and family=quasipoisson() to avoid a warning about non-integer numbers of successes. The `quasi' versions of the family objects give the same point estimates and standard errors and do not give the warning.

If df.resid is not specified the df for the null model is computed by degf and the residual df computed by subtraction. This is recommended by Korn and Graubard (1999) and is correct for PSU-level covariates but is potentially very conservative for individual-level covariates. To get tests based on a Normal distribution use df.resid=Inf, and to use number of PSUs-number of strata, specify df.resid=degf(design).

When std.errors="Bell-McCaffrey(-2)" option is specified for clustered svydesign, Bell-McCaffrey standard errors are produced that adjust for some of the known downward biases of linearized standard errors. Bell and McAffrey (2002) also suggest corrections for the degrees of freedom, which end up being even more conservative than the (# of PSUs)-(# of strata) design degrees of freedom. By default, the computation of these degrees of freedom adjustments is skipped (degf=FALSE) as they require dealing with projection matrices of size nrow(svydesign$variables) by nrow(svydesign$variables). The option std.errors="Bell-McCaffrey" produces a version with unit working residual covariance matrix, and option std.errors="Bell-McCaffrey-2" produces a version with the exchangeable correlation working residual covariance matrix, recommended by Imbens and Kolesar (2016) (typically more conservative). The standard errors themselves are identical between these two options.

Parallel processing with multicore=TRUE is helpful only for fairly large data sets and on computers with sufficient memory. It may be incompatible with GUIs, although the Mac Aqua GUI appears to be safe.

predict gives fitted values and sampling variability for specific new values of covariates. When newdata are the population mean it gives the regression estimator of the mean, and when newdata are the population totals and total is specified it gives the regression estimator of the population total. Regression estimators of mean and total can also be obtained with calibrate.

When the model is not of full rank, so that some coefficients are NA, point predictions will be made by setting those coefficients to zero. Standard error and variance estimates will be NA.

References

Robert M. Bell and Daniel F. McCaffrey (2002). Bias Reduction in Standard Errors for Linear Regression with Multi-Stage Samples. Survey Methodology 28 (2), 169-181. https://www150.statcan.gc.ca/n1/pub/12-001-x/2002002/article/9058-eng.pdf

David A. Binder (1983). On the Variances of Asymptotically Normal Estimators from Complex Surveys. International Statistical Review: 51(3), 279-292.

Guido W. Imbens and Michal Kolesár (2016). Robust Standard Errors in Small Samples: Some Practical Advice. The Review of Economics and Statistics, 98(4): 701-712

Edward L. Korn and Barry I. Graubard (1999). Analysis of Health Surveys. Wiley Series in Survey Methodology. Wiley: Hoboken, NJ.

Thomas Lumley and Alastair J. Scott (2017). Fitting Regression Models to Survey Data. Statistical Science 32: 265-278.

See Also

glm, which is used to do most of the work.

regTermTest, for multiparameter tests

calibrate, for an alternative way to specify regression estimators of population totals or means

svyttest for one-sample and two-sample t-tests.

Examples

Run this code

  data(api)


  dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
  dclus2<-svydesign(id=~dnum+snum, weights=~pw, data=apiclus2)
  rstrat<-as.svrepdesign(dstrat)
  rclus2<-as.svrepdesign(dclus2)

  summary(svyglm(api00~ell+meals+mobility, design=dstrat))
  summary(svyglm(api00~ell+meals+mobility, design=dclus2))
  summary(svyglm(api00~ell+meals+mobility, design=rstrat))
  summary(svyglm(api00~ell+meals+mobility, design=rclus2))

  ## standard errors corrected up
  summary(svyglm(api00~ell+meals+mobility, design=dstrat, std.errors="Bell-McCaffrey"))
  summary(svyglm(api00~ell+meals+mobility, design=dclus2, std.errors="Bell-McCaffrey"))
  ## not applicable to replicate designs

  ## use quasibinomial, quasipoisson to avoid warning messages
  summary(svyglm(sch.wide~ell+meals+mobility, design=dstrat,
        family=quasibinomial()))


  ## Compare regression and ratio estimation of totals
  api.ratio <- svyratio(~api.stu,~enroll, design=dstrat)
  pop<-data.frame(enroll=sum(apipop$enroll, na.rm=TRUE))
  npop <- nrow(apipop)
  predict(api.ratio, pop$enroll)

  ## regression estimator is less efficient
  api.reg <- svyglm(api.stu~enroll, design=dstrat)
  predict(api.reg, newdata=pop, total=npop)
  ## same as calibration estimator
  svytotal(~api.stu, calibrate(dstrat, ~enroll, pop=c(npop, pop$enroll)))

  ## svyglm can also reproduce the ratio estimator
  api.reg2 <- svyglm(api.stu~enroll-1, design=dstrat,
                    family=quasi(link="identity",var="mu"))
  predict(api.reg2, newdata=pop, total=npop)

  ## higher efficiency by modelling variance better
  api.reg3 <- svyglm(api.stu~enroll-1, design=dstrat,
                    family=quasi(link="identity",var="mu^3"))
  predict(api.reg3, newdata=pop, total=npop)
  ## true value
  sum(apipop$api.stu)

 

Run the code above in your browser using DataLab