Fit a generalised linear model to data from a complex survey design, with inverse-probability weighting and design-based standard errors.

```
# S3 method for survey.design
svyglm(formula, design, subset=NULL,
family=stats::gaussian(),start=NULL, rescale=TRUE, ..., deff=FALSE,
influence=FALSE)
# S3 method for svyrep.design
svyglm(formula, design, subset=NULL,
family=stats::gaussian(),start=NULL, rescale=NULL, ..., rho=NULL,
return.replicates=FALSE, na.action,multicore=getOption("survey.multicore"))
# S3 method for svyglm
summary(object, correlation = FALSE, df.resid=NULL,
...)
# S3 method for svyglm
predict(object,newdata=NULL,total=NULL,
type=c("link","response","terms"),
se.fit=(type != "terms"),vcov=FALSE,...)
# S3 method for svrepglm
predict(object,newdata=NULL,total=NULL,
type=c("link","response","terms"),
se.fit=(type != "terms"),vcov=FALSE,
return.replicates=!is.null(object$replicates),...)
```

formula

Model formula

design

Survey design from `svydesign`

or `svrepdesign`

. Must contain all variables
in the formula

subset

Expression to select a subpopulation

family

`family`

object for `glm`

start

Starting values for the coefficients (needed for some uncommon link/family combinations)

rescale

Rescaling of weights, to improve numerical stability. The default
rescales weights to sum to the sample size. Use `FALSE`

to not
rescale weights. For replicate-weight designs, use `TRUE`

to
rescale weights to sum to 1, as was the case before version 3.34.

…

Other arguments passed to `glm`

or
`summary.glm`

rho

For replicate BRR designs, to specify the parameter for
Fay's variance method, giving weights of `rho`

and `2-rho`

return.replicates

Return the replicates as the `replicates`

component of the
result? (for `predict`

, only possible if they
were computed in the `svyglm`

fit)

deff

Estimate the design effects

influence

Return influence functions

object

A `svyglm`

object

correlation

Include the correlation matrix of parameters?

na.action

Handling of NAs

multicore

Use the `multicore`

package to distribute
replicates across processors?

df.resid

Optional denominator degrees of freedom for Wald tests

newdata

new data frame for prediction

total

population size when predicting population total

type

linear predictor (`link`

) or response

se.fit

if `TRUE`

, return variances of predictions

vcov

if `TRUE`

and `se=TRUE`

return full
variance-covariance matrix of predictions

`svyglm`

returns an object of class `svyglm`

. The
`predict`

method returns an object of class `svystat`

For binomial and Poisson families use `family=quasibinomial()`

and `family=quasipoisson()`

to avoid a warning about non-integer
numbers of successes. The `quasi' versions of the family objects give
the same point estimates and standard errors and do not give the
warning.

If `df.resid`

is not specified the df for the null model is
computed by `degf`

and the residual df computed by
subtraction. This is recommended by Korn and Graubard and is correct
for PSU-level covariates but is potentially very conservative for
individual-level covariates. To get tests based on a Normal distribution
use `df.resid=Inf`

, and to use number of PSUs-number of strata,
specify `df.resid=degf(design)`

.

Parallel processing with `multicore=TRUE`

is helpful only for
fairly large data sets and on computers with sufficient memory. It may
be incompatible with GUIs, although the Mac Aqua GUI appears to be safe.

`predict`

gives fitted values and sampling variability for specific new
values of covariates. When `newdata`

are the population mean it
gives the regression estimator of the mean, and when `newdata`

are
the population totals and `total`

is specified it gives the
regression estimator of the population total. Regression estimators of
mean and total can also be obtained with `calibrate`

.

Lumley T, Scott A (2017) "Fitting Regression Models to Survey Data" Statistical Science 32: 265-278

`glm`

, which is used to do most of the work.

`regTermTest`

, for multiparameter tests

`calibrate`

, for an alternative way to specify regression
estimators of population totals or means

`svyttest`

for one-sample and two-sample t-tests.

# NOT RUN { data(api) dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc) dclus2<-svydesign(id=~dnum+snum, weights=~pw, data=apiclus2) rstrat<-as.svrepdesign(dstrat) rclus2<-as.svrepdesign(dclus2) summary(svyglm(api00~ell+meals+mobility, design=dstrat)) summary(svyglm(api00~ell+meals+mobility, design=dclus2)) summary(svyglm(api00~ell+meals+mobility, design=rstrat)) summary(svyglm(api00~ell+meals+mobility, design=rclus2)) ## use quasibinomial, quasipoisson to avoid warning messages summary(svyglm(sch.wide~ell+meals+mobility, design=dstrat, family=quasibinomial())) ## Compare regression and ratio estimation of totals api.ratio <- svyratio(~api.stu,~enroll, design=dstrat) pop<-data.frame(enroll=sum(apipop$enroll, na.rm=TRUE)) npop <- nrow(apipop) predict(api.ratio, pop$enroll) ## regression estimator is less efficient api.reg <- svyglm(api.stu~enroll, design=dstrat) predict(api.reg, newdata=pop, total=npop) ## same as calibration estimator svytotal(~api.stu, calibrate(dstrat, ~enroll, pop=c(npop, pop$enroll))) ## svyglm can also reproduce the ratio estimator api.reg2 <- svyglm(api.stu~enroll-1, design=dstrat, family=quasi(link="identity",var="mu")) predict(api.reg2, newdata=pop, total=npop) ## higher efficiency by modelling variance better api.reg3 <- svyglm(api.stu~enroll-1, design=dstrat, family=quasi(link="identity",var="mu^3")) predict(api.reg3, newdata=pop, total=npop) ## true value sum(apipop$api.stu) # }

Run the code above in your browser using DataCamp Workspace