svyby: Survey statistics on subsets

Description

Compute survey statistics on subsets of a survey defined by factors.

Usage

svyby(formula, by, design, ...)
# S3 method for default
svyby(formula, by, design, FUN, ..., deff = FALSE, keep.var = TRUE,
  keep.names = TRUE, verbose = FALSE,
  vartype = c("se", "ci", "ci", "cv", "cvpct", "var"),
  drop.empty.groups = TRUE, covmat = FALSE, return.replicates = FALSE,
  na.rm.by = FALSE, na.rm.all = FALSE, stringsAsFactors = TRUE,
  multicore = getOption("survey.multicore"))
# S3 method for survey.design2
svyby(formula, by, design, FUN, ..., deff = FALSE, keep.var = TRUE,
  keep.names = TRUE, verbose = FALSE,
  vartype = c("se", "ci", "ci", "cv", "cvpct", "var"),
  drop.empty.groups = TRUE, covmat = FALSE, influence = covmat,
  na.rm.by = FALSE, na.rm.all = FALSE, stringsAsFactors = TRUE,
  multicore = getOption("survey.multicore"))
# S3 method for svyby
SE(object, ...)
# S3 method for svyby
deff(object, ...)
# S3 method for svyby
coef(object, ...)
# S3 method for svyby
confint(object, parm, level = 0.95, df = Inf, ...)
unwtd.count(x, design, ...)
svybys(formula, bys, design, FUN, ...)

Value

An object of class "svyby": a data frame showing the factors and the results of FUN.

For unwtd.count, the unweighted number of non-missing observations in the data matrix specified by x for the design.

Arguments

formula,x: A formula specifying the variables to pass to FUN (or a matrix, data frame, or vector)
by: A formula specifying factors that define subsets, or a list of factors.
design: A svydesign or svrepdesign object
FUN: A function taking a formula and survey design object as its first two arguments and returning an object with suitable coef and SE or vcov or confint methods
...: Other arguments to FUN. NOTE: if any of their names are partial matches to formula, by, or design, you must supply the formula, by, or design argument by name, not just by position.
deff: Request a design effect from FUN
keep.var: If FUN returns a svystat object, extract standard errors from it
keep.names: Define row names based on the subsets
verbose: If TRUE, print a label for each subset as it is processed.
vartype: Report variability as one or more of standard error, confidence interval, coefficient of variation, percent coefficient of variation, or variance
drop.empty.groups: If FALSE, report NA for empty groups; if TRUE, drop them from the output.
na.rm.by: If TRUE, omit groups defined by NA values of the by variables.
na.rm.all: If TRUE, check for groups with no non-missing observations for the variables defined by formula and treat these groups as empty. This makes little sense unless na.rm=TRUE is also passed to FUN.
covmat: If TRUE, compute covariances between estimates for different subsets. Allows svycontrast to be used on output. Requires that FUN supports either return.replicates=TRUE or influence=TRUE
return.replicates: Only for replicate-weight designs. If TRUE, return all the replicates as the "replicates" attribute of the result
influence: Return the influence functions of the result
multicore: If TRUE, use multicore processing to distribute subsets over multiple processors.
stringsAsFactors: Convert any string variables in formula to factors before calling FUN, so that the factor levels will be the same in all groups (See Note below). Potentially slow.
parm: a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered.
level: the confidence level required.
df: Degrees of freedom for the t-distribution used in the confidence interval; use degf(design) for the number of PSUs minus the number of strata.
object: An object of class "svyby"
bys: A one-sided formula with each term specifying a separate grouping (rather than the terms being combined into a single grouping).
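The partial-matching caveat on ... comes from R's argument matching rules: formals that precede ... accept prefix (partial) names, so an argument meant for FUN can be captured by design instead. The sketch below uses a hypothetical stand-in, fake_svyby, with the same leading formals as svyby; it only records how R binds the arguments and does not call the survey package.

```r
# Hypothetical stand-in with the same leading formals as svyby().
# Formals before `...` can be matched by a prefix of their name;
# formals after `...` (here `deff`) are matched only by exact name.
fake_svyby <- function(formula, by, design, FUN, ..., deff = FALSE) {
  list(formula = formula, by = by, design = design,
       FUN = FUN, dots = list(...))
}

# Intended: `d = 5` is an extra argument for FUN.
# Actually: `d` is a prefix of `design`, so R binds it to `design`,
# and the positional arguments shift into formula, by, FUN and `...`.
bad <- fake_svyby("f", "b", "des", "fun", d = 5)
str(bad)

# Naming `design` exactly removes it from partial matching,
# so `d` no longer matches any formal and falls through to `...`.
good <- fake_svyby("f", "b", design = "des", "fun", d = 5)
str(good)
```

In the first call the value intended for FUN ends up as the design and the design ends up as FUN, which is why the documentation asks for design (or formula, or by) to be named in such cases.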

Details

The variance type "ci" asks for confidence intervals, which are produced by confint. In some cases additional options to FUN will be needed to produce confidence intervals, for example, svyquantile needs ci=TRUE or keep.var=FALSE.
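As a rough sketch of how the vartype options relate to one another, the code below derives each quantity from an estimate and its standard error in base R. It assumes a symmetric Wald-type interval, which is an assumption: svyby takes "ci" from confint, and for some statistics (svyquantile, for instance) those intervals need not be symmetric. The estimates are made-up numbers, and df plays the role of the df argument to confint (degf(design) for a real design).

```r
# Illustrative (made-up) estimates and standard errors for three groups;
# these are not results from the api data.
est <- c(E = 650, H = 620, M = 630)
se  <- c(E = 12, H = 20, M = 16)

level <- 0.95
df <- Inf  # Inf gives a normal-based interval; a finite df gives a t-based one

# qt() with df = Inf returns the same value as qnorm()
q <- qt(1 - (1 - level) / 2, df = df)

# Assumes a symmetric Wald-type interval: estimate +/- q * SE
vartype_table <- data.frame(
  estimate = est,
  se       = se,
  ci_l     = est - q * se,
  ci_u     = est + q * se,
  cv       = se / est,        # coefficient of variation
  cvpct    = 100 * se / est,  # coefficient of variation as a percentage
  var      = se^2             # variance
)
print(vartype_table)
```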

The results are extracted by calling coef, SE, vcov, and confint on the returned objects, so these need to be defined. The intent is for FUN to return a svystat or svrepstat object, but that isn't required.
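The extractor contract can be illustrated with base R generics alone. In the sketch below, make_mystat and the class "mystat" are hypothetical names and the object is not passed through svyby; it only shows that once coef() and vcov() methods exist, stats::confint() can fall back to its default method. It does not show how svyby's own SE() extraction behaves for such an object.

```r
# Hypothetical minimal result class (not part of the survey package),
# standing in for what a custom FUN might return.
make_mystat <- function(est, v) {
  structure(list(est = est, v = v), class = "mystat")
}
coef.mystat <- function(object, ...) object$est
vcov.mystat <- function(object, ...) object$v

# vcov needs dimnames matching the coefficient names, because
# confint.default() indexes the standard errors by parameter name.
obj <- make_mystat(est = c(mean = 10),
                   v = matrix(4, 1, 1, dimnames = list("mean", "mean")))

coef(obj)              # point estimate
sqrt(diag(vcov(obj)))  # standard error derived from vcov

# No confint.mystat method exists, so dispatch falls back to
# confint.default(), which builds a normal-based interval from coef() and vcov().
ci <- confint(obj)
ci
```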

unwtd.count is designed to be passed to svyby to report the number of non-missing observations in each subset. Observations with exactly zero weight will also be counted as missing, since that's how subsets are implemented for some designs.
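The counting rule above can be sketched for a plain data frame in base R: a row counts only if every analysis variable is non-missing and its weight is nonzero. This mirrors the rule as stated and is not the package's implementation; count_nonmissing and the data are hypothetical.

```r
# Hypothetical data: two analysis variables, a weight and a grouping column.
dat <- data.frame(
  y1 = c(1, NA, 3, 4, 5),
  y2 = c(2, 2, NA, 4, 5),
  w  = c(1, 1, 1, 0, 2),
  g  = c("a", "a", "b", "b", "b")
)

# Sketch of the stated rule (not survey's own code): rows complete in all
# analysis variables AND with nonzero weight; zero weight counts as missing.
count_nonmissing <- function(vars, weights) {
  sum(complete.cases(vars) & weights != 0)
}

overall <- count_nonmissing(dat[, c("y1", "y2")], dat$w)

# Per-group counts, roughly what svyby(..., unwtd.count) reports per subset.
per_group <- sapply(split(dat, dat$g), function(d) {
  count_nonmissing(d[, c("y1", "y2")], d$w)
})
overall
per_group
```

Row 4 is complete but has weight zero, so it is not counted; each group ends up with one usable row.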

Parallel processing with multicore=TRUE is useful only for fairly large problems and on computers with sufficient memory. Multicore processing is incompatible with some GUIs.

The variant svybys creates a separate table for each term in bys rather than creating a joint table.
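To picture the difference in layout, the sketch below uses base R aggregate() on hypothetical unweighted data as a stand-in for svyby and svybys. The values are plain means, not design-based estimates; only the shape of the output is the point.

```r
# Hypothetical unweighted data; plain means, not survey estimates.
dat <- data.frame(
  y = c(10, 12, 14, 16, 18, 20),
  a = c("p", "p", "q", "q", "q", "p"),
  b = c("u", "v", "u", "v", "u", "v")
)

# Joint table: one row per combination of a and b (the svyby layout for ~a+b).
joint <- aggregate(y ~ a + b, data = dat, FUN = mean)

# Separate tables: one per grouping term (the svybys layout for ~a+b).
separate <- lapply(c("a", "b"), function(v) {
  aggregate(reformulate(v, response = "y"), data = dat, FUN = mean)
})
names(separate) <- c("a", "b")

joint
separate
```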

Examples


library(survey)
data(api)
dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)

svyby(~api99, ~stype, dclus1, svymean)
svyby(~api99, ~stype, dclus1, svyquantile, quantiles=0.5,ci=TRUE,vartype="ci")
## without ci=TRUE svyquantile does not compute standard errors
svyby(~api99, ~stype, dclus1, svyquantile, quantiles=0.5, keep.var=FALSE)
svyby(~api99, list(school.type=apiclus1$stype), dclus1, svymean)
svyby(~api99+api00, ~stype, dclus1, svymean, deff=TRUE,vartype="ci")
svyby(~api99+api00, ~stype+sch.wide, dclus1, svymean, keep.var=FALSE)
## report raw number of observations
svyby(~api99+api00, ~stype+sch.wide, dclus1, unwtd.count, keep.var=FALSE)

rclus1<-as.svrepdesign(dclus1)

svyby(~api99, ~stype, rclus1, svymean)
svyby(~api99, ~stype, rclus1, svyquantile, quantiles=0.5)
svyby(~api99, list(school.type=apiclus1$stype), rclus1, svymean, vartype="cv")
svyby(~enroll,~stype, rclus1,svytotal, deff=TRUE)
svyby(~api99+api00, ~stype+sch.wide, rclus1, svymean, keep.var=FALSE)
##report raw number of observations
svyby(~api99+api00, ~stype+sch.wide, rclus1, unwtd.count, keep.var=FALSE)

## comparing subgroups using covmat=TRUE
mns<-svyby(~api99, ~stype, rclus1, svymean,covmat=TRUE)
vcov(mns)
svycontrast(mns, c(E = 1, M = -1))

str(svyby(~api99, ~stype, rclus1, svymean,return.replicates=TRUE))

tots<-svyby(~enroll, ~stype, dclus1, svytotal,covmat=TRUE)
vcov(tots)
svycontrast(tots, quote(E/H))


## comparing subgroups uses the delta method unless replicates are present
meanlogs<-svyby(~log(enroll),~stype,svymean, design=rclus1,covmat=TRUE)
svycontrast(meanlogs, quote(exp(E-H)))
meanlogs<-svyby(~log(enroll),~stype,svymean, design=rclus1,covmat=TRUE,return.replicates=TRUE)
svycontrast(meanlogs, quote(exp(E-H)))


## extractor functions
(a<-svyby(~enroll, ~stype, rclus1, svytotal, deff=TRUE, verbose=TRUE, 
  vartype=c("se","cv","cvpct","var")))
deff(a)
SE(a)
cv(a)
coef(a)
confint(a, df=degf(rclus1))

## ratio estimates
svyby(~api.stu, by=~stype, denominator=~enroll, design=dclus1, svyratio)

ratios<-svyby(~api.stu, by=~stype, denominator=~enroll, design=dclus1, svyratio,covmat=TRUE)
vcov(ratios)

## empty groups
svyby(~api00,~comp.imp+sch.wide,design=dclus1,svymean)
svyby(~api00,~comp.imp+sch.wide,design=dclus1,svymean,drop.empty.groups=FALSE)

## Multiple tables
svybys(~api00,~comp.imp+sch.wide,design=dclus1,svymean)