survey (version 4.2-1)

# svyby: Survey statistics on subsets

## Description

Compute survey statistics on subsets of a survey defined by factors.

## Usage

```
svyby(formula, by, design, ...)
# S3 method for default
svyby(formula, by, design, FUN, ..., deff=FALSE, keep.var=TRUE,
      keep.names=TRUE, verbose=FALSE, vartype=c("se","ci","ci","cv","cvpct","var"),
      drop.empty.groups=TRUE, covmat=FALSE, return.replicates=FALSE,
      na.rm.by=FALSE, na.rm.all=FALSE, stringsAsFactors=TRUE,
      multicore=getOption("survey.multicore"))
# S3 method for survey.design2
svyby(formula, by, design, FUN, ..., deff=FALSE, keep.var=TRUE,
      keep.names=TRUE, verbose=FALSE, vartype=c("se","ci","ci","cv","cvpct","var"),
      drop.empty.groups=TRUE, covmat=FALSE, influence=covmat,
      na.rm.by=FALSE, na.rm.all=FALSE, stringsAsFactors=TRUE,
      multicore=getOption("survey.multicore"))
# S3 method for svyby
SE(object, ...)
# S3 method for svyby
deff(object, ...)
# S3 method for svyby
coef(object, ...)
# S3 method for svyby
confint(object, parm, level=0.95, df=Inf, ...)
unwtd.count(x, design, ...)
svybys(formula, bys, design, FUN, ...)
```

## Value

An object of class `"svyby"`: a data frame showing the factors and the results of `FUN`.

For `unwtd.count`, the unweighted number of non-missing observations in the data matrix specified by `x` for the design.

## Arguments

formula,x

A formula specifying the variables to pass to `FUN` (or a matrix, data frame, or vector)

by

A formula specifying factors that define subsets, or a list of factors.

design

A `svydesign` or `svrepdesign` object

FUN

A function taking a formula and survey design object as its first two arguments.

...

Other arguments to `FUN`. NOTE: if any of these argument names is a partial match to `formula`, `by`, or `design`, you must specify the `formula`, `by`, or `design` argument by name, not just by position.

deff

Request a design effect from `FUN`

keep.var

If `FUN` returns a `svystat` object, extract standard errors from it

keep.names

Define row names based on the subsets

verbose

If `TRUE`, print a label for each subset as it is processed.

vartype

Report variability as one or more of standard error, confidence interval, coefficient of variation, percent coefficient of variation, or variance

drop.empty.groups

If `FALSE`, report `NA` for empty groups; if `TRUE`, drop them from the output.

na.rm.by

If true, omit groups defined by `NA` values of the `by` variables.

na.rm.all

If true, check for groups with no non-missing observations for variables defined by `formula` and treat these groups as empty. Doesn't make much sense without `na.rm=TRUE`.

covmat

If `TRUE`, compute covariances between estimates for different subsets. Allows `svycontrast` to be used on output. Requires that `FUN` supports either `return.replicates=TRUE` or `influence=TRUE`

return.replicates

Only for replicate-weight designs. If `TRUE`, return all the replicates as the "replicates" attribute of the result

influence

Return the influence functions of the result

multicore

Use `multicore` package to distribute subsets over multiple processors?

stringsAsFactors

Convert any string variables in `formula` to factors before calling `FUN`, so that the factor levels will be the same in all groups (See Note below). Potentially slow.

parm

a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered.

level

the confidence level required.

df

degrees of freedom for the t-distribution in the confidence interval; use `degf(design)` for number of PSUs minus number of strata

object

An object of class `"svyby"`

bys

one-sided formula with each term specifying a grouping (rather than being combined to give a grouping)
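As a rough illustration of the `df` argument to `confint`, the sketch below compares the default intervals (`df = Inf`, normal quantiles) with t-based intervals using `degf(design)`. It assumes the `api` data shipped with the package and the same cluster design as in the Examples; the object names `means`, `ci_z`, and `ci_t` are illustrative only.

```r
library(survey)
data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Subgroup means with standard errors (the default vartype)
means <- svyby(~api99, ~stype, dclus1, svymean)

ci_z <- confint(means)                     # default df = Inf
ci_t <- confint(means, df = degf(dclus1))  # t quantiles, design degrees of freedom

# Compare interval widths side by side
width_z <- ci_z[, 2] - ci_z[, 1]
width_t <- ci_t[, 2] - ci_t[, 1]
cbind(width_z, width_t)
```

With few PSUs the t-based intervals can be noticeably wider, which is why passing `degf(design)` may be preferable for small cluster samples.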

## Details

The variance type "ci" asks for confidence intervals, which are produced by `confint`. In some cases additional options to `FUN` will be needed to produce confidence intervals, for example, `svyquantile` needs `ci=TRUE` or `keep.var=FALSE`.
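A minimal sketch of the relationship described above, assuming the `api` example data and a statistic (`svymean`) that needs no extra options: requesting `vartype = "ci"` should add lower and upper limits that agree with a default call to `confint` on the result. The column names `ci_l` and `ci_u` are what recent versions of the package appear to use and are an assumption here.

```r
library(survey)
data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Ask for both standard errors and confidence intervals
means <- svyby(~api99, ~stype, dclus1, svymean, vartype = c("se", "ci"))
means

# Default confint() on the svyby object (level = 0.95, df = Inf)
ci <- confint(means)

# Limits stored in the table next to the limits from confint()
cbind(table_lower = means$ci_l, confint_lower = ci[, 1],
      table_upper = means$ci_u, confint_upper = ci[, 2])
```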

`unwtd.count` is designed to be passed to `svyby` to report the number of non-missing observations in each subset. Observations with exactly zero weight will also be counted as missing, since that's how subsets are implemented for some designs.
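The following sketch illustrates how `unwtd.count` treats missing values. It copies the `apiclus1` example data, sets a few `api99` values to `NA` purely for demonstration (the copy `apimiss` and the choice of rows are hypothetical), and compares the per-group counts with a direct tabulation of the non-missing rows.

```r
library(survey)
data(api)

# Copy the example data and blank out some values for illustration only
apimiss <- apiclus1
apimiss$api99[1:5] <- NA
dmiss <- svydesign(id = ~dnum, weights = ~pw, data = apimiss, fpc = ~fpc)

# Unweighted number of non-missing api99 values per school type
counts <- svyby(~api99, ~stype, dmiss, unwtd.count, keep.var = FALSE)
counts

# The last column holds the count; compare with a direct tabulation
n_per_group <- counts[[ncol(counts)]]
n_direct <- as.vector(table(apimiss$stype[!is.na(apimiss$api99)]))
cbind(n_per_group, n_direct)
```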

Parallel processing with `multicore=TRUE` is useful only for fairly large problems and on computers with sufficient memory. The `multicore` package is incompatible with some GUIs, although the Mac Aqua GUI appears to be safe.

The variant `svybys` creates a separate table for each term in `bys` rather than creating a joint table.
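To make the difference concrete, the sketch below runs the same request through `svyby` and `svybys` on the `api` example data. It assumes that `svybys` returns its tables as a list with one element per term in `bys`, which is how the description above reads; the object names are illustrative.

```r
library(survey)
data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# One joint table: a row for each observed comp.imp x sch.wide combination
joint <- svyby(~api00, ~comp.imp + sch.wide, design = dclus1, svymean)
joint

# Separate tables: one for comp.imp, one for sch.wide
separate <- svybys(~api00, ~comp.imp + sch.wide, design = dclus1, svymean)
length(separate)
separate
```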

## See Also

`svytable` and `ftable.svystat` for contingency tables, `ftable.svyby` for pretty-printing of `svyby`

## Examples

```
library(survey)
data(api)
dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)

svyby(~api99, ~stype, dclus1, svymean)
svyby(~api99, ~stype, dclus1, svyquantile, quantiles=0.5,ci=TRUE,vartype="ci")
## without ci=TRUE svyquantile does not compute standard errors
svyby(~api99, ~stype, dclus1, svyquantile, quantiles=0.5, keep.var=FALSE)
svyby(~api99, list(school.type=apiclus1$stype), dclus1, svymean)
svyby(~api99+api00, ~stype, dclus1, svymean, deff=TRUE,vartype="ci")
svyby(~api99+api00, ~stype+sch.wide, dclus1, svymean, keep.var=FALSE)
## report raw number of observations
svyby(~api99+api00, ~stype+sch.wide, dclus1, unwtd.count, keep.var=FALSE)

## the same analyses with a replicate-weight design
rclus1<-as.svrepdesign(dclus1)

svyby(~api99, ~stype, rclus1, svymean)
svyby(~api99, ~stype, rclus1, svyquantile, quantiles=0.5)
svyby(~api99, list(school.type=apiclus1$stype), rclus1, svymean, vartype="cv")
svyby(~enroll,~stype, rclus1,svytotal, deff=TRUE)
svyby(~api99+api00, ~stype+sch.wide, rclus1, svymean, keep.var=FALSE)
## report raw number of observations
svyby(~api99+api00, ~stype+sch.wide, rclus1, unwtd.count, keep.var=FALSE)

## comparing subgroups using covmat=TRUE
mns<-svyby(~api99, ~stype, rclus1, svymean,covmat=TRUE)
vcov(mns)
svycontrast(mns, c(E = 1, M = -1))

str(svyby(~api99, ~stype, rclus1, svymean,return.replicates=TRUE))

tots<-svyby(~enroll, ~stype, dclus1, svytotal,covmat=TRUE)
vcov(tots)
svycontrast(tots, quote(E/H))

## comparing subgroups uses the delta method unless replicates are present
meanlogs<-svyby(~log(enroll),~stype,svymean, design=rclus1,covmat=TRUE)
svycontrast(meanlogs, quote(exp(E-H)))
meanlogs<-svyby(~log(enroll),~stype,svymean, design=rclus1,covmat=TRUE,return.replicates=TRUE)
svycontrast(meanlogs, quote(exp(E-H)))

## extractor functions
(a<-svyby(~enroll, ~stype, rclus1, svytotal, deff=TRUE, verbose=TRUE,
          vartype=c("se","cv","cvpct","var")))
deff(a)
SE(a)
cv(a)
coef(a)
confint(a, df=degf(rclus1))

## ratio estimates
svyby(~api.stu, by=~stype, denominator=~enroll, design=dclus1, svyratio)

ratios<-svyby(~api.stu, by=~stype, denominator=~enroll, design=dclus1, svyratio,covmat=TRUE)
vcov(ratios)

## empty groups
svyby(~api00,~comp.imp+sch.wide,design=dclus1,svymean)
svyby(~api00,~comp.imp+sch.wide,design=dclus1,svymean,drop.empty.groups=FALSE)

## Multiple tables
svybys(~api00,~comp.imp+sch.wide,design=dclus1,svymean)
```
