svydesign(ids, probs=NULL, strata = NULL, variables = NULL, fpc=NULL,
data = NULL, nest = FALSE, check.strata = !nest, weights=NULL)
ids: formula for the cluster ids; ~0 or ~1 is a formula for no clusters.
strata: use NULL for no strata.
variables: if NULL, the data argument is used.
weights: sampling weights, as an alternative to probs.
nest: if TRUE, relabel cluster ids to enforce nesting, e.g. if ids at the second level of sampling are reused within first-level units.
check.strata: if TRUE, check that clusters are nested in strata.
svydesign returns an object of class survey.design. The svydesign
object combines a data frame and all the survey
design information needed to analyse it. These objects are used by
the survey modelling and summary functions.
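The effect of nest can be sketched with a small made-up data set. The data frame toy and its columns are hypothetical, invented only for illustration; PSU labels 1 and 2 are reused in both strata, so without nest=TRUE the nesting check is expected to reject the design.

```r
library(survey)

# Hypothetical toy data: PSU labels 1 and 2 are reused in both strata
toy <- data.frame(
  stratum = rep(c("A", "B"), each = 4),
  psu     = rep(c(1, 2, 1, 2), each = 2),
  y       = c(3, 5, 4, 6, 8, 7, 9, 10),
  w       = 10
)

# Without nest = TRUE the reused labels look like clusters spanning strata;
# the error (if any) is captured instead of stopping the script
unnested <- tryCatch(
  svydesign(ids = ~psu, strata = ~stratum, weights = ~w, data = toy),
  error = function(e) e
)

# nest = TRUE relabels the PSUs so that each one is unique within its stratum
nested <- svydesign(ids = ~psu, strata = ~stratum, weights = ~w,
                    data = toy, nest = TRUE)
svymean(~y, nested)
```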
The finite population correction is used to reduce the variance when a substantial fraction of the total population of interest has been sampled. It may not be appropriate if the target of inference is the process generating the data rather than the statistics of a particular finite population.
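As a rough illustration of this variance reduction, the following sketch builds the stratified api design twice, once with and once without the correction, and compares the standard errors of the same mean. It assumes the api data sets shipped with the survey package; the object names are chosen here for illustration.

```r
library(survey)
data(api)

# Same stratified sample, with and without the finite population correction
with_fpc    <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                         data = apistrat, fpc = ~fpc)
without_fpc <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                         data = apistrat)

# Standard error of the mean of api00 under each design
se_fpc   <- SE(svymean(~api00, with_fpc))
se_nofpc <- SE(svymean(~api00, without_fpc))
c(with_fpc = as.numeric(se_fpc), without_fpc = as.numeric(se_nofpc))
```

The point estimate is the same in both cases; only the estimated variance changes.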
The finite population correction can be specified either as the total
population size in each stratum or as the fraction of the total
population that has been sampled. In either case the relevant
population size is counted in primary sampling units, the largest clusters.
That is, sampling 100 units from a population stratum of size 500 can
be specified as 500 or as 100/500=0.2. The finite population
correction can be specified by a vector with one element for each
individual (in which case it is an error for it to vary within a
stratum) or as a data frame with one row per stratum. The first
column of the data frame should be a factor with the same levels as
strata and the second column the finite population correction.
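The two ways of specifying the correction can be compared in a short sketch. It assumes the apistrat data from the survey package, whose fpc column holds the stratum population sizes; the column frac and the object names are introduced here only for illustration.

```r
library(survey)
data(api)

strat <- apistrat  # work on a copy so the packaged data stay untouched

# Correction given as the population size of each stratum
d_size <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = strat, fpc = ~fpc)

# The same correction given as a sampling fraction n_h / N_h
n_h <- ave(rep(1, nrow(strat)), strat$stype, FUN = sum)  # sample size per stratum
strat$frac <- n_h / strat$fpc
d_frac <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = strat, fpc = ~frac)

se_size <- SE(svymean(~api00, d_size))
se_frac <- SE(svymean(~api00, d_frac))
c(size = as.numeric(se_size), fraction = as.numeric(se_frac))
```

Both specifications describe the same sampling fraction, so the two standard errors should agree up to rounding.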
If population sizes are specified but not sampling probabilities or
weights, the sampling probabilities will be computed from the
population sizes assuming simple random sampling within strata.
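A minimal sketch of this behaviour, again assuming the apistrat data from the survey package: the design below is given population sizes but no weights, and the derived weights are compared with N_h / n_h computed by hand. The names d_auto and n_h are illustrative.

```r
library(survey)
data(api)

# Population sizes only: no weights or probabilities are supplied
d_auto <- svydesign(id = ~1, strata = ~stype, data = apistrat, fpc = ~fpc)

# Under simple random sampling within strata the weight is N_h / n_h
n_h <- ave(rep(1, nrow(apistrat)), apistrat$stype, FUN = sum)
by_hand <- apistrat$fpc / n_h

head(cbind(derived = weights(d_auto), by_hand = by_hand))
```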
The dim, "[", "[<-" and na.action methods for survey.design
objects operate on the data frame specified by variables
and ensure that the design information is properly
updated to correspond to the new data frame. With the "[<-"
method the new value can be a survey.design object instead of a
data frame, but only the data frame is used. See also
subset.survey.design for a simple way to select subpopulations.
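These methods might be used as in the sketch below, which assumes the apistrat data from the survey package. The object names are illustrative, and the row selection is only one of several ways to index a design.

```r
library(survey)
data(api)

dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)

# dim() reports the dimensions of the underlying data frame
dim(dstrat)

# "[" selects rows and carries the design information along
d_rows <- dstrat[apistrat$stype == "E", ]

# subset() expresses the same subpopulation by a condition on the variables
d_elem <- subset(dstrat, stype == "E")
svymean(~api00, d_elem)
```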
svydesign will attempt to determine whether strata with only one
cluster (PSU) are self-representing ("certainty") PSUs (based on a
sampling fraction of 1 or on a sampling weight of 1 when other sampling
weights are all greater than 1). If the strata with only one PSU are not
self-representing (or they are, but svydesign cannot tell) then
the handling of these strata for variance computation is determined by
options("survey.lonely.psu"). See svyCprod for details.
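The following sketch shows one way the option might be set. It assumes the apistrat data from the survey package and that "adjust" is an accepted value of survey.lonely.psu in the installed version; the data frame lonely is constructed here only to force a stratum with a single PSU.

```r
library(survey)
data(api)

# Keep a single school from stratum "H" so that this stratum has one PSU
lonely <- rbind(subset(apistrat, stype != "H"),
                subset(apistrat, stype == "H")[1, ])
d_lonely <- svydesign(id = ~1, strata = ~stype, weights = ~pw, data = lonely)

# Choose how single-PSU strata contribute to the variance, then restore
old <- options(survey.lonely.psu = "adjust")
est <- svymean(~api00, d_lonely)
options(old)

est
```

Which setting is appropriate depends on why the stratum has a single PSU, so the choice of "adjust" here is only an example.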
If you have strata nested within clusters (e.g. within-region stratification on
proportion Hispanic in the NHANES survey) use postStratify
to add these as post-stratification variables.
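The call pattern for postStratify can be sketched with the api data from the survey package. These data do not actually have strata nested within clusters, so the example only illustrates the syntax; the population counts in pop_types are the school-type totals used elsewhere in the api examples and are an assumption here.

```r
library(survey)
data(api)

# One-stage cluster sample of school districts
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Assumed population counts for each school type
pop_types <- data.frame(stype = c("E", "H", "M"),
                        Freq  = c(4421, 755, 1018))

# Add school type as a post-stratification variable
dclus1_ps <- postStratify(dclus1, ~stype, pop_types)

# Estimated totals by school type now match the population counts
ps_totals <- svytotal(~stype, dclus1_ps)
ps_totals
```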
See also: svyglm, svymean, svyvar, svytable, svyquantile,
subset.survey.design, update.survey.design
library(survey)
data(api)
# stratified sample
dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
# one-stage cluster sample
dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
# two-stage cluster sample
dclus2<-svydesign(id=~dnum+snum, weights=~pw, data=apiclus2)
## syntax for stratified cluster sample
##(though the data weren't really sampled this way)
svydesign(id=~dnum, strata=~stype, weights=~pw, data=apistrat, nest=TRUE)