# svydesign

##### Survey sample analysis.

Specify a complex survey design.

##### Usage

```
svydesign(ids, probs = NULL, strata = NULL, variables = NULL, fpc = NULL,
          data = NULL, nest = FALSE, check.strata = !nest, weights = NULL, ...)
## S3 method for class 'imputationList':
svydesign(ids, probs = NULL, strata = NULL, variables = NULL,
          fpc = NULL, data, nest = FALSE, check.strata = !nest, weights = NULL,
          ...)
## S3 method for class 'character':
svydesign(ids, probs = NULL, strata = NULL, variables = NULL,
          fpc = NULL, data, nest = FALSE, check.strata = !nest, weights = NULL,
          dbtype = "SQLite", dbname, ...)
```

##### Arguments

- ids
- Formula or data frame specifying cluster ids from largest level to smallest level; `~0` or `~1` is a formula for no clusters.
- probs
- Formula or data frame specifying cluster sampling probabilities.
- strata
- Formula or vector specifying strata; use `NULL` for no strata.
- variables
- Formula or data frame specifying the variables measured in the survey. If `NULL`, the `data` argument is used.
- fpc
- Finite population correction: see Details below.
- weights
- Formula or vector specifying sampling weights as an alternative to `probs`.
- data
- Data frame to look up variables in the formula arguments, or database table name, or `imputationList` object; see Details below.
- nest
- If `TRUE`, relabel cluster ids to enforce nesting within strata.
- check.strata
- If `TRUE`, check that clusters are nested in strata.
- dbtype
- Name of database driver to pass to `dbDriver`.
- dbname
- Name of database (e.g. file name for SQLite).
- ...
- For future expansion.

##### Details

The `svydesign` object combines a data frame and all the survey design information needed to analyse it. These objects are used by the survey modelling and summary functions. The `ids` argument is always required; the `strata`, `fpc`, `weights` and `probs` arguments are optional. If these variables are specified they must not have any missing values.

By default, `svydesign` assumes that all PSUs, even those in different strata, have a unique value of the `ids` variable. This allows some data errors to be detected. If your PSUs reuse the same identifiers across strata then set `nest=TRUE`.

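
The effect of `nest` can be illustrated with a minimal sketch. The data frame `toy` and its columns are hypothetical and exist only for illustration; the sketch assumes the survey package is installed and that the default `check.strata = TRUE` signals an error when PSU labels are reused across strata.

```r
library(survey)

# Hypothetical toy data: two strata that both label their PSUs 1 and 2.
toy <- data.frame(
  stratum = rep(c("A", "B"), each = 4),
  psu     = rep(c(1, 2, 1, 2), each = 2),
  w       = 10,
  y       = c(3, 5, 4, 6, 8, 7, 9, 10)
)

# With the defaults, the nesting check should reject the reused labels.
unnested <- try(
  svydesign(ids = ~psu, strata = ~stratum, weights = ~w, data = toy),
  silent = TRUE
)
inherits(unnested, "try-error")

# nest = TRUE relabels the PSUs within each stratum, so the design is accepted.
nested <- svydesign(ids = ~psu, strata = ~stratum, weights = ~w,
                    data = toy, nest = TRUE)
svymean(~y, nested)
```
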
The finite population correction (fpc) is used to reduce the variance when a substantial fraction of the total population of interest has been sampled. It may not be appropriate if the target of inference is the process generating the data rather than the statistics of a particular finite population.

The finite population correction can be specified either as the total population size in each stratum or as the fraction of the total population that has been sampled. In either case the relevant population size is counted in sampling units. That is, sampling 100 units from a population stratum of size 500 can be specified as 500 or as 100/500=0.2.

If population sizes are specified but not sampling probabilities or weights, the sampling probabilities will be computed from the population sizes assuming simple random sampling within strata.

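
The two ways of specifying `fpc` can be compared in a small sketch based on the 100-out-of-500 example above. The data frame `toy` and its columns `N`, `frac` and `w` are hypothetical, and the outcome `y` is simulated; the sketch assumes the survey package is installed.

```r
library(survey)

set.seed(1)
# Hypothetical simple random sample: 100 units drawn from a population of 500.
toy <- data.frame(
  y    = rnorm(100),
  N    = 500,        # population size
  frac = 100 / 500,  # sampling fraction
  w    = 500 / 100   # sampling weight
)

# fpc given as a population size or as a sampling fraction.
d_size <- svydesign(ids = ~1, weights = ~w, fpc = ~N,    data = toy)
d_frac <- svydesign(ids = ~1, weights = ~w, fpc = ~frac, data = toy)
# No fpc: sampling is treated as with replacement.
d_wr   <- svydesign(ids = ~1, weights = ~w, data = toy)

se_size <- as.numeric(SE(svymean(~y, d_size)))
se_frac <- as.numeric(SE(svymean(~y, d_frac)))
se_wr   <- as.numeric(SE(svymean(~y, d_wr)))
c(size = se_size, frac = se_frac, with_replacement = se_wr)

# Population size only: the weights are derived from it (500 / 100).
d_auto <- svydesign(ids = ~1, fpc = ~N, data = toy)
unique(weights(d_auto))
```

The first two standard errors should agree with each other and be smaller than the with-replacement one, since the correction shrinks the variance.
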
For multistage sampling the `ids` argument should specify a formula with the cluster identifiers at each stage. If subsequent stages are stratified, `strata` should also be specified as a formula with stratum identifiers at each stage. The population size for each level of sampling should also be specified in `fpc`.

If `fpc` is not specified then sampling is assumed to be with replacement at the top level and only the first stage of clustering is used in computing variances. If `fpc` is specified but for fewer stages than `ids`, sampling is assumed to be complete for subsequent stages. The variance calculations for multistage sampling assume simple or stratified random sampling within clusters at each stage except possibly the last.

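
As a rough check of the with-replacement behaviour described above, the following sketch uses the `apiclus2` data shipped with the survey package. It assumes the package and its `api` data set are available; the object names are illustrative only.

```r
library(survey)
data(api)

# Two-stage design with population sizes at both stages.
dclus2 <- svydesign(ids = ~dnum + snum, fpc = ~fpc1 + fpc2, data = apiclus2)

# Same weights, no fpc: only the first-stage clusters should matter,
# whether or not the second-stage ids are listed.
dclus2wr  <- svydesign(ids = ~dnum + snum, weights = weights(dclus2),
                       data = apiclus2)
dclus2wr2 <- svydesign(ids = ~dnum, weights = weights(dclus2),
                       data = apiclus2)

se_two_stage <- as.numeric(SE(svymean(~api00, dclus2wr)))
se_one_stage <- as.numeric(SE(svymean(~api00, dclus2wr2)))
c(two_stage_ids = se_two_stage, one_stage_ids = se_one_stage)
```
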
The `dim`, `"["`, `"[<-"` and `na.action` methods for `survey.design` objects operate on the data frame specified by `variables` and ensure that the design information is properly updated to correspond to the new data frame. With the `"[<-"` method the new value can be a `survey.design` object instead of a data frame, but only the data frame is used. See also `subset.survey.design` for a simple way to select subpopulations.

The `model.frame` method extracts the observed data.

If the strata with only one PSU are not self-representing (or they are, but `svydesign` cannot tell based on `fpc`) then the handling of these strata for variance computation is determined by `options("survey.lonely.psu")`. See `svyCprod` for details.
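
A minimal sketch of the lonely-PSU option follows. The data frame `toy` is hypothetical, and the sketch assumes the survey package is installed and that `"fail"` and `"adjust"` are accepted values of `survey.lonely.psu`, as described in the `svyCprod` documentation.

```r
library(survey)

# Hypothetical design: stratum B contains a single PSU.
toy <- data.frame(
  stratum = c("A", "A", "A", "B"),
  psu     = 1:4,
  w       = 5,
  y       = c(2, 4, 6, 8)
)
d <- svydesign(ids = ~psu, strata = ~stratum, weights = ~w, data = toy)

# "fail": variance estimation stops when it meets the single-PSU stratum.
old <- options(survey.lonely.psu = "fail")
res_fail <- try(svymean(~y, d), silent = TRUE)
inherits(res_fail, "try-error")

# "adjust": the lonely stratum is centred at the overall sample mean instead.
options(survey.lonely.psu = "adjust")
res_adjust <- svymean(~y, d)
se_adjust  <- as.numeric(SE(res_adjust))
res_adjust

# Restore the previous setting.
options(old)
```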

`data` may be a character string giving the name of a table or view in a relational database that can be accessed through the `DBI` or `ODBC` interfaces. For DBI interfaces `dbtype` should be the name of the database driver and `dbname` should be the name by which the driver identifies the specific database (e.g. file name for SQLite). For ODBC databases `dbtype` should be `"ODBC"` and `dbname` should be the registered DSN for the database. On the Windows GUI, `dbname=""` will produce a dialog box for interactive selection.

The appropriate database interface package must already be loaded (e.g. `RSQLite` for SQLite, `RODBC` for ODBC). The survey design object will contain only the design meta-data, and actual variables will be loaded from the database as needed. Use `close` to close the database connection and `open` to reopen the connection, e.g. after loading a saved object.

If `data` is an `imputationList` object (from the "mitools" package), `svydesign` will return a `svyimputationList` object containing a set of designs. Use `with.svyimputationList` to do analyses on these designs and `MIcombine` to combine the results.
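
The multiple-imputation workflow might look roughly like the sketch below. It assumes both the survey and mitools packages are installed. The "imputed" data sets here are not real imputations: they are hypothetical copies of `apistrat` with noise added to `api00`, used only to make the example self-contained.

```r
library(survey)
library(mitools)
data(api)

set.seed(42)
# Hypothetical stand-in for five imputed data sets.
imps <- lapply(1:5, function(i) {
  d <- apistrat
  d$api00 <- d$api00 + rnorm(nrow(d), sd = 5)
  d
})

# One design per data set, wrapped in a svyimputationList.
designs <- svydesign(ids = ~1, strata = ~stype, weights = ~pw, fpc = ~fpc,
                     data = imputationList(imps))

# Run the same analysis on every design, then pool the estimates.
fits   <- with(designs, svymean(~api00))
pooled <- MIcombine(fits)
summary(pooled)
```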

##### Value

- An object of class `survey.design`.

##### See Also

`postStratify` for post-stratification, `as.svrepdesign` for converting to replicate weight designs, `subset.survey.design` for domain estimates, `update.survey.design` to add variables.

`mitools` package for using multiple imputations.

##### Examples

```
library(survey)
data(api)
# stratified sample
dstrat <- svydesign(id=~1, strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
# one-stage cluster sample
dclus1 <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
# two-stage cluster sample: weights computed from population sizes.
dclus2 <- svydesign(id=~dnum+snum, fpc=~fpc1+fpc2, data=apiclus2)
## multistage sampling has no effect when fpc is not given, so
## these are equivalent.
dclus2wr <- svydesign(id=~dnum+snum, weights=weights(dclus2), data=apiclus2)
dclus2wr2 <- svydesign(id=~dnum, weights=weights(dclus2), data=apiclus2)
## syntax for stratified cluster sample
## (though the data weren't really sampled this way)
svydesign(id=~dnum, strata=~stype, weights=~pw, data=apistrat,
          nest=TRUE)
## database example: requires RSQLite
library(RSQLite)
dbclus1 <- svydesign(id=~dnum, weights=~pw, fpc=~fpc,
                     data="apiclus1", dbtype="SQLite",
                     dbname=system.file("api.db", package="survey"))
```

*Documentation reproduced from package survey, version 3.9-1, License: GPL-2 | GPL-3*