svydesign: Survey sample analysis.

Description

Specify a complex survey design.

Usage

svydesign(ids, probs, strata = NULL, variables = NULL, data = NULL, 
    nest = FALSE, check.strata = TRUE)

Arguments

ids

Formula or data frame specifying cluster ids from largest level to smallest level; ~0 is a formula for no clusters.

probs

Formula or data frame specifying cluster sampling probabilities

strata

Formula or factor specifying strata; use NULL for no strata.

variables

Formula or data frame specifying the variables measured in the survey. If NULL, the data argument is used.

data

Data frame to look up variables in the formula arguments

nest

If TRUE, relabel cluster ids to enforce nesting

check.strata

If TRUE, check that clusters are nested in strata
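
The nest and check.strata arguments both concern whether clusters are nested in strata, that is, whether each cluster id belongs to exactly one stratum. The base R sketch below illustrates the idea on a small hypothetical data frame (the names d, stratum, cluster and cluster_nested are invented for illustration). It is not svydesign's actual implementation, only one plausible way to check and enforce nesting.

```r
# Hypothetical data: cluster labels 1 and 2 are reused in both strata.
d <- data.frame(
  stratum = c("A", "A", "A", "B", "B", "B"),
  cluster = c(1, 1, 2, 1, 2, 2)
)

# Check nesting: every cluster id should occur in exactly one stratum.
crossing <- table(d$cluster, d$stratum)
nested_before <- all(rowSums(crossing > 0) == 1)

# Relabel: combine stratum and cluster so each label is unique to one stratum.
d$cluster_nested <- interaction(d$stratum, d$cluster, drop = TRUE)
crossing_after <- table(d$cluster_nested, d$stratum)
nested_after <- all(rowSums(crossing_after > 0) == 1)

c(before = nested_before, after = nested_after)
```

With the raw labels, cluster 1 appears in both strata, so the check fails. After relabelling, each combined label occurs in a single stratum.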

Value

An object of class survey.design.

Details

When analysing data from a complex survey, observations must be weighted inversely to their sampling probabilities, and the effects of stratification and of correlation induced by cluster sampling must be incorporated in standard errors.
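
To see why the weighting matters, the following base R sketch compares an unweighted sample mean with an inverse-probability-weighted mean. It uses a simulated population (the names pop, smp and w are illustrative, not part of the package) and does not call svydesign. It only shows the weighting principle, not the standard error calculation.

```r
# Hypothetical population; not from any real survey.
set.seed(1)
pop <- data.frame(x = rnorm(10000))
pop$p <- plogis(pop$x)                  # inclusion probability rises with x

# Draw the sample: each unit is included independently with probability p.
sampled <- rbinom(nrow(pop), 1, pop$p) == 1
smp <- pop[sampled, ]

# Inverse-probability weights: each sampled unit stands in for 1/p units.
w <- 1 / smp$p

naive_mean    <- mean(smp$x)                # biased: large x is over-sampled
weighted_mean <- sum(w * smp$x) / sum(w)    # weighted (ratio) estimate of the mean

c(population = mean(pop$x), naive = naive_mean, weighted = weighted_mean)
```

Because units with large x are more likely to be sampled, the unweighted mean overestimates the population mean, while the weighted estimate should land much closer to it.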

The svydesign object combines a data frame and all the survey design information needed to analyse it. These objects are used by the survey modelling and summary functions.

The dim, "[", "[<-" and na.action methods for survey.design objects operate on the data frame specified by variables and ensure that the design information is updated to match the new data frame. With the "[<-" method the new value can be a survey.design object instead of a data frame, but only its data frame is used.

Examples

library(survey)

# population
df <- data.frame(x = rnorm(1000), z = rep(0:4, 200))
df$y <- with(df, 3 + 3 * x * z)
# sampling probability for each unit
df$p <- with(df, exp(x) / (1 + exp(x)))
# sample: include each unit independently with probability p
xi <- rbinom(1000, 1, df$p)
sdf <- df[xi == 1, ]

# survey design object: independent sampling, no clusters
dxi <- svydesign(~0, ~p, data = sdf)

dxi
summary(dxi)
svymean(sdf$x, dxi)
svymean(~x, dxi)
svytable(~z, dxi)

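# cluster sampling: population of 250 clusters of 4 units each
df$id <- rep(1:250, each = 4)
# cluster sampling probability: smallest unit-level probability in the cluster
df$clustp <- by(df, list(df$id), function(d) min(exp(d$x * d$z) / (1 + exp(d$x * d$z))))[df$id]
# sample whole clusters
xi <- rbinom(250, 1, df$clustp[4 * (1:250)])
sdf <- df[xi[df$id] == 1, ]

# cluster sampling design
dxi <- svydesign(~id, ~clustp, data = sdf)

dxi
summary(dxi)
svymean(~x + z, dxi)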

# stratification: four strata of 200 units, 20 sampled from each
df <- data.frame(z = rep(1:4, each = 200), y = rnorm(800, rep(1:4, each = 200)))
xi <- c(sample(1:200, 20), sample(201:400, 20), sample(401:600, 20), sample(601:800, 20))
sdf <- df[xi, ]
stratdx <- svydesign(ids = ~0, probs = ~0, strata = ~z, data = sdf)
unstrat <- svydesign(ids = ~0, probs = ~0, data = sdf)
stratdx
unstrat
summary(stratdx)

svymean(~y, stratdx)  # higher precision
svymean(~y, unstrat)
