data(NHANES)
For NHANES datasets, sampling weights and sample design variables are recommended for all analyses because NHANES uses a clustered sample design with differential probabilities of selection. If you fail to account for the sampling parameters, you may obtain biased estimates and overstate statistical significance.
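The bias half of this warning can be seen with a toy example in base R. The data below are hypothetical and are not drawn from NHANES. Two groups contribute equally to the sample, but one is assumed to make up only 10% of the population. The weights are simply the assumed population share divided by the sample share. The sketch covers point estimates only. Standard errors that account for clustering require a design-based variance method, such as the one provided by the `survey` package.

```r
# Hypothetical toy sample (NOT NHANES data): groups A and B each supply 50 rows,
# but B is assumed to be only 10% of the population (so B is oversampled).
grp <- rep(c("A", "B"), each = 50)
y   <- ifelse(grp == "A", 1, 5)   # hypothetical outcome that differs by group

# Weight = assumed population share / sample share (0.9/0.5 for A, 0.1/0.5 for B)
w <- ifelse(grp == "A", 0.9 / 0.5, 0.1 / 0.5)

mean(y)              # 3: ignores the design, so B is over-represented
weighted.mean(y, w)  # 1.4: matches the assumed population mean 0.9*1 + 0.1*5
```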
NHANES and NHANESraw each include 75 variables available for the 2009-2010 and 2011-2012 sample years. NHANESraw has 20,293 observations of these variables plus four additional variables that describe the sample weighting scheme employed.
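The following is a minimal sketch of declaring the NHANESraw design with the `survey` package. It assumes that both the NHANES and `survey` packages are installed. It also assumes that the four design columns are named `WTINT2YR`, `WTMEC2YR`, `SDMVPSU` and `SDMVSTRA`, so check `names(NHANESraw)` before relying on it. The data span two 2-year cycles, so the 2-year weight is halved, following CDC guidance for combining cycles. For exam-based variables, the exam weight `WTMEC2YR` would replace the interview weight.

```r
# Sketch: declare the NHANESraw sample design with the 'survey' package.
# Assumes the NHANES and survey packages are installed and that the design
# columns are WTINT2YR, WTMEC2YR, SDMVPSU and SDMVSTRA.
library(NHANES)
library(survey)

raw <- NHANESraw
# Two 2-year cycles are combined, so the 2-year interview weight is halved
raw$WTINT4YR <- raw$WTINT2YR / 2

des <- svydesign(
  ids     = ~SDMVPSU,   # masked variance pseudo-PSU (cluster)
  strata  = ~SDMVSTRA,  # masked variance pseudo-stratum
  weights = ~WTINT4YR,  # interview weight for the combined 4-year sample
  nest    = TRUE,       # PSU codes repeat across strata
  data    = raw
)

# Design-based mean age, with a standard error that reflects clustering
est <- svymean(~Age, des, na.rm = TRUE)
est
```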
NHANES contains 10,000 rows of data resampled from NHANESraw to undo the effects of oversampling in the raw sample.
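One way to build such a dataset is to resample rows with probability proportional to their sampling weight. The sketch below shows this idea on hypothetical toy data. It is not necessarily the exact procedure the package authors used. Group B is oversampled in the toy "raw" data, and weighted resampling with replacement brings the group shares back near their assumed population values.

```r
# Hypothetical toy "raw" sample (NOT NHANESraw): 500 rows from each group,
# although group B is assumed to be only 10% of the population.
raw <- data.frame(
  grp = rep(c("A", "B"), each = 500),
  w   = rep(c(1.8, 0.2), each = 500)  # hypothetical sampling weights
)

set.seed(2024)
# Draw row indices with replacement, with probability proportional to weight
idx <- sample(nrow(raw), size = 10000, replace = TRUE, prob = raw$w)
resampled <- raw[idx, ]

# Group shares after resampling should be close to the assumed 90% / 10%
p <- prop.table(table(resampled$grp))
p
```

Because rows are drawn with replacement, heavily weighted rows appear several times. The resampled data therefore behave like a simple random sample for point estimates, but they do not carry the original clustering information.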
NHANES can be treated, for educational purposes, as if it were a simple random sample from the American population.
A list of the variables in the data set appears below, along with variable descriptions and links to the original NHANES documentation.
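# Due to the sampling design, some races were over- or under-sampled.
library(NHANES)
# Share of each Race1 category in the resampled and the raw data
p_resampled <- prop.table(table(NHANES$Race1))
p_raw <- prop.table(table(NHANESraw$Race1))
rbind(
  NHANES = p_resampled,
  NHANESraw = p_raw,
  diff = p_resampled - p_raw  # positive: category was under-sampled in NHANESraw
)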