edsurveyTable: Make a table with an edsurvey.data.frame.

Description

edsurveyTable returns a summary table (as a data.frame) that shows the number of students, the percentage of students, and the mean value of the outcome (or left hand side) variable by the predictor (or right hand side) variable(s).

Usage

edsurveyTable(formula, data, weightVar = NULL, jrrIMax = 1,
  pctAggregationLevel = NULL, returnMeans = TRUE, returnSepct = TRUE,
  varMethod = c("jackknife", "Taylor"), drop = FALSE,
  schoolMergeVarStudent = NULL, schoolMergeVarSchool = NULL,
  omittedLevels = TRUE, defaultConditions = TRUE, recode = NULL)

Arguments

formula

object of class formula, potentially with a subject scale or subscale on the left hand side, and “by variable(s)” for tabulation on the right hand side. When the left hand side of the formula is omitted and returnMeans is TRUE, then the default subject scale or subscale is used. You can find the default composite scale and all subscales using the function showPlausibleValues. Note that the order of the right hand side variables affects the output.

data

object of class edsurvey.data.frame (see readNAEP for how to generate an edsurvey.data.frame).

weightVar

character string indicating the weight variable to use. Note that only the name of the weight variable needs to be included here, and any replicate weights will be automatically included. When this argument is NULL, the function uses the default. Use showWeights to find the default.

jrrIMax

integer indicating the maximum number of plausible values to include when calculating the variance term \(V_{jrr}\) (see the Details section of lm.sdf for the definition of \(V_{jrr}\)). The default shown in Usage is 1, which makes code execution faster but less accurate. Setting this to Inf results in all available plausible values being used in generating \(V_{jrr}\).

pctAggregationLevel

the PCT column sums to 100 within each combination of the first pctAggregationLevel right hand side variables. So, when set to 0, the PCT column adds up to 100 across the entire sample. When set to 1, the PCT column adds up to 100 within each level of the first variable on the right hand side of the formula; when set to 2, it adds up to 100 within the interaction of the first and second variables, and so on. See the Examples section.

returnMeans

a logical value. Set to TRUE (the default) to get the MEAN and SE(MEAN) columns in the returned table described in the Value section.

returnSepct

a logical value. Set to TRUE (the default) to get the SE(PCT) column in the returned table described in the Value section.

varMethod

a character string set to “jackknife” or “Taylor” that indicates the variance estimation method to be used. Note that “Taylor” is supported only for the column SE(MEAN) and “jackknife” is always used for the column SE(PCT).

drop

a logical value. When set to the default value of FALSE, a result with a single column is still returned as a data.frame and is not converted to a vector.

schoolMergeVarStudent

a character variable name from the student file used to merge student and school data files. Set to NULL by default.

schoolMergeVarSchool

a character variable name from the school file used to merge student and school data files. Set to NULL by default.

omittedLevels

a logical value. When set to the default value of TRUE, drops the omitted levels, as specified in the edsurvey.data.frame, from all factor variables. Use print on an edsurvey.data.frame to see the omitted levels.

defaultConditions

a logical value. When set to the default value of TRUE, uses the default conditions stored in the edsurvey.data.frame to subset the data. Use print on an edsurvey.data.frame to see the default conditions.

recode

a list of lists to recode variables. Defaults to NULL. Can be set as recode = list(var1 = list(from = c("a", "b", "c"), to = "c")). See Examples.
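The from/to semantics of recode can be illustrated with a minimal base R sketch. The helper apply_recode and the toy data are hypothetical and exist only for illustration; the way edsurveyTable applies a recode internally may differ.

```r
# Hypothetical helper (not part of EdSurvey): collapse the levels listed in
# `from` into the single level named in `to`, for each variable in `recode`.
apply_recode <- function(df, recode) {
  for (v in names(recode)) {
    x <- factor(df[[v]])
    lv <- levels(x)
    # every level found in `from` is renamed to `to`
    lv[lv %in% recode[[v]]$from] <- recode[[v]]$to
    # assigning duplicated level names merges those levels
    levels(x) <- lv
    df[[v]] <- x
  }
  df
}

# toy data, not survey data
toy <- data.frame(var1 = c("a", "b", "c", "d", "a"))
out <- apply_recode(toy, list(var1 = list(from = c("a", "b", "c"), to = "c")))
levels(out$var1)
table(out$var1)
```

In this sketch the levels "a", "b", and "c" end up as a single level "c", while "d" is left unchanged.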

Value

A table with the following columns:

RHS levels

One column for each right hand side variable. Each row describes the students who are at the levels shown in that row.

N

the unweighted count of students in the survey at the RHS levels.

WTD_N

the weighted count of students in the survey at the RHS levels.

PCT

the percentage of students at the aggregation level specified by pctAggregationLevel (see Arguments). See the “Statistics” vignette section “Estimation of weighted percentages” and its first subsection “Estimation of weighted percentages when plausible values are not present.”

SE(PCT)

the standard error of the percentage, accounting for the survey sampling methodology. When varMethod is set to “jackknife,” the calculation of this column is described in the “Statistics” vignette section “Estimation of the standard error of weighted percentages when plausible values are not present, using the jackknife method.”

When varMethod is set to “Taylor,” then the calculation of this column is described in “Estimation of the standard error of weighted percentages when plausible values are not present, using the Taylor series method.”

MEAN

The mean assessment score for units in the RHS levels, calculated according to the “Statistics” vignette section “Estimation of weighted means when plausible values are present.”

SE(MEAN)

The standard error of the MEAN column (the mean assessment score for units in the RHS levels), calculated according to the “Statistics” vignette sections “Estimation of standard errors of weighted means when plausible values are present, using the jackknife method” or “Estimation of standard errors of weighted means when plausible values are present, using the Taylor series method,” depending on the value of varMethod.
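As a rough illustration of how jrrIMax enters SE(MEAN), the following base R sketch combines a jackknife term and an imputation term in the usual way for plausible values (Rubin, 1987). The function combine_variance and its inputs are hypothetical, the formulas are the standard textbook combination rather than a transcription of the EdSurvey source, and at least two plausible values are assumed; the “Statistics” vignette remains the authoritative description.

```r
# Hypothetical inputs (assumptions for illustration):
#   est[p]        weighted mean computed with plausible value p and the full-sample weight
#   rep_est[j, p] the same mean recomputed with replicate weight j
combine_variance <- function(est, rep_est, jrrIMax = 1) {
  M <- length(est)
  # number of plausible values used for the jackknife term
  k <- min(jrrIMax, M)
  # jackknife (sampling) variance, averaged over the first k plausible values
  v_jrr <- mean(vapply(
    seq_len(k),
    function(p) sum((rep_est[, p] - est[p])^2),
    numeric(1)
  ))
  # imputation variance: spread of the estimates across plausible values
  v_imp <- (M + 1) / M * var(est)
  c(mean = mean(est), se = sqrt(v_jrr + v_imp), v_jrr = v_jrr, v_imp = v_imp)
}

# toy numbers: 2 plausible values, 2 replicate weights
est <- c(10, 12)
rep_est <- matrix(c(10.5, 9.5, 12, 13), nrow = 2)
fast <- combine_variance(est, rep_est, jrrIMax = 1)
full <- combine_variance(est, rep_est, jrrIMax = Inf)
```

With jrrIMax = 1 only the first column of rep_est contributes to the jackknife term; with jrrIMax = Inf every plausible value contributes, at the cost of more computation.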

Details

This method can be used to generate a simple one to n-way table with unweighted and weighted n values and percentages. It also can calculate the average of the subject scale or subscale for students at each level of the cross-tabulation table.
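The relationship between the weighted counts and the PCT column at different aggregation levels can be sketched in base R with toy data. The data, the column names freq and wgt, and the helper pct_at_level are hypothetical; this only illustrates the arithmetic of weighted percentages and is not how edsurveyTable is implemented.

```r
# toy data, not survey data: two by-variables and a weight
toy <- data.frame(
  dsex = c("Male", "Male", "Male", "Female", "Female", "Female"),
  freq = c("Never", "Daily", "Daily", "Never", "Never", "Daily"),
  wgt  = c(1, 2, 1, 3, 1, 2)
)

# weighted N for each cell of the dsex x freq cross-tabulation
cells <- aggregate(wgt ~ dsex + freq, data = toy, FUN = sum)
names(cells)[3] <- "WTD_N"

# Hypothetical helper: percentages that sum to 100 within each combination
# of the first `level` by-variables (level 0 = the whole sample)
pct_at_level <- function(cells, by_vars, level) {
  if (level == 0) {
    total <- sum(cells$WTD_N)
  } else {
    g <- interaction(cells[by_vars[seq_len(level)]], drop = TRUE)
    total <- ave(cells$WTD_N, g, FUN = sum)
  }
  100 * cells$WTD_N / total
}

cells$PCT0 <- pct_at_level(cells, c("dsex", "freq"), 0)  # sums to 100 overall
cells$PCT1 <- pct_at_level(cells, c("dsex", "freq"), 1)  # sums to 100 within each dsex
cells
```

Here PCT0 adds up to 100 over all four cells, while PCT1 adds up to 100 separately for the Male rows and for the Female rows.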

A detailed description of all statistics is given in the “Statistics” vignette, which you can find by entering vignette("statistics", package = "EdSurvey") at the R command prompt.

References

Binder, D. A. (1983). On the Variances of Asymptotically Normal Estimators From Complex Surveys. International Statistical Review, 51(3): 279-292.

Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. New York, NY: Wiley.

Examples


# NOT RUN {
# read in the example data (generated, not real student data)

sdf <- readNAEP(system.file("extdata/data", "M36NT2PM.dat", package = "NAEPprimer"))

# create a table that shows only the breakdown of dsex
edsurveyTable(composite ~ dsex, data=sdf, returnMeans=FALSE, returnSepct=FALSE)

# create a table with composite scores by dsex
edsurveyTable(composite ~ dsex, data=sdf)

# add a second variable
edsurveyTable(composite ~ dsex + b017451, data=sdf)

# add a third variable, do not omit any levels
edsurveyTable(composite ~ dsex + b017451 + b003501, data=sdf, omittedLevels=FALSE)

# three variables, do not omit any levels, change aggregation level
# PCT sums to 100 across the entire sample
edsurveyTable(composite ~ dsex + b017451 + b003501, data=sdf, omittedLevels=FALSE,
              pctAggregationLevel=0)

# PCT sums to 100 within each level of dsex
edsurveyTable(composite ~ dsex + b017451 + b003501, data=sdf, omittedLevels=FALSE,
              pctAggregationLevel=1)

# PCT sums to 100 within each combination of dsex and b017451
edsurveyTable(composite ~ dsex + b017451 + b003501, data=sdf, omittedLevels=FALSE,
              pctAggregationLevel=2)

# variance estimation using the Taylor series 
edsurveyTable(composite ~ dsex + b017451 + b003501, data=sdf, varMethod="Taylor")
# }
