lm.sdf: Run a linear model on an edsurvey.data.frame.

Description

Fits a linear model that uses weights and variance estimates appropriate for the edsurvey.data.frame.

Usage

lm.sdf(formula, data, weightVar = NULL, relevels = list(),
  varMethod = c("jackknife", "Taylor"), jrrIMax = 1,
  schoolMergeVarStudent = NULL, schoolMergeVarSchool = NULL,
  omittedLevels = TRUE, defaultConditions = TRUE, recode = NULL)

Arguments

formula

a formula for the linear model. See lm. If y is left blank, the default subject scale or subscale variable will be used. (You can find the default using showPlausibleValues.) If y is a variable for a subject scale or subscale (one of the names shown by showPlausibleValues), then that subject scale or subscale is used.

data

an edsurvey.data.frame.

weightVar

character indicating the weight variable to use (see Details). The weightVar must be one of the weights for the edsurvey.data.frame. If NULL, uses the default for the edsurvey.data.frame.

relevels

a list. Used when the user wants to change the contrasts from the default treatment contrasts to treatment contrasts with a chosen omitted group. To do this, the user puts an element on the list named the same name as a variable to change contrasts on and then makes the value for that list element equal to the value that should be the omitted group. (See Examples.)

varMethod

A character set to “jackknife” or “Taylor” that indicates the variance estimation method to be used. (See Details.)

jrrIMax

when using the jackknife variance estimation method, the $V_{jrr}$ term (see Details) can be estimated with any positive number of plausible values. It is estimated on the first m plausible values, where m is the lower of the number of available plausible values and jrrIMax. When jrrIMax is set to Inf, all of the plausible values are used. Higher values of jrrIMax lead to longer computing times and more accurate variance estimates.

schoolMergeVarStudent

a character variable name from the student file used to merge student and school data files. Set to NULL by default.

schoolMergeVarSchool

a character variable name from the school file used to merge student and school data files. Set to NULL by default.

omittedLevels

a logical value. When set to the default value of TRUE, drops those levels of all factor variables that are specified in edsurvey.data.frame. Use print on an edsurvey.data.frame to see the omitted levels.

defaultConditions

a logical value. When set to the default value of TRUE, uses the default conditions stored in edsurvey.data.frame to subset the data. Use print on an edsurvey.data.frame to see the default conditions.

recode

a list of lists to recode variables. Defaults to NULL. Can be set as recode = list(var1 = list(from = c("a", "b", "c"), to = "d")). (See Examples.)
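
As a rough intuition for relevels and recode, the following base R sketch shows the analogous operations on toy factors: choosing the omitted (reference) group of a treatment contrast, and collapsing several levels into one. It is only an illustration under the assumption that the arguments behave like these base R operations; it does not call EdSurvey, and the toy data are made up.

```r
# Toy factors; values are illustrative, not NAEP data
dsex <- factor(c("Male", "Female", "Female", "Male"),
               levels = c("Male", "Female"))

# Under treatment contrasts the first level is the omitted group.
# relevel() moves a chosen level to the front, similar in spirit to
# relevels = list(dsex = "Female").
dsex2 <- relevel(dsex, ref = "Female")

# Collapsing levels, similar in spirit to
# recode = list(freq = list(from = c(...), to = "Infrequently")).
freq <- factor(c("Never or hardly ever", "Every day",
                 "About once a week", "Every day"))
from <- c("Never or hardly ever", "About once a week")
levels(freq)[levels(freq) %in% from] <- "Infrequently"

levels(dsex2)
levels(freq)
```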

Value

An edsurvey.lm with elements:

call

The function call.

formula

The formula used to fit the model.

coef

The estimates of the coefficients.

se

The standard error estimates of the coefficients.

Vimp

The estimated variance due to uncertainty in the scores (plausible values variables).

Vjrr

The estimated variance due to sampling.

M

The number of plausible values.

varm

The variance estimates under the various plausible values.

coefm

The values of the coefficients under the various plausible values.

coefmat

The coefficient matrix (typically produced by the summary of a model).

r.squared

The coefficient of determination.

weight

The name of the weight variable.

npv

Number of plausible values.

njk

The number of jackknife replicates used. Set to NA when Taylor series variance estimates are used.

varMethod

One of “Taylor series” or “jackknife.”

Details

This function implements an estimator that correctly handles left-hand side variables that are either numeric or plausible values, allows for survey sampling weights, and estimates variances using either the jackknife replication method or the Taylor series method (see varMethod). The Statistics vignette describes estimation of the reported statistics. (Run vignette("statistics", package="EdSurvey") at the R prompt to see the vignette.)

Regardless of the variance estimation method, the coefficients are estimated using the sample weights. The estimation follows the vignette section titled “Estimation of weighted means when plausible values are not present” or the section titled “Estimation of weighted means when plausible values are present,” depending on whether the model includes assessment variables (variables with plausible values).

How the standard errors of the coefficients are estimated depends on the value of varMethod and the presence of plausible values (assessment variables). Once the variance is obtained, the t statistic is given by $$t=\frac{\hat{\beta}}{\sqrt{\mathrm{var}(\hat{\beta})}}$$ where $\hat{\beta}$ is the estimated coefficient and $\mathrm{var}(\hat{\beta})$ is the variance of that estimate. The p-value associated with the coefficient is then calculated using the number of jackknife replicates as the degrees of freedom.
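
The t statistic and p-value described above can be sketched in base R as follows. The coefficient, its variance, and the number of jackknife replicates are made-up illustrative values, and the two-sided form of the test is an assumption of this sketch rather than something stated on this page.

```r
# Illustrative inputs (hypothetical values, not from real data)
beta_hat <- 2.5    # estimated coefficient
var_beta <- 1.44   # estimated variance of the coefficient
njk      <- 62     # number of jackknife replicates, used as degrees of freedom

# t = beta_hat / sqrt(var(beta_hat))
t_stat <- beta_hat / sqrt(var_beta)

# Two-sided p-value from the t distribution (assumed two-sided)
p_value <- 2 * pt(-abs(t_stat), df = njk)

t_stat
p_value
```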

The coefficient of determination (R-squared value) is similarly estimated by finding the average R-squared using the sample weights for each set of plausible values.
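
A minimal sketch of this averaging, using base R's lm with weights on simulated data, might look like the following. The data, the weights, the three simulated "plausible value" columns, and all variable names are hypothetical; the sketch only mirrors the idea of averaging weighted R-squared values and is not the EdSurvey implementation.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
w <- runif(n, 0.5, 2)                  # hypothetical sampling weights

# Three simulated outcome columns standing in for plausible values
pv <- sapply(1:3, function(m) 1 + 0.5 * x + rnorm(n))

# Weighted R-squared for each plausible value, then the average
r2_each <- apply(pv, 2, function(y) {
  summary(lm(y ~ x, weights = w))$r.squared
})
r2 <- mean(r2_each)

r2_each
r2
```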

Variance estimation of coefficients

All variance estimation methods are shown in the “Statistics” vignette.

When varMethod is set to “jackknife” and the outcome variable does not have plausible values, the variance of the coefficients is estimated according to the section “Estimation of standard errors of weighted means when plausible values are not present, using the jackknife method.”

When plausible values are present and varMethod is “jackknife,” the variance of the coefficients is estimated according to the section “Estimation of standard errors of weighted means when plausible values are present, using the jackknife method.”

When plausible values are not present and varMethod is “Taylor,” the variance of the coefficients is estimated according to the section “Estimation of standard errors of weighted means when plausible values are not present, using the Taylor series method.”

When plausible values are present and varMethod is “Taylor,” the variance of the coefficients is estimated according to the section “Estimation of standard errors of weighted means when plausible values are present, using the Taylor series method.”
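
When plausible values are present, the total variance combines a sampling component (Vjrr) and an imputation component (Vimp). The sketch below assumes the usual multiple-imputation combining rule in the style of Rubin (1987), with the imputation variance scaled by (1 + 1/M). The exact formulas used by the package are those in the Statistics vignette. All numbers are made up for illustration.

```r
# One coefficient estimated under M = 5 plausible values (hypothetical values)
coefm <- c(1.9, 2.1, 2.0, 2.2, 1.8)
M <- length(coefm)

# Point estimate: average over plausible values
beta_hat <- mean(coefm)

# Imputation variance: between-plausible-value variance scaled by (1 + 1/M)
# (assumed form of the combining rule)
Vimp <- (1 + 1 / M) * var(coefm)

# Sampling variance, e.g. from jackknife replicates (hypothetical value)
Vjrr <- 0.04

# Total variance and standard error
V  <- Vjrr + Vimp
se <- sqrt(V)

c(beta_hat = beta_hat, Vimp = Vimp, Vjrr = Vjrr, V = V, se = se)
```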

References

Binder, D. A. (1983). On the Variances of Asymptotically Normal Estimators From Complex Surveys. International Statistical Review, 51(3): 279–92.

Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. New York, NY: Wiley.

Weisberg, S. (1985). Applied Linear Regression (2nd ed.). New York, NY: Wiley.

Examples

# NOT RUN {
# read in the example data (generated, not real student data)
sdf <- readNAEP(system.file("extdata/data", "M36NT2PM.dat", package = "NAEPprimer"))

# By default uses the jackknife variance method with replicate weights
lm1 <- lm.sdf(composite ~ dsex + b017451, data=sdf)
lm1

# for more detailed results use summary:
summary(lm1)

# to specify a variance method use varMethod:
lm2 <- lm.sdf(composite ~ dsex + b017451, data=sdf, varMethod="Taylor")
lm2
summary(lm2)

# Use relevels to set a new omitted category.
lm3 <- lm.sdf(composite ~ dsex + b017451, data=sdf, relevels=list(dsex="Female"))
summary(lm3)

# Use recode to change values for specified variables:
lm4 <- lm.sdf(composite ~ dsex + b017451, data=sdf,
              recode=list(b017451=list(from=c("Never or hardly ever",
                                              "Once every few weeks",
                                              "About once a week"),
                                       to=c("Infrequently")),
                          b017451=list(from=c("2 or 3 times a week","Every day"),
                                       to=c("Frequently"))))
# Note: "Infrequently" is the dropped level for the recoded b017451
summary(lm4)

# }
