
Fits a linear model that uses weights and variance estimates appropriate for the edsurvey.data.frame.
lm.sdf(formula, data, weightVar = NULL, relevels = list(),
varMethod = c("jackknife", "Taylor"), jrrIMax = 1,
schoolMergeVarStudent = NULL, schoolMergeVarSchool = NULL,
omittedLevels = TRUE, defaultConditions = TRUE, recode = NULL)
formula: a formula for the linear model. See lm. If y is left blank, the default subject scale or subscale variable will be used. (You can find the default using showPlausibleValues.) If y is a variable for a subject scale or subscale (one of the names shown by showPlausibleValues), then that subject scale or subscale is used.
data: an edsurvey.data.frame.
weightVar: character indicating the weight variable to use (see Details). The weightVar must be one of the weights for the edsurvey.data.frame. If NULL, uses the default for the edsurvey.data.frame.
relevels: a list. Used to change the contrasts from the default treatment contrasts to treatment contrasts with a chosen omitted group. To do this, add an element to the list with the same name as the variable whose contrasts should change, and set its value to the level that should be the omitted group. (See Examples.)
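To illustrate what choosing an omitted group means, here is a minimal base R sketch. It uses `relevel()` and `lm()` on made-up toy data rather than `lm.sdf` itself, so it shows only the idea of treatment contrasts with a chosen reference level, not how `lm.sdf` implements `relevels` internally.

```r
# Toy data (made up for illustration; not survey data)
df <- data.frame(
  score = c(250, 262, 248, 270, 255, 265),
  dsex  = factor(c("Male", "Female", "Male", "Female", "Male", "Female"),
                 levels = c("Male", "Female"))
)

# Default treatment contrasts: the first level ("Male") is the omitted group
fit_default <- lm(score ~ dsex, data = df)
names(coef(fit_default))   # "(Intercept)" "dsexFemale"

# Make "Female" the omitted group instead
df$dsex <- relevel(df$dsex, ref = "Female")
fit_relevel <- lm(score ~ dsex, data = df)
names(coef(fit_relevel))   # "(Intercept)" "dsexMale"
```

With two levels, the slope simply changes sign and the intercept becomes the mean of the new omitted group.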
varMethod: a character set to “jackknife” or “Taylor” that indicates the variance estimation method to be used. (See Details.)
jrrIMax: when using the jackknife variance estimation method, jrrIMax sets the maximum number of plausible values used to estimate the sampling variance. When jrrIMax is set to Inf, all of the plausible values will be used. Higher values of jrrIMax lead to longer computing times and more accurate variance estimates.
schoolMergeVarStudent: a character variable name from the student file used to merge student and school data files. Set to NULL by default.
schoolMergeVarSchool: a character variable name from the school file used to merge student and school data files. Set to NULL by default.
omittedLevels: a logical value. When set to the default value of TRUE, drops those levels of all factor variables that are specified in edsurvey.data.frame. Use print on an edsurvey.data.frame to see the omitted levels.
defaultConditions: a logical value. When set to the default value of TRUE, uses the default conditions stored in edsurvey.data.frame to subset the data. Use print on an edsurvey.data.frame to see the default conditions.
recode: a list of lists to recode variables. Defaults to NULL. Can be set as recode = list(var1 = list(from = c("a","b","c"), to = "d")). (See Examples.)
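The shape of a recode rule can be sketched in base R. The helper `apply_recode` below is hypothetical and written only for illustration; it is not part of EdSurvey and does not claim to reproduce how `lm.sdf` applies `recode` internally.

```r
# Hypothetical helper: collapse every value listed in rule$from into rule$to
apply_recode <- function(x, rule) {
  x <- as.character(x)
  x[x %in% rule$from] <- rule$to
  factor(x)
}

# Same structure as the recode argument: one named rule per variable
recode <- list(var1 = list(from = c("a", "b", "c"), to = "d"))

var1 <- factor(c("a", "b", "c", "e", "a"))   # made-up values
recoded <- apply_recode(var1, recode$var1)
levels(recoded)   # "d" "e"
```

Levels "a", "b", and "c" are merged into the single level "d", while "e" is left unchanged.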
An edsurvey.lm with elements:
The function call.
The formula used to fit the model.
The estimates of the coefficients.
The standard error estimates of the coefficients.
The estimated variance due to uncertainty in the scores (plausible values variables).
The estimated variance due to sampling.
The number of plausible values.
The variance estimates under the various plausible values.
The values of the coefficients under the various plausible values.
The coefficient matrix (typically produced by the summary of a model).
The coefficient of determination.
The name of the weight variable.
Number of plausible values.
The number of jackknife replicates used. Set to NA when Taylor series variance estimates are used.
One of “Taylor series” or “jackknife.”
This function implements an estimator that correctly handles left hand
side variables that are either numeric or plausible values, allows for survey
sampling weights and estimates variances using the jackknife replication method.
The Statistics vignette describes estimation of the reported statistics. (Run vignette("statistics", package="EdSurvey") at the R prompt to see the vignette.)
Regardless of the variance estimation method, the coefficients are estimated using the sample weights, according to the vignette section titled “Estimation of weighted means when plausible values are not present” or the section titled “Estimation of weighted means when plausible values are present,” depending on whether the model includes assessment variables (variables with plausible values).
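How the standard errors of the coefficients are estimated depends on the value of varMethod and the presence of plausible values (assessment variables). Once the variance is obtained, the t statistic is given by
$$t = \frac{\hat{\beta}}{\sqrt{\mathrm{var}(\hat{\beta})}}$$
where $\hat{\beta}$ is the estimated coefficient and $\mathrm{var}(\hat{\beta})$ is the estimated variance of that coefficient.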
The coefficient of determination (R-squared value) is similarly estimated by finding the average R-squared using the sample weights for each set of plausible values.
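The averaging over plausible values can be sketched in base R with simulated data. Everything below (the data, the weights, and the five plausible values) is made up for illustration, and the sketch uses `lm()` with a `weights` argument as a stand-in; it is not the `lm.sdf` implementation.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
w <- runif(n, 0.5, 2)   # made-up sampling weights

# Five made-up plausible values for the same outcome (one column each)
pvs <- replicate(5, 250 + 10 * x + rnorm(n, sd = 20))

# One weighted regression per plausible value
fits <- lapply(seq_len(ncol(pvs)), function(m) {
  lm(pvs[, m] ~ x, weights = w)
})

# Point estimates: average the coefficients across plausible values
coef_by_pv <- vapply(fits, coef, numeric(2))   # 2 x 5 matrix
beta_hat <- rowMeans(coef_by_pv)

# R-squared: average the weighted R-squared across plausible values
r2_hat <- mean(vapply(fits, function(f) summary(f)$r.squared, numeric(1)))
```

Here `beta_hat` and `r2_hat` are plain averages of the per-plausible-value results, which is the pattern the paragraph above describes.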
All variance estimation methods are shown in the “Statistics” vignette.
When varMethod is set to “jackknife” and the predicted value does not have plausible values, the variance of the coefficients is estimated according to the section “Estimation of standard errors of weighted means when plausible values are not present, using the jackknife method.”
When plausible values are present and varMethod is “jackknife,” the variance of the coefficients is estimated according to the section “Estimation of standard errors of weighted means when plausible values are present, using the jackknife method.”
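As a rough illustration of the jackknife case with plausible values, the base R sketch below combines a sampling variance from replicate weights with an imputation variance across plausible values (Rubin's combining rule). The data, the replicate weights, and the multiplier of 1 on the jackknife sum of squares are all assumptions made for this example; surveys differ in that multiplier, and the Statistics vignette remains the authoritative description of what `lm.sdf` does.

```r
set.seed(2)
n <- 120
J <- 20   # number of jackknife replicates (made up)
M <- 5    # number of plausible values (made up)
x <- rnorm(n)
w <- runif(n, 0.5, 2)   # made-up full-sample weights
pvs <- replicate(M, 250 + 10 * x + rnorm(n, sd = 20))

# Made-up replicate weights: replicate j zeroes one half of zone j
# and doubles the weights of the other half
zone <- rep(seq_len(J), length.out = n)
half <- rep(c(1, 2), each = J, length.out = n)
rep_w <- sapply(seq_len(J), function(j) {
  ifelse(zone == j, ifelse(half == 1, 0, 2 * w), w)
})

# Weighted regression coefficients for one outcome and one weight vector
fit_coef <- function(y, wt) coef(lm(y ~ x, weights = wt))

# Full-sample coefficients for each plausible value, then their average
beta_m <- vapply(seq_len(M), function(m) fit_coef(pvs[, m], w), numeric(2))
beta_hat <- rowMeans(beta_m)

# Sampling variance from the first min(jrrIMax, M) plausible values
jrrIMax <- 1
V_jrr <- rowMeans(vapply(seq_len(min(jrrIMax, M)), function(m) {
  beta_j <- vapply(seq_len(J), function(j) fit_coef(pvs[, m], rep_w[, j]),
                   numeric(2))
  rowSums((beta_j - beta_m[, m])^2)   # assumed multiplier of 1
}, numeric(2)))

# Imputation variance across plausible values
V_imp <- (M + 1) / M * apply(beta_m, 1, var)

se <- sqrt(V_jrr + V_imp)
t_stat <- beta_hat / se
```

Raising `jrrIMax` in this sketch repeats the replicate fits for more plausible values, which is why larger values cost more computing time.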
When plausible values are not present and varMethod is “Taylor,” the variance of the coefficients is estimated according to the section “Estimation of standard errors of weighted means when plausible values are not present, using the Taylor series method.”
When plausible values are present and varMethod is “Taylor,” the variance of the coefficients is estimated according to the section “Estimation of standard errors of weighted means when plausible values are present, using the Taylor series method.”
Binder, D. A. (1983). On the Variances of Asymptotically Normal Estimators From Complex Surveys. International Statistical Review, 51(3), 279-292.
Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. New York, NY: Wiley.
Weisberg, S. (1985). Applied Linear Regression (2nd ed.). New York, NY: Wiley.
# NOT RUN {
# read in the example data (generated, not real student data)
sdf <- readNAEP(system.file("extdata/data", "M36NT2PM.dat", package = "NAEPprimer"))
# By default, uses the jackknife variance method with replicate weights
lm1 <- lm.sdf(composite ~ dsex + b017451, data=sdf)
lm1
# for more detailed results use summary:
summary(lm1)
# to specify a variance method use varMethod:
lm2 <- lm.sdf(composite ~ dsex + b017451, data=sdf, varMethod="Taylor")
lm2
summary(lm2)
# Use relevels to set a new omitted category.
lm3 <- lm.sdf(composite ~ dsex + b017451, data=sdf, relevels=list(dsex="Female"))
summary(lm3)
# Use recode to change values for specified variables:
lm4 <- lm.sdf(composite ~ dsex + b017451, data=sdf,
              recode=list(b017451=list(from=c("Never or hardly ever",
                                              "Once every few weeks",
                                              "About once a week"),
                                       to=c("Infrequently")),
                          b017451=list(from=c("2 or 3 times a week","Every day"),
                                       to=c("Frequently"))))
# Note: "Infrequently" is the dropped level for the recoded b017451
summary(lm4)
# }