Calculates the percentiles of a numeric variable in an edsurvey.data.frame, a light.edsurvey.data.frame, or an edsurvey.data.frame.list.
percentile(
variable,
percentiles,
data,
weightVar = NULL,
jrrIMax = 1,
varMethod = c("jackknife", "Taylor"),
alpha = 0.05,
dropOmittedLevels = TRUE,
defaultConditions = TRUE,
recode = NULL,
returnVarEstInputs = FALSE,
returnNumberOfPSU = FALSE,
pctMethod = c("symmetric", "unbiased", "simple"),
confInt = TRUE,
dofMethod = c("JR", "WS"),
omittedLevels = deprecated()
)
The return type depends on whether the class of the data argument is an edsurvey.data.frame or an edsurvey.data.frame.list.
The data argument is an edsurvey.data.frame
When the data argument is an edsurvey.data.frame, percentile returns an S3 object of class percentile. This is a data.frame with typical attributes (names, row.names, and class) and additional attributes as follows:
number of rows on the edsurvey.data.frame before any conditions were applied
number of observations with valid data and weights larger than zero
number of PSUs used in the calculation
the call used to generate these results
The columns of the data.frame are as follows:
the percentile of this row
the estimated value of the percentile
the jackknife standard error of the estimated percentile
degrees of freedom
the lower bound of the confidence interval
the upper bound of the confidence interval
the number of units with more extreme results, averaged across plausible values
When the confInt argument is set to FALSE, the confidence intervals are not returned.
The data argument is an edsurvey.data.frame.list
When the data argument is an edsurvey.data.frame.list, percentile returns an S3 object of class percentileList.
This is a data.frame with a call attribute.
The columns in the data.frame are identical to those in the previous section, but there are also columns from the edsurvey.data.frame.list:
a column for each column in the covs value of the edsurvey.data.frame.list. See Examples.
When returnVarEstInputs is TRUE, an attribute varEstInputs is also returned that includes the variance estimate inputs used for calculating covariances with varEstToCov.
variable
the character name of the variable for which percentiles are computed, typically a subject scale or subscale
percentiles
a numeric vector of percentiles in the range of 0 to 100 (inclusive)
data
an edsurvey.data.frame or an edsurvey.data.frame.list
weightVar
a character indicating the weight variable to use.
jrrIMax
a numeric value; when using the jackknife variance estimation method, the default estimation option, jrrIMax=1, uses the sampling variance from the first plausible value as the component for sampling variance estimation. Setting jrrIMax to Inf will result in all plausible values being used. Higher values of jrrIMax lead to longer computing times and more accurate variance estimates.
varMethod
a character set to "jackknife" or "Taylor" that indicates the variance estimation method used when constructing the confidence intervals. The jackknife variance estimation method is always used to calculate the standard error.
alpha
a numeric value between 0 and 1 indicating the significance level; the confidence level is 1 - alpha. An alpha value of 0.05 indicates a 95% confidence interval and is the default.
dropOmittedLevels
a logical value. When set to the default value of TRUE, drops those levels of all factor variables that are specified in achievementVars and aggregateBy. Use print on an edsurvey.data.frame to see the omitted levels.
defaultConditions
a logical value. When set to the default value of TRUE, uses the default conditions stored in an edsurvey.data.frame to subset the data. Use print on an edsurvey.data.frame to see the default conditions.
recode
a list of lists to recode variables. Defaults to NULL. Can be set as recode=list(var1=list(from=c("a", "b", "c"), to="d")).
returnVarEstInputs
a logical value set to TRUE to return the inputs to the jackknife and imputation variance estimates, which allows for the computation of covariances between estimates.
returnNumberOfPSU
a logical value set to TRUE to return the number of primary sampling units (PSUs)
pctMethod
one of "symmetric", "unbiased", or "simple". "unbiased" produces a weighted median unbiased percentile estimate, whereas "simple" uses a basic formula that matches previously published results. "symmetric" uses a more basic formula but requires that the percentile be symmetric under multiplication of the quantity by negative one.
confInt
a logical value indicating whether the confidence interval should be returned
dofMethod
passed to DoFCorrection as the method argument
omittedLevels
this argument is deprecated. Use dropOmittedLevels instead.
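To make the idea behind the pctMethod argument more concrete, the following base R sketch computes a weighted percentile by interpolating along the weighted cumulative distribution. It is an illustration only: the function name weightedPercentile and the toy data are hypothetical, and the interpolation rule is a generic one, not the formula EdSurvey uses for any of its three pctMethod options.

```r
# Hypothetical helper (not part of EdSurvey): weighted percentile by
# linear interpolation of the weighted empirical CDF.
weightedPercentile <- function(x, w, p) {
  stopifnot(length(x) == length(w), all(w > 0), all(p >= 0 & p <= 100))
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  # cumulative share of the total weight at or below each sorted value
  cumShare <- cumsum(w) / sum(w)
  # interpolate; rule = 2 clamps requests that fall outside the observed range
  approx(x = cumShare, y = x, xout = p / 100, rule = 2)$y
}

# made-up scores and weights, for illustration only
scores  <- c(210, 250, 265, 280, 310)
weights <- c(1, 2, 1, 2, 1)
weightedPercentile(scores, weights, c(25, 50, 75))
```

Different conventions for where the cumulative weight "sits" relative to each observation lead to different estimates in small samples, which is the kind of choice pctMethod exposes.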
Paul Bailey
Percentiles, their standard errors, and confidence intervals are calculated according to the vignette titled Statistical Methods Used in EdSurvey. The standard errors and confidence intervals are based on separate formulas and assumptions.
The Taylor series variance estimation procedure is not relevant to percentiles because percentiles are not continuously differentiable.
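The way jrrIMax trades computing time for accuracy in the jackknife standard error can be sketched in base R. This is a minimal illustration under stated assumptions, not EdSurvey's implementation: combineVariance is a hypothetical helper, the estimates are made-up numbers, the jackknife multiplier is assumed to be 1, and the two components are combined with the usual multiple-imputation rule (sampling variance averaged over the first jrrIMax plausible values, plus (1 + 1/M) times the variance between plausible values).

```r
# Hypothetical helper (not an EdSurvey function).
# fullEst: length-M vector, the estimate from each plausible value (full-sample weight)
# repEst:  J x M matrix, the estimate from each of J replicate weights, per plausible value
combineVariance <- function(fullEst, repEst, jrrIMax = 1) {
  M <- length(fullEst)
  stopifnot(ncol(repEst) == M, jrrIMax >= 1)
  # values above M (including Inf) mean "use every plausible value"
  nUse <- min(jrrIMax, M)
  # jackknife sampling variance per plausible value (multiplier assumed to be 1)
  vjrrEach <- vapply(seq_len(nUse), function(m) {
    sum((repEst[, m] - fullEst[m])^2)
  }, numeric(1))
  vjrr <- mean(vjrrEach)
  # imputation variance: between-plausible-value variance with the (1 + 1/M) factor
  vimp <- if (M > 1) (1 + 1 / M) * var(fullEst) else 0
  c(estimate = mean(fullEst), vjrr = vjrr, vimp = vimp, se = sqrt(vjrr + vimp))
}

# made-up estimates: 3 plausible values, 3 replicate weights
fullEst <- c(275.0, 276.0, 274.0)
repEst <- rbind(c(275.5, 276.5, 274.0),
                c(274.5, 275.0, 274.5),
                c(275.0, 276.5, 273.0))
combineVariance(fullEst, repEst, jrrIMax = 1)    # sampling variance from the first plausible value
combineVariance(fullEst, repEst, jrrIMax = Inf)  # sampling variance averaged over all plausible values
```

In this sketch only the sampling variance component changes with jrrIMax; the point estimate and the imputation variance always use every plausible value.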
Hyndman, R. J., & Fan, Y. (1996). Sample quantiles in statistical packages. American Statistician, 50, 361-365.
if (FALSE) {
# read in the example data (generated, not real student data)
sdf <- readNAEP(path=system.file("extdata/data", "M36NT2PM.dat", package="NAEPprimer"))
# get the median of the composite
percentile(variable="composite", percentiles=50, data=sdf)
# get several percentiles
percentile(variable="composite", percentiles=c(0,1,25,50,75,99,100), data=sdf)
# build an edsurvey.data.frame.list
sdfA <- subset(sdf, scrpsu %in% c(5,45,56))
sdfB <- subset(sdf, scrpsu %in% c(75,76,78))
sdfC <- subset(sdf, scrpsu %in% 100:200)
sdfD <- subset(sdf, scrpsu %in% 201:300)
sdfl <- edsurvey.data.frame.list(datalist=list(sdfA, sdfB, sdfC, sdfD),
                                 labels=c("A locations",
                                          "B locations",
                                          "C locations",
                                          "D locations"))
# this shows how these datasets will be described:
sdfl$covs
percentile(variable="composite", percentiles=50, data=sdfl)
percentile(variable="composite", percentiles=c(25, 50, 75), data=sdfl)
}
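The nested list expected by the recode argument can be easier to read with a small example. The sketch below builds a list in the documented shape and applies the same from/to mapping to a plain factor in base R. The helper applyRecode is hypothetical and written only to show the intended effect; it is not how EdSurvey processes the argument internally, and the variable name and levels are made up.

```r
# recode specification in the shape described for the recode argument
recodeSpec <- list(var1 = list(from = c("a", "b", "c"), to = "d"))

# Hypothetical helper: collapse the `from` levels of a factor into `to`
applyRecode <- function(x, spec) {
  x <- as.character(x)
  x[x %in% spec$from] <- spec$to
  factor(x)
}

# made-up factor, for illustration only
var1 <- factor(c("a", "b", "c", "e", "a"))
applyRecode(var1, recodeSpec$var1)
```

Here the levels "a", "b", and "c" are merged into the single level "d", while "e" is left unchanged.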