Presmoothing: Frequency Distribution Presmoothing

Description

These functions smooth univariate or multivariate frequency distributions, typically prior to equating.

Usage

## S3 method for class 'default':
presmoothing(x, smoothmethod = c("none", "average", "bump",
  "loglinear"), jmin, asfreqtab = TRUE, ...)
## S3 method for class 'formula':
presmoothing(x, data, ...)
loglinear(x, scorefun, degrees = list(4, 2, 2), grid,
  rmimpossible = FALSE, asfreqtab = TRUE, models,
  stepup = !missing(models),
  compare = FALSE, verbose = FALSE,
  showWarnings = TRUE, ...)

Arguments

x

either an object of class freqtab specifying a univariate or multivariate score distribution, or a formula object.

smoothmethod

character string indicating the smoothing method to be used by presmoothing. "none" returns unsmoothed frequencies, "bump" adds a small frequency to each score value, "average" imputes small frequencies

jmin

for smoothmethod = "average", the minimum frequency, as an integer, below which frequencies will be replaced (default is 1). For smoothmethod = "bump", the value to be added to each score point (as a probability, with default 1e-

asfreqtab

logical, with default TRUE, indicating whether or not a frequency table should be returned. For smoothmethod = "average" and smoothmethod = "bump", the alternative is a vector of frequencies. For loglinear

data

an object of class freqtab.

scorefun

matrix of score functions used in loglinear presmoothing, where each column includes a transformation of the score scale or interactions between score scales. If missing, degrees and xdegree will be used to construct polynomial s

degrees

list of integer vectors, each one indicating the maximum polynomial score transformations to be computed for each variable at a given order of interactions. Defaults (degrees = list(4, 2, 2)) are provided for up to trivariate interactions.

grid

matrix with one column per margin in x and one row per term in the model. See below for details.

rmimpossible

logical, with default FALSE, indicating whether or not impossible scores should be removed before smoothing.

models

integer vector indicating which model terms should be grouped together when fitting multiple models.

stepup

logical, indicating whether or not multiple (nested) models should be fit. The default, !missing(models), is TRUE when models is supplied and FALSE otherwise. If TRUE and models is missing, an attempt will be made to create it using grid and/or degrees.

compare

logical, with default FALSE, indicating whether or not fit for nested models should be compared. If TRUE, stepup is also set to TRUE and only results from the model fit comparison are returned, that is,

verbose

logical, with default FALSE, indicating whether or not full glm output should be returned.

showWarnings

logical, with default TRUE, indicating whether or not warnings from glm should be shown. A warning is common when fitting sparse frequency distributions.

...

further arguments passed to other methods. For presmoothing, these are passed to loglinear and include those listed above.

Value

When smoothmethod = "average" or smoothmethod = "bump", either a smoothed frequency vector or table is returned. Otherwise, loglinear returns the following:
when compare = TRUE, an anova table for model fit
when asfreqtab = TRUE, a smoothed frequency table
when verbose = TRUE, full glm output, for all nested models when stepup = TRUE
when stepup = TRUE and verbose = FALSE, a data.frame of fitted frequencies, with one column per model

Details

Loglinear smoothing is a flexible procedure for reducing irregularities in a frequency distribution prior to equating, where the degree of each polynomial term determines the specific moment of the observed distribution that is preserved in the fitted distribution (see below for examples). The loglinear function is a wrapper for glm, and is used to simplify the creation of polynomial score functions and the fitting and comparing of multiple loglinear models.
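As a rough illustration of the moment-preservation property, the following sketch fits a cubic Poisson loglinear model directly with glm, without calling loglinear, and compares the first three raw moments of the observed and fitted distributions. The simulated data, the variable names, and the helper raw_moments are assumptions made for this example; this is not the package's internal code.

```r
# Simulated scores on a 50:150 scale (illustrative data only)
set.seed(1)
scale <- 50:150
scores <- round(rnorm(1000, 100, 15))
scores <- scores[scores >= 50 & scores <= 150]
counts <- as.vector(table(factor(scores, levels = scale)))

# Poisson loglinear model with raw polynomial terms up to degree 3
fit <- glm(counts ~ poly(scale, 3, raw = TRUE), family = poisson)
fitted_counts <- fitted(fit)

# Hypothetical helper: first k raw moments of a frequency distribution
raw_moments <- function(f, x, k = 3) {
  sapply(seq_len(k), function(j) sum(f * x^j) / sum(f))
}
obs <- raw_moments(counts, scale)
smo <- raw_moments(fitted_counts, scale)

# The two rows should agree up to the convergence tolerance of glm
rbind(observed = obs, smoothed = smo)
```

Raising the polynomial degree to 4 would, under the same reasoning, also match the fourth moment.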

scorefun, if supplied, must contain at least one score function of the scale score values. Specifying degrees is an alternative to supplying scorefun. Each list element in degrees should be a vector equal in length to the number of variables contained in x; there should also be one such vector for each possible level of interaction between the variables in x.

For example, the default degrees = list(4, 2, 2) is recycled to produce list(c(4, 4, 4), c(2, 2, 2), c(2, 2, 2)), resulting in polynomials to the fourth power for each univariate distribution, to the second power for each two-way interaction, and to the second power for the three-way interaction.
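The recycling described here can be mimicked in base R as follows. This is only a sketch of the idea, assuming a trivariate x; nvar and recycled are names chosen for the example, and the package may implement the step differently.

```r
degrees <- list(4, 2, 2)
nvar <- 3  # assumed number of variables in x

# Repeat each element of degrees once per variable
recycled <- lapply(degrees, rep, length.out = nvar)
str(recycled)
```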

Terms can also be specified with grid, which is a matrix with each row containing integers specifying the powers for each variable at each interaction term, including main effects. For example, the main effect to the first power for the total score in a bivariate distribution would be c(1, 0); the interaction to the second power would be c(2, 2).
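To make the role of grid concrete, the sketch below builds score-function columns by hand from a small grid for a bivariate distribution. The score scales, the helper term, and the column names are assumptions for illustration, and the construction is not necessarily the one used inside loglinear.

```r
# All combinations of a hypothetical total (0:4) and anchor (0:2) scale
scores <- expand.grid(total = 0:4, anchor = 0:2)

# One row per model term, one column per variable; entries are powers
grid <- rbind(
  c(1, 0),  # total, first power
  c(2, 0),  # total, second power
  c(0, 1),  # anchor, first power
  c(1, 1)   # total-by-anchor interaction, first power
)

# Each grid row becomes one column: the product of the powered scores
term <- function(powers) {
  scores$total^powers[1] * scores$anchor^powers[2]
}
scorefun <- apply(grid, 1, term)
colnames(scorefun) <- apply(grid, 1, paste, collapse = ".")
head(scorefun)
```

A matrix built this way has one row per score combination and one column per term, which is the shape described for scorefun above.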

stepup is used to run nested models based on subsets of the columns in scorefun. Output will correspond to models based on columns 1 and 2, 1 through 3, 1 through 4, and so on up to 1 through ncol(scorefun). When degrees is used, the resulting list of polynomial terms is used to create a grid using expand.grid. The grid can also be supplied directly, in which case degrees will be ignored.

compare returns output as an anova table, comparing model fit for all the models run with stepup = TRUE.
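The nested fitting and comparison described above can be approximated with glm and anova alone. The sketch below assumes simulated binomial scores and a score-function matrix of raw powers 1 to 4; fit_cols, fitted_df, and comparison are names introduced for the example, and the output layout will differ from what loglinear returns.

```r
set.seed(2)
scale <- 0:20
counts <- as.vector(table(factor(rbinom(2000, 20, 0.5), levels = scale)))

# Hypothetical score functions: raw powers 1 to 4 of the score scale
scorefun <- sapply(1:4, function(p) scale^p)

# Nested fits on columns 1:2, 1:3, and 1:4 of scorefun
fit_cols <- function(k) {
  glm(counts ~ scorefun[, seq_len(k), drop = FALSE], family = poisson)
}
fits <- lapply(2:ncol(scorefun), fit_cols)

# Fitted frequencies, one column per model
fitted_df <- as.data.frame(sapply(fits, fitted))
names(fitted_df) <- paste0("cols_1_to_", 2:ncol(scorefun))

# Likelihood-ratio comparison of the nested fits
comparison <- anova(fits[[1]], fits[[2]], fits[[3]], test = "Chisq")
comparison
```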

The remaining smoothing methods make adjustments to scores with low or zero frequencies. smoothmethod = "bump" adds the proportion jmin to each score point and then adjusts the probabilities to sum to 1. smoothmethod = "average" replaces frequencies falling below the minimum jmin with averages of adjacent values.
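The "bump" adjustment, as described above, amounts to the arithmetic sketched below. The counts and the value of jmin are arbitrary choices for illustration, and the code paraphrases the description rather than reproducing freqbump.

```r
counts <- c(0, 3, 10, 25, 40, 15, 6, 0, 1)
jmin <- 0.001  # arbitrary illustrative value, not the package default

# Convert to proportions, add jmin to every score point, renormalize
probs <- counts / sum(counts)
bumped <- (probs + jmin) / sum(probs + jmin)

# Rescale back to the original total frequency
smoothed_counts <- bumped * sum(counts)
round(smoothed_counts, 3)
```

After the adjustment every score point has a strictly positive frequency, while the total frequency is unchanged.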

References

Holland, P. W., and Thayer, D. T. (1987). Notes on the use of log-linear models for fitting discrete probability distributions (PSR Technical Rep. No. 87-79; ETS RR-87-31). Princeton, NJ: ETS.

Holland, P. W., and Thayer, D. T. (2000). Univariate and bivariate loglinear models for discrete test score distributions. Journal of Educational and Behavioral Statistics, 25, 133-183.

Moses, T. P., and von Davier, A. A. (2006). A SAS macro for loglinear smoothing: Applications and implications (ETS Research Rep. No. RR-06-05). Princeton, NJ: ETS.

Moses, T., and Holland, P. W. (2008). Notes on a general framework for observed score equating (ETS Research Rep. No. RR-08-59). Princeton, NJ: ETS.

Wang, T. (2009). Standard errors of equating for the percentile rank-based equipercentile equating with log-linear presmoothing. Journal of Educational and Behavioral Statistics, 34, 7-23.

Examples

library(equate)

set.seed(2010)
x <- round(rnorm(1000, 100, 15))
xscale <- 50:150
xtab <- freqtab(x, scales = xscale)

# Adjust frequencies
plot(xtab, y = cbind(average = freqavg(xtab),
  bump = freqbump(xtab)))

# Smooth x preserving first 3 moments
xlog1 <- loglinear(xtab, degrees = 3,
  asfreqtab = FALSE)
plot(xtab, y = xlog1, legendtext = "degree = 3")

# Add "teeth" and "gaps" to x
# Smooth with formula interface
teeth <- c(.5, rep(c(1, 1, 1, 1, .5), 20))
xttab <- as.freqtab(cbind(xscale, c(xtab) * teeth))
xlog2 <- presmoothing(~ poly(total, 3, raw = TRUE),
  xttab, showWarnings = FALSE)

# Smooth xttab using score functions that preserve
# the teeth structure (also 3 moments)
teeth2 <- c(1, rep(c(0, 0, 0, 0, 1), 20))
xt.fun <- cbind(xscale, xscale^2, xscale^3)
xt.fun <- cbind(xt.fun, teeth2, xt.fun * teeth2)
xlog3 <- loglinear(xttab, xt.fun, showWarnings = FALSE)

# Plot to compare teeth versus no teeth
op <- par(no.readonly = TRUE)
par(mfrow = c(3, 1))
plot(xttab, main = "unsmoothed", ylim = c(0, 30))
plot(xlog2, main = "ignoring teeth", ylim = c(0, 30))
plot(xlog3, main = "preserving teeth", ylim = c(0, 30))
par(op)

# Bivariate example, preserving first 3 moments of total
# and anchor for x and y, and the covariance
# between anchor and total
# see equated scores in Wang (2009), Table 4
xvtab <- freqtab(KBneat$x, scales = list(0:36, 0:12))
yvtab <- freqtab(KBneat$y, scales = list(0:36, 0:12))
Y <- as.data.frame(yvtab)[, 1]
V <- as.data.frame(yvtab)[, 2]
scorefun <- cbind(Y, Y^2, Y^3, V, V^2, V^3, V*Y)
wang09 <- equate(xvtab, yvtab, type = "equip",
  method = "chained", smooth = "loglin",
  scorefun = scorefun)
wang09$concordance
