get.coef: Extract Information from a Coarsened-Variable Model

Description

This group of functions extracts various summaries from a model fit by cvam, using either the EM algorithm or Markov chain Monte Carlo.

Usage

get.coef(obj, withSE = FALSE, meanSeries = TRUE, msgIfNone = TRUE)
get.covMat(obj, msgIfNone = TRUE)
get.estimates(obj, msgIfNone = TRUE)
get.loglik(obj, msgIfNone = TRUE)
get.logP(obj, msgIfNone = TRUE)
get.mfTrue(obj)
get.modelMatrix(obj, msgIfNone = TRUE)
get.offset(obj, mfTrue = FALSE, msgIfNone = TRUE)
get.strZero(obj, mfTrue = FALSE, msgIfNone = TRUE)
get.fitted(obj, type=c("prob", "mean", "logMean"), mfTrue = TRUE, 
   meanSeries = TRUE, msgIfNone = TRUE )
get.imputedFreq(obj, msgIfNone = TRUE)
get.minus2logPSeries(obj, startValShift = TRUE,
   msgIfNone = TRUE, coda = ( obj$method == "MCMC" ) )
get.coefSeries(obj, msgIfNone = TRUE, coda = ( obj$method == "MCMC" ) )
get.probSeries(obj, levelNames=TRUE, sep=".",
   msgIfNone = TRUE, coda = ( obj$method == "MCMC" ) )

Value

get.coef returns a vector of estimated coefficients from the log-linear model if withSE=FALSE; if withSE=TRUE, it returns a data frame containing coefficients, standard errors, t-statistics and p-values.

get.covMat returns an estimated covariance matrix for the estimated coefficients.
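As a minimal sketch of how these return values relate, the code below fits the model used in the Examples section (the crime data and the variables V1, V2 and n are taken from there) and compares the dimensions of the coefficient vector, the coefficient table and the covariance matrix. The object names b, tab and V are illustrative only.

```r
library(cvam)

# Same log-linear model as in the Examples section, fit with EM
fit <- cvam(~ V1 * V2, data = crime, freq = n)

b   <- get.coef(fit)                 # vector of coefficients
tab <- get.coef(fit, withSE = TRUE)  # data frame: coefficients, SEs, t-statistics, p-values
V   <- get.covMat(fit)               # covariance matrix for the coefficients

# One table row, and one row and column of V, per coefficient
length(b)
nrow(tab)
dim(V)

# Standard errors implied by the covariance matrix, for comparison with tab
sqrt(diag(V))
```

Comparing the last line with the standard-error column of tab is a quick consistency check; that the two agree is an expectation, not something this page states.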

get.estimates returns a data frame or a list of data frames containing the estimates held in obj.

get.loglik and get.logP return the value of the loglikelihood function or log-posterior density from the beginning of the final iteration of EM or MCMC. If the model was fit using cvam(..., saturated=TRUE), the likelihood is based on a multinomial or product-multinomial distribution over the cells of the complete-data table. If the model was fit as a log-linear model using cvam(..., saturated=FALSE), the likelihood is based on a surrogate Poisson model.

get.mfTrue returns a data frame with one row per cell of the complete-data table. The variables in this data frame include every factor appearing in the model (the non-coarsened versions) and another variable named freq. If the model was fit using cvam(..., method="EM"), freq contains the predicted cell frequencies at the final iteration of EM. If the model was fit using cvam(..., method="MCMC"), freq contains a running average of imputed cell frequencies over all iterations of MCMC after the burn-in period. In either case, if the data used to fit the model contain no missing or coarsened values, then freq will be equal to the observed frequencies.

get.modelMatrix returns the model matrix for the log-linear model. The rows of the model matrix correspond to the rows of mfTrue, and the columns correspond to terms created from the factors in mfTrue.

get.offset retrieves the offset for the log-linear model. If mfTrue is TRUE, it returns the data frame mfTrue with a numeric variable named offset. If mfTrue is FALSE, it returns a numeric vector of length NROW(mfTrue).

get.strZero retrieves the logical values indicating whether each cell is a structural zero. If mfTrue is TRUE, it returns the data frame mfTrue with a logical variable named strZero. If mfTrue is FALSE, it returns a logical vector of length NROW(mfTrue).

get.fitted retrieves fitted values from the log-linear model. If type="prob", the fitted values are cell probabilities conditioned on any variables fixed in the model. If type="mean" or "logMean", the fitted values are cell means or log-cell means from the log-linear model. If mfTrue is TRUE, the function returns the data frame mfTrue with a numeric variable named fit. If mfTrue is FALSE, it returns a numeric vector of length NROW(mfTrue).
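The three types of fitted values can be compared directly. The sketch below assumes the EM fit from the Examples section, where no variables are fixed, so the fitted probabilities are expected to sum to one over all cells and the log-cell means are expected to exponentiate back to the cell means. The names p, mu and logMu are illustrative.

```r
library(cvam)

# EM fit of the model from the Examples section
fit <- cvam(~ V1 * V2, data = crime, freq = n)

# mfTrue = TRUE (the default) gives one row per cell with a column named fit
get.fitted(fit, type = "prob")

# mfTrue = FALSE gives just the numeric vector, one element per cell
p     <- get.fitted(fit, type = "prob",    mfTrue = FALSE)
mu    <- get.fitted(fit, type = "mean",    mfTrue = FALSE)
logMu <- get.fitted(fit, type = "logMean", mfTrue = FALSE)

sum(p)                     # total probability over the complete-data table
all.equal(exp(logMu), mu)  # log-cell means versus cell means
```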

get.imputedFreq returns the data frame mfTrue, with the freq variable replaced by multiply imputed versions of the frequencies for the complete-data table.

get.minus2logPSeries returns a series of (minus 2 times) the log-posterior density values from the iterations of MCMC, either as a numeric vector or as an mcmc object used by the coda package.

get.coefSeries returns a series of log-linear coefficients from the iterations of MCMC, either as a numeric matrix or as an mcmc object used by the coda package.

get.probSeries returns a series of cell probabilities from the iterations of MCMC, either as a numeric matrix or as an mcmc object used by the coda package.
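A short sketch of the two return forms for a series follows. It reuses the model and seed from the Examples section; the object names are illustrative, and summary and effectiveSize are standard functions from the coda package, not part of cvam.

```r
library(cvam)

set.seed(6755)
fitEM <- cvam(~ V1 * V2, data = crime, freq = n)
fitMC <- cvam(fitEM, method = "MCMC", control = list(iterMCMC = 5000))

# coda = FALSE: plain numeric matrix, rows = iterations, columns = coefficients
bMat <- get.coefSeries(fitMC, coda = FALSE)
dim(bMat)

# Default for an MCMC fit: an mcmc object for use with coda
bMcmc <- get.coefSeries(fitMC)
class(bMcmc)

# Typical coda diagnostics, run only if coda is available
if (requireNamespace("coda", quietly = TRUE)) {
  print(summary(bMcmc))
  print(coda::effectiveSize(bMcmc))
}

# The one-dimensional series comes back as a numeric vector when coda = FALSE
m2lp <- get.minus2logPSeries(fitMC, coda = FALSE)
length(m2lp)
```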

Arguments

obj: an object resulting from a call to cvam with method = "EM" or method = "MCMC"
withSE: if TRUE, then get.coef will return a data frame containing estimated log-linear coefficients, standard errors, t-statistics and p-values; if FALSE, then only a vector of coefficients is given.
mfTrue: if TRUE, then get.offset, get.strZero and get.fitted will return a data frame with one row per cell, with all model variables (the non-coarsened versions) present as factors, and with another variable named (depending on which function was called) offset, strZero or fit containing the requested values. If FALSE, then get.offset, get.strZero and get.fitted will return a vector containing the requested values.
meanSeries: applies when obj is the result from a simulation run. If TRUE, results will be based on a running average of simulated parameters over all iterations after the burn-in period. If FALSE, results will be based only on the simulated parameter values at the end of the run.
msgIfNone: if TRUE then, if the get procedure fails, an informative message is given explaining why the requested summary cannot be obtained. For example, get.coef will fail to return coefficients from a model fit with cvam(..., saturated = TRUE) because no model matrix is created and the log-linear coefficients are not defined. If FALSE, then these messages are suppressed.
type: type of fitted values to be returned by get.fitted. "prob" returns cell probabilities conditioned on variables fixed by the model (if any); "mean" returns cell means from the log-linear model; and "logMean" returns log-cell means from the log-linear model (i.e., the linear predictor).
startValShift: the function get.minus2logPSeries extracts a saved series from an MCMC run containing the values of (minus 2 times) the log-posterior density function. If startValShift is TRUE, the series is shifted by (minus 2 times) the log-posterior density at the starting value, if the starting value appears to be a mode.
coda: if TRUE, the series from an MCMC run is returned as an mcmc object for plotting and diagnostic analysis with the coda package. If FALSE, a one-dimensional series is returned as a numeric vector, and a multidimensional series is returned as a numeric matrix with rows corresponding to iterations and columns corresponding to elements of the multidimensional quantities being monitored.
levelNames: the get.probSeries function extracts a saved series of probabilities from an MCMC run corresponding to cells of the complete-data table (i.e., the rows of mfTrue). If levelNames is TRUE, names for the cell probabilities are constructed from the levels of the factors in mfTrue. As the number of variables in the model grows, these names can become unwieldy, and setting levelNames to FALSE omits the names.
sep: a character string used to separate the levels of multiple factors when levelNames is TRUE.
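The effect of msgIfNone can be seen with the saturated-model case described above. This is a sketch only: it reuses the crime data from the Examples section, the object names are illustrative, and the value returned when a summary is unavailable is not stated on this page, so the code inspects it without assuming a particular result.

```r
library(cvam)

# Saturated fit: no model matrix, so log-linear coefficients are not defined
fitSat <- cvam(~ V1 * V2, data = crime, freq = n, saturated = TRUE)

# Default: an explanatory message is expected
res <- get.coef(fitSat)

# Same request with the message suppressed
resQuiet <- get.coef(fitSat, msgIfNone = FALSE)

# Inspect whatever came back
str(resQuiet)
```

Suppressing the message can be convenient inside loops or wrapper functions that already check for themselves whether a summary is available.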

Author

Joe Schafer Joseph.L.Schafer@census.gov

Details

The series objects returned by get.minus2logPSeries, get.coefSeries and get.probSeries omit results from the burn-in period, if any, and may also be thinned. The default behavior is no burn-in period and no thinning. The burn-in period and thinning interval are set by components of the control argument to cvam, via the function cvamControl; the relevant components are control$burnMCMC and control$thinMCMC. By default, cvam does not save cell probabilities. To save them, set control$saveProbSeries to TRUE.
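As a sketch of these control settings in use, the code below requests a burn-in period, thinning and saved cell probabilities. The component names burnMCMC, thinMCMC and saveProbSeries are those named above, and iterMCMC appears in the Examples section; the numeric values are arbitrary choices for illustration.

```r
library(cvam)

set.seed(6755)
fitEM <- cvam(~ V1 * V2, data = crime, freq = n)

# Burn-in of 1000 iterations, keep every 10th draw, and save cell probabilities
fitMC <- cvam(fitEM, method = "MCMC",
   control = list(iterMCMC = 5000, burnMCMC = 1000, thinMCMC = 10,
      saveProbSeries = TRUE))

# Saved probabilities as a plain matrix: one column per cell of mfTrue
probs <- get.probSeries(fitMC, coda = FALSE)
dim(probs)

# Column names built from factor levels, joined by sep
colnames(get.probSeries(fitMC, sep = "_", coda = FALSE))

# Names omitted
colnames(get.probSeries(fitMC, levelNames = FALSE, coda = FALSE))
```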

get.imputedFreq returns multiple imputations of frequencies for the complete-data table generated and stored during an MCMC run after the burn-in period. The default behavior is no imputation. This can be changed by setting control$imputeEvery to an integer greater than zero.

Other useful information from a model fit can be extracted with the summary method for a cvam object, and with the functions cvamEstimate, cvamPredict, cvamLik, and cvamImpute.

References

For more information, refer to the package vignette Log-Linear Modeling with Missing and Coarsened Values Using the cvam Package.

For information about coda, see:

Martyn Plummer, Nicky Best, Kate Cowles and Karen Vines (2006). CODA: Convergence Diagnosis and Output Analysis for MCMC, R News, vol 6, 7-11.

Examples

fit <- cvam( ~ V1 * V2, data=crime, freq=n )
get.coef(fit, withSE=TRUE)
get.covMat(fit)
get.fitted(fit, type="mean")

set.seed(6755)
fit <- cvam(fit, method="MCMC",
   control=list(iterMCMC=5000, imputeEvery=500) )
get.imputedFreq(fit)

if (FALSE) plot( get.coefSeries(fit) )  # coda trace and density plots
