Fits log-linear models to categorical variables by three methods: maximizing the loglikelihood or log-posterior density by Expectation-Maximization (EM) algorithms, simulating the posterior distribution by a Markov chain Monte Carlo (MCMC) algorithm, and creating random draws of parameters from an approximate Bayesian posterior distribution. The factors in the model may have missing or coarsened values.
cvam(obj, ...)

# S3 method for formula
cvam(obj, data, freq, weight, subPop,
    stratum, cluster, nest = FALSE, prior = cvamPrior(),
    method = c("EM", "MCMC", "approxBayes", "mfSeen", "mfTrue",
    "mfPrior", "modelMatrix"), control = list(), omitData = FALSE,
    saturated = FALSE, modelMatrix = NULL, offset = NULL,
    strZero = NULL, startVal = NULL, estimate = NULL, ...)

# S3 method for cvam
cvam(obj, method = obj$method, control = NULL, startVal = NULL,
    estimate = NULL, ...)
If method is "EM", "MCMC" or "approxBayes", an object of class c("cvam","list") containing the results of a model fit. For other values of method, the requested object is returned without fitting a model.
obj: an object used to select a method: either a model formula or the result from a previous call to cvam.
data: an optional data frame, list or environment (or object coercible to a data frame by as.data.frame) containing the variables in the model. If not found in data, the variables are taken from environment(obj), typically the environment from which cvam is called.
freq: an optional variable for holding integer frequencies when the observations are grouped. If freq is not given, then the observations are assumed to represent microdata, and all frequencies are set to one.
weight: an optional numeric variable containing survey weights, which are used when computing pseudo-maximum likelihood (PML) estimates from survey data. If weight is given, then the data supplied are interpreted as microdata, with each row having a frequency of one.
subPop: an optional logical variable indicating membership in a subpopulation for computing PML estimates from survey data.
stratum: an optional factor variable indicating the sampling stratum to which a unit belongs, used when computing linearized variance estimates for parameter estimates under a with-replacement (WR) survey design; see DETAILS.
cluster: an optional factor variable indicating the primary (first-stage) sampling cluster to which a unit belongs, used when computing linearized variance estimates for parameters under a with-replacement (WR) survey design; see DETAILS.
nest: if TRUE, duplicate values of the cluster variable appearing in different strata are assumed to refer to different clusters.
prior: an object produced by cvamPrior to represent prior information incorporated into the model fit.
method: a procedure for fitting the model. "EM" computes a maximum-likelihood (ML) estimate, penalized ML estimate, posterior mode, or (if survey weights are provided) a pseudo-maximum likelihood (PML) estimate; "MCMC" runs a Markov chain Monte Carlo algorithm to simulate a sequence of correlated random draws from the posterior distribution of the unknown parameters; "approxBayes" creates independent draws from an approximate posterior distribution. The other alternatives return various objects without fitting the model.
control: a named list containing control parameters which are passed to cvamControl. Control parameters determine the maximum number of iterations, criteria for judging convergence, proposal distributions for MCMC, and so on. Control parameters that are not found in this list are set to default values.
omitData: if TRUE, then the observations supplied through data and freq are ignored, and the fitted model is based only on the prior information supplied through prior. Combining omitData=TRUE with method="MCMC" will simulate random draws from the prior distribution.
saturated: if TRUE, then a saturated model is fit to the cell means without defining a model matrix or log-linear coefficients.
modelMatrix: an optional model matrix that defines the log-linear model. In ordinary circumstances, cvam creates the model matrix automatically by interpreting terms in the model formula and referring to the contrast attributes of the model factors. In rare circumstances, a user may want to supply a different model matrix. The model matrix should have one row for every cell in the complete-data table. If a model matrix is supplied, the model formula is used only to identify the variables that are included in the model, not to define the associations among them.
offset: an optional numeric vector of length NROW(modelMatrix) containing an offset for the log-linear model. If omitted, the offset is assumed to be zero for every cell.
strZero: an optional logical vector of length NROW(modelMatrix) containing TRUE for every cell to be considered a structural zero and FALSE elsewhere. Structural zeros are assumed to have zero probability and are omitted from the model fitting. If strZero is omitted, all elements are assumed to be FALSE.
startVal: an optional vector of starting values for the model parameters. If saturated=FALSE, this should be a vector of length NCOL(modelMatrix) containing log-linear coefficients; if saturated=TRUE, it should be a vector of length NROW(modelMatrix) containing cell probabilities or cell means, which are automatically rescaled to become probabilities.
estimate: an optional formula or list of formulas of the kind expected by cvamEstimate specifying marginal or conditional probabilities to be estimated, bypassing the need for a subsequent call to that function.
...: values to be passed to the methods.
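The length requirements on modelMatrix, offset, strZero and startVal can be made concrete with a small base R sketch. It does not call cvam; it builds a design matrix over the 24 cells of the UCBAdmissions table with model.matrix, assuming R's default treatment contrasts. The names cells, X, off, sz and beta0 are illustrative only, and the cell ordering that cvam uses internally may differ, so these objects are shown for their shapes rather than as inputs to pass directly.

```r
# Base R sketch: one row per cell of the complete-data table
dF <- as.data.frame(UCBAdmissions)            # 2 x 2 x 6 = 24 cells
cells <- dF[, c("Admit", "Gender", "Dept")]

# Design matrix for the conditional-independence model
X <- model.matrix(~ Dept*Gender + Dept*Admit, data = cells)
dim(X)                                        # rows = cells, columns = coefficients

# Vectors that must match NROW(X)
off <- rep(0, NROW(X))                        # zero offset, the documented default
sz  <- rep(FALSE, NROW(X))                    # no structural zeros, the documented default

# With saturated = FALSE, starting coefficients must match NCOL(X)
beta0 <- rep(0, NCOL(X))
```

When cvam is installed, the matrix that the package itself constructs can be retrieved without fitting by passing method="modelMatrix", which is the safer way to see the cell ordering before supplying a custom matrix.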
Joe Schafer Joseph.L.Schafer@census.gov
A log-linear model is specified by a one-sided formula that determines which associations among the variables are allowed. For example, ~ A + B + C implies that A, B and C are mutually independent; ~ A*B + A*C implies that B and C are conditionally independent given A; and so on. Variables in a model may be factors or coarsened factors, and missing values are permitted. All models are fit using a surrogate Poisson formulation, which is appropriate for Poisson, multinomial or product-multinomial sampling. A formula may contain a vertical bar to specify variables to be regarded as fixed; for example, ~ A*B + A*C | A fixes the variable A. Fixing variables does not change the model fitting procedure; the only difference is that, after the model has been fit, the cell probabilities are scaled to sum to one within every combination of levels of the fixed variables.
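For complete data, the Poisson formulation and the effect of fixing a variable can be illustrated with base R alone. The sketch below uses glm rather than cvam, so it is an analogy and not the package's own algorithm. It fits ~ Dept*Gender + Dept*Admit to the UCBAdmissions counts and then rescales the fitted cell probabilities so that they sum to one within each level of Dept, which is the rescaling described for a formula ending in | Dept. The names pfit, prob and probGivenDept are illustrative.

```r
dF <- as.data.frame(UCBAdmissions)

# Poisson log-linear fit to the cell counts (complete data only)
pfit <- glm(Freq ~ Dept*Gender + Dept*Admit, family = poisson, data = dF)

# Joint cell probabilities: fitted means rescaled to sum to one overall
dF$prob <- fitted(pfit) / sum(fitted(pfit))

# "Fixing" Dept: rescale so probabilities sum to one within each department
dF$probGivenDept <- ave(dF$prob, dF$Dept, FUN = function(p) p / sum(p))

# Check: each department now carries a total probability of one
tapply(dF$probGivenDept, dF$Dept, sum)
```

The model fit is identical in both cases; only the final normalization of the cell probabilities changes.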
If cvam is called with a cvam object as its first argument, then the data, model and prior distribution will be taken from the previous run, and (unless startVal is supplied) starting values will be set to the final parameter values from the previous run.
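A minimal sketch of this reuse pattern follows, assuming the cvam package and its crime data set are available. The object names fitEM and fitAB and the seed are illustrative. Only arguments named in the usage above are used, and the block is skipped when the package is not installed.

```r
# Sketch only; runs when the cvam package is installed
if (requireNamespace("cvam", quietly = TRUE)) {
  library(cvam)

  # First run: EM fit from a formula
  fitEM <- cvam(~ V1 * V2, data = crime, freq = n)

  # Second run: pass the fitted object instead of a formula.
  # Data, model and prior are taken from fitEM, and because startVal
  # is not supplied, the run starts from the EM estimates.
  set.seed(4321)
  fitAB <- cvam(fitEM, method = "approxBayes")

  summary(fitAB)
}
```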
If method is "EM" and survey weights are supplied through weight, then cvam performs pseudo-maximum likelihood (PML) estimation. The target of PML is the set of parameters that would be obtained if the given model were fit to all units in the finite population (or, if subPop is given, the subpopulation). If saturated=FALSE, then standard errors for log-linear coefficients are computed using a linearization method that assumes the first stage of sampling within strata was carried out with replacement (WR). Although WR sampling is rarely done in actual surveys, it is often assumed for variance estimation, and if the first-stage sampling was actually done without replacement the resulting standard errors tend to be conservative. The WR survey design information is provided through weight, stratum and cluster. The stratum and cluster variables are coerced to factors. If stratum is omitted, then the population is regarded as a single stratum. If cluster is omitted, then each sample unit is treated as a cluster.
Extended descriptions and examples for all major functions are provided in two vignettes, Understanding Coarsened Factors in cvam and Log-Linear Modeling with Missing and Coarsened Values Using the cvam Package.
coarsened, cvamPrior, cvamControl, cvamEstimate, get.coef, summary.cvam
library(cvam)
# convert U.C. Berkeley admissions three-way table to data frame,
# fit model of conditional independence, display summary,
# then compare the fit to the saturated model
dF <- as.data.frame(UCBAdmissions)
fit <- cvam( ~ Dept*Gender + Dept*Admit, data=dF, freq=Freq )
summary(fit)
fitSat <- cvam( ~ Dept*Gender*Admit, data=dF, freq=Freq )
anova(fit, fitSat, pval=TRUE)
# fit non-independence model to crime data; then run MCMC for
# 5000 iterations, creating 10 multiple imputations of the frequencies
# for the 2x2 complete-data table
fit <- cvam( ~ V1 * V2, data=crime, freq=n )
set.seed(56182)
fitMCMC <- cvam(fit, method="MCMC",
control=list( iterMCMC=5000, imputeEvery=500) )
get.imputedFreq(fitMCMC)