Simulates parameters and missing values from a joint posterior distribution under a normal model using Markov chain Monte Carlo.
mcmcNorm(obj, …)
# S3 method for default
mcmcNorm(obj, x = NULL, intercept = TRUE,
starting.values, iter = 1000, multicycle = NULL,
seeds = NULL, prior = "uniform",
prior.df = NULL, prior.sscp = NULL, save.all.series = TRUE,
save.worst.series = FALSE, worst.linear.coef = NULL,
impute.every = NULL, …)
# S3 method for formula
mcmcNorm(formula, data, starting.values,
iter = 1000, multicycle = NULL, seeds = NULL, prior = "uniform",
prior.df = NULL, prior.sscp = NULL, save.all.series = TRUE,
save.worst.series = FALSE, worst.linear.coef = NULL,
impute.every=NULL, …)
# S3 method for norm
mcmcNorm(obj, starting.values = obj$param,
iter = 1000, multicycle = obj$multicycle,
seeds = NULL, prior = obj$prior, prior.df = obj$prior.df,
prior.sscp = obj$prior.sscp,
save.all.series = !(obj$method=="MCMC" & is.null( obj$series.beta )),
save.worst.series = !is.null( obj$worst.linear.coef ),
worst.linear.coef = obj$worst.linear.coef,
impute.every = obj$impute.every, …)
an object used to select a method. It may be y
,
a numeric matrix, vector or data frame containing response variables
to be modeled as normal. Missing values (NA
s)
are allowed. If y
is a data frame, any factors or ordered factors will be
replaced by their internal codes, and a warning will be given.
Alternatively, this first argument may be a formula
as described
below, or an object of class "norm"
resulting from a call to emNorm
or
mcmcNorm
; see DETAILS.
a numeric matrix, vector or data frame of covariates to be
used as predictors for y
. Missing values (NA
's) are
not allowed. If x
is a matrix, it must have the same number
of rows as y
. If x
is a data frame, any factors or
ordered factors are replaced by their internal codes, and a
warning is given. If NULL
, it defaults to x =
rep(1,nrow(y))
, an intercept-only model.
if TRUE
, then a column of 1
's is
appended to x
. Ignored if x = NULL
.
an object of class "formula"
(or one
that can be coerced to that class): a symbolic description of the
model which is provided in lieu of y
and x
. The
details of model specification are given
under DETAILS.
an optional data frame, list or environment (or object
coercible by as.data.frame
to a data frame) containing
the variables in the model. If not found in data
, the variables are
taken from environment(formula)
, typically the environment
from which mcmcNorm
is called.
starting values for the model
parameters. This must be a list with two named components,
beta
and sigma
, which are numeric matrices with correct
dimensions. In most circumstances, the starting
values will be obtained from a prior run of emNorm
or
mcmcNorm
; see DETAILS.
number of iterations to be performed. By default, each
iteration consists of one Imputation or I-step followed by
one Posterior or P-step, but this can be changed by
multicycle
.
number of cycles per iteration, with
NULL
equivalent to multicycle=1
.
Specifying
multicycle=
k for some k>1 instructs
mcmcNorm
to perform the I-step and P-step cycle k
times within each iteration; see DETAILS.
two integers to initialize the random number generator; see DETAILS.
should be "uniform"
, "jeffreys"
,
"ridge"
or "invwish"
. If "ridge"
then
prior.df
must be supplied. If "invwish"
then
prior.df
and prior.sscp
must be
supplied. For more information, see DETAILS.
prior degrees of freedom for a ridge
(prior="ridge"
) or inverted Wishart (prior="invwish"
)
prior.
prior sums of squares and cross-products (SSCP)
matrix for an inverted Wishart prior (prior="invwish"
).
if TRUE
, then the simulated values of all
parameters at all iterations will be saved.
if TRUE
, then the simulated values
of the worst linear function of the parameters will be saved. Under
ordinary circumstances, this function will have been estimated by
emNorm
after the EM algorithm converged.
vector or coefficients that define the worst
linear function of the parameters. Under ordinary circumstances,
these are provided automatically in the result from emNorm
.
how many iterations to perform between
imputations? If impute.every=
k, then the simulated
values for the missing data after every k iterations will be
saved, resulting in floor(iter/impute.every)
multiple
imputations. If NULL
, then no imputations will be saved.
values to be passed to the methods.
a list whose
class
attribute has been set to "norm"
.
This object may be
passed as the first argument in subsequent calls to emNorm
,
mcmcNorm
, impNorm
,
loglikNorm
or logpostNorm
. The
object also carries the original data and specifies the prior
distribution, so that these do not need to be provided again.
To see a summary of
this object, use the generic function summary
,
which passes the object to summary.norm
.
Components of the list may also be directly accessed and examined by the user. Components which may be of interest include:
number of MCMC iterations performed.
a list with elements beta
and sigma
containing the estimated parameters after the final iteration of
MCMC. This may be supplied as starting values to emNorm
or
mcmcNorm
, or as an argument to impNorm
,
loglikNorm
or logpostNorm
.
a numeric vector of length iter
reporting the
logarithm of the observed-data likelihood function at the start of
each iteration.
a numeric vector of length iter
reporting the
logarithm of the observed-data posterior density function at the start of
each iteration.
a time-series object (class "ts"
) which
contains the simulated values of the worst linear function of the
parameters from all iterations. This will be present if the
starting values provided to mcmcNorm
were close enough to the
mode to provide a reliable estimate of the worst linear function.
The dependence in this series tends to be higher than for
other parameters, so examining the dependence by plotting the series
with plot
or its autocorrelation function with
acf
may help the user to judge how quickly the Markov
chain achieves stationarity. For the definition of the worst linear
function, see the manual accompanying the NORM package
in the subdirectory doc
.
a multivariate time-series object (class
"ts"
) which
contains the simulated values of the coefficients beta
from all iterations. This will present if save.all.series=TRUE
.
a multivariate time-series object (class
"ts"
) which
contains the simulated values of the variances and
covariances (elements of the lower triangle of sigma
)
from all iterations. This will be present if save.all.series=TRUE
.
a list containing the multiple imputations. Each
component of this list is a data matrix resembling y
, but
with NA
's replaced by imputed values. The
length of the list depends on the values of iter
and
impute.every
.
logical matrix with ncol(y)
columns
reporting the missingness patterns seen in y
. Each row of
miss.patt
corresponds to a distinct missingness pattern, with
TRUE
indicating that the variable is missing and
FALSE
indicating that the variable is observed.
integer vector of length
nrow(miss.patt)
indicating, for each missingness pattern, the
number of cases or rows of y
having that pattern.
integer vector of length nrow(y)
indicating
the missingness pattern for each
row of y
. Thus is.na( y[i,] )
is the same thing as
miss.patt[ which.patt[i], ]
.
There are three different ways to specify the data and model when
calling mcmcNorm
:
by directly supplying as the initial argument a matrix of
numeric response variables y
, along with an optional
matrix of predictor variables x
;
by supplying a model specification
formula
, along with an optional data frame data
; or
by supplying an object of class
"norm"
, which was produced by an earlier call to
emNorm
or mcmcNorm
.
In the first case, the matrix y
is assumed to have a
multivariate normal
linear regression on x
with coefficients beta
and
covariance matrix sigma
, where
dim(beta)=c(ncol(x),ncol(y))
and
dim(sigma)=c(ncol(y),ncol(y))
. Missing values NA
are allowed in y
but not in x
.
In the second case, formula
is a formula for a (typically
multivariate) linear regression model in the manner expected by
lm
. A formula is given as y ~ model
, where
y
is either a single numeric variable or a matrix of numeric
variables bound together with the function cbind
. The
right-hand side of the formula (everything to the right of ~
) is a
linear predictor, a series of terms separated by operators +
,
:
or *
to specify main effects and
interactions. Factors are allowed on the right-hand side and will
enter the model as contrasts among the levels
. The
intercept term 1
is included by default; to remove the
intercept, use -1
.
In the third case, the initial argument to mcmcNorm
is an
object of class
"norm"
returned by a previous call to emNorm
or mcmcNorm
. The value of the parameters
carried in this object (the estimates from the last iteration of
EM or the simulated values from the last iteration of MCMC) will be
used as the starting values.
The matrix y
is assumed to have a multivariate normal
linear regression on x
with coefficients beta
and
covariance matrix sigma
, where
dim(beta)=c(ncol(x),ncol(y))
and
dim(sigma)=c(ncol(y),ncol(y))
.
Starting values for the parameters must be provided. In most cases
these will be the result of a previous call to emNorm
or
mcmcNorm
. If the starting
values are close to the mode (i.e., if they are the result of an EM
run that converged) then the worst linear function of the
parameters will be saved at each iteration. If the starting
values are the result of a previous run of MCMC, then the new
run will be a continuation of the same Markov chain.
If multicycle=
k for some k>1,
then the length of the saved parameter
series will be reduced by a factor of k, and the serial
correlation in the series will also be reduced. This option is
useful in large problems with many parameters and in slowly
converging problems for which many iterations are needed.
norm2
functions use their own internal random number generator which
is seeded by two integers, for example, seeds=c(123,456)
,
which allows results to be reproduced in the future. If
seeds=NULL
then
the function will seed itself with two random
integers from R. Therefore, results can also be made reproducible by
calling set.seed
beforehand and taking seeds=NULL
.
If prior="invwish"
then an inverted Wishart prior distribution
is applied to sigma
with hyperparameters prior.df
(a
scalar) and prior.sscp
(a symmetric, positive definite matrix
of the same dimension as sigma
). Using the device of imaginary
results, we can interpret prior.sscp/prior.df
as a prior guess
for sigma
, and prior.df
as the prior degrees of
freedom on which this guess is based.
The usual noninformative prior for the normal regression model
(prior="jeffreys"
) is equivalent to the inverted
Wishart density with prior.df
equal to 0 and
prior.sscp
equal to a matrix of 0's.
The ridge prior (prior="ridge"
) is a special case of the
inverted Wishart (Schafer, 1997). The prior
guess for sigma
is a diagonal matrix with diagonal elements
estimated by regressing the observed values in each column of
y
on the corresponding rows of x
. When
prior="ridge"
, the user must supply a value for
prior.df
, which
determines how strongly the estimated correlations are smoothed
toward zero.
If the first argument to mcmcNorm
is an object of class
"norm"
, then the parameter values stored in that object will
automatically be used as starting values.
For details of the MCMC algorithm, see the manual distributed
with the NORM package in the subdirectory doc
.
Schafer, J.L. (1997) Analysis of Incomplete Multivariate Data. London: Chapman & Hall/CRC Press.
For more information about this function and other functions in
the NORM package, see User's Guide for norm2
in the library subdirectory doc
.
# NOT RUN {
## run EM for marijuana data with ridge prior
data(marijuana)
emResult <- emNorm(marijuana, prior="ridge", prior.df=0.5)
## run MCMC for 5,000 iterations starting from the
## posterior mode using the same prior
mcmcResult <- mcmcNorm(emResult, iter=5000)
## summarize and plot worst linear function
summary(mcmcResult)
plot(mcmcResult$series.worst)
acf(mcmcResult$series.worst, lag.max=50)
## generate 25 multiple imputations, taking
## 100 steps between imputations, and look st
## the first imputed dataset
mcmcResult <- mcmcNorm(emResult, iter=2500, impute.every=100)
mcmcResult$imp.list[[1]]
# }
Run the code above in your browser using DataLab