monomvn(y, pre = TRUE, method = c("plsr", "pcr", "lasso", "lar",
"forward.stagewise", "stepwise", "ridge"), p = 0.9,
ncomp.max = Inf, validation = c("CV", "LOO", "Cp"),
obs = FALSE, verb = 0, quiet = TRUE)
y: data matrix where each row is interpreted as a random sample from a MVN distribution with missing values indicated by NA

pre: logical indicating whether pre-processing of the columns of y is to be performed. This sorts the columns so that the number of NAs is non-decreasing with the column index

method: describes the type of parsimonious (or shrinkage) regression to be performed when standard least-squares regression fails: "plsr" (the default, partial least squares), "pcr" (principal component regression), "lasso", "lar", "forward.stagewise", "stepwise" (from the lars package), or "ridge" (via lm.ridge)

p: when performing regressions, p is the proportion of the number of columns to rows in the design matrix before an alternative regression method (those above) is performed as if the least-squares regression method had failed
ncomp.max: maximal number of (principal) components to include in a regression --- only meaningful for the "plsr" or "pcr" methods. Large settings can cause the execution to be slow as they drastically increase the cross-validation (CV) time

validation: method for estimating the optimal number of components; "CV" (randomized 10-fold cross-validation) is the faster method, but does not yield a deterministic result and does not apply when there are too few rows (see the details below), in which case "LOO" (leave-one-out cross-validation) is used

obs: logical indicating whether to (also) compute a mean vector and covariance matrix based only on the observed data, without regressions; means are then calculated as averages over the non-NA entries in each column of y

verb: whether or not to print progress indicators. The default (verb = 0) keeps quiet, while any positive number causes a brief statement about the dimensions of each regression to print to the screen as it happens; verb = 2 causes the regression estimators themselves to be printed as well

quiet: causes warnings about regressions to be silenced when TRUE
monomvn returns an object of class "monomvn", which is a list containing a subset of the components below.

mu: estimated mean vector with columns corresponding to the columns of y

S: estimated covariance matrix with rows and columns corresponding to the columns of y

na: when pre = TRUE this is a vector containing the number of NA entries in each column of y

o: when pre = TRUE this is a vector containing the index of each column in the sorting of the columns of y obtained by o <- order(na)

methods: a vector describing the regression method used for each column, with "complete" indicating that no regression was necessary

ncomp: a vector giving the number of components used in a plsr or pcr regression, or NA if such a method was not used. This field is used to record lambda when lm.ridge is used

mu.obs: when obs = TRUE this is the mean vector obtained from the observed data only

S.obs: when obs = TRUE this is the covariance matrix obtained from the observed data only. Note that S.obs is usually not positive definite.

If pre = TRUE
then monomvn
first re-arranges the columns
of y
into nondecreasing order with respect to the number of
missing (NA
) entries. Then (at least) the first column should
be completely observed. The mean components and covariances between
the first set of complete columns are obtained through the standard
mean
and cov
routines. Next, each successive group of columns with the same missingness pattern
is processed in sequence. Suppose a total of j
columns have
been processed this way already. Let y2
represent the non-missing
contingent of the next group of k
columns of y
with identical missingness pattern, and let y1
be the
previously processed j-1
columns of y
containing only the rows
corresponding to each non-NA
entry in y2
. I.e.,
nrow(y1) = nrow(y2)
. Note that y1
contains no
NA
entries since the missing data pattern is monotone.
The k
next entries (indices j:(j+k)
) of the mean vector,
and the j:(j+k)
rows and columns of the covariance matrix are
obtained by multivariate regression of y2
on y1
.
The regression method used depends on the number of rows and columns
in y1
and on the p
parameter. Whenever ncol(y1)
< p*nrow(y1)
least-squares regression is used; otherwise the regression indicated by method is used. If ever a least-squares regression fails due to co-linearity, then one of the other methods is tried.
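As an illustration of the decision rule above (in Python rather than R, with a hypothetical function name; this is a sketch of the documented rule, not the package's internal code):

```python
def use_least_squares(n_rows, n_cols, p=0.9):
    """Return True when the design matrix y1 (n_rows x n_cols) is
    "tall enough" for ordinary least squares, following the documented
    rule ncol(y1) < p * nrow(y1); otherwise a parsimonious method
    (plsr, pcr, lasso, ...) would be used instead."""
    return n_cols < p * n_rows

# 100 rows, 30 columns: least squares is fine
print(use_least_squares(100, 30))   # True
# 100 rows, 95 columns: fall back to a shrinkage method
print(use_least_squares(100, 95))   # False
```

With the default p = 0.9, least squares is abandoned as soon as the column count reaches 90% of the row count.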
All methods require a scheme for estimating the amount of
variability explained by increasing numbers of coefficients (or principal
components) in the model.
Towards this end, monomvn uses CV in all cases except when nrow(y1) <= 10, in which case CV fails and LOO is used. Whenever nrow(y1) <= 3, pcr fails, so plsr is used instead.
If quiet = FALSE
then a warning
is given whenever the first choice for a regression fails.
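The two fallbacks described above amount to simple threshold rules. A Python sketch (function names are hypothetical and do not appear in the package):

```python
def pick_validation(n_rows):
    """Randomized 10-fold CV needs more than 10 rows;
    otherwise leave-one-out (LOO) validation is used."""
    return "CV" if n_rows > 10 else "LOO"

def pick_pls_method(n_rows, requested="pcr"):
    """pcr fails with 3 or fewer rows, so plsr is substituted."""
    return "plsr" if n_rows <= 3 else requested

print(pick_validation(50), pick_validation(8))    # CV LOO
print(pick_pls_method(3), pick_pls_method(20))    # plsr pcr
```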
For the "plsr" and "pcr" methods, the validation scheme is applied to numbers of components in 1:ncomp.max, where if is.null(ncomp.max) it is replaced with

ncomp.max <- min(ncomp.max, ncol(y2), nrow(y1)-1)

which is the maximum allowed by these methods; the best number of components (ncomp) found this way is used. The ridge constant (lambda) for lm.ridge is set using the optimize function on the GCV output.
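The replacement of an unbounded ncomp.max is a simple clamp; the same arithmetic in a Python sketch (the function name is illustrative only):

```python
def clamp_ncomp_max(ncomp_max, ncol_y2, nrow_y1):
    """Cap ncomp.max at the largest component count the pls-style
    regressions can support: no more than the number of responses
    (ncol(y2)) and no more than nrow(y1) - 1."""
    return min(ncomp_max, ncol_y2, nrow_y1 - 1)

print(clamp_ncomp_max(float("inf"), 5, 12))   # 5
print(clamp_ncomp_max(float("inf"), 8, 4))    # 3
```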
Based on the maximum likelihood estimates of the ncol(y1)+1
regression coefficients (including
intercept) obtained for each of the
columns of y2
, and on the corresponding matrix
of
residual sums of squares, and on the previous j-1
means
and rows/cols of the covariance matrix, the j:(j+k)
entries and
rows/cols can be filled in as described by Little and Rubin, section 7.4.3.
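The fill-in step of Little and Rubin's section 7.4.3 can be illustrated in the simplest bivariate case (one processed column y1, one new column y2). The Python arithmetic below is a hand-worked sketch of how a regression intercept b0, slope b, and residual variance r yield the new mean and covariance entries; it is not the package's code:

```python
# Known: mean mu1 and variance s11 of the already-processed column,
# plus the regression of y2 on y1: intercept b0, slope b, residual
# variance r (illustrative values chosen by hand).
mu1, s11 = 1.0, 2.0
b0, b, r = 1.75, 0.25, 0.875

# New entries of the mean vector and covariance matrix:
mu2 = b0 + b * mu1          # E[y2]
s12 = b * s11               # Cov(y1, y2)
s22 = r + b * s11 * b       # Var(y2)

print(mu2, s12, s22)        # 2.0 0.5 1.0
```

In the general case b becomes a coefficient matrix and r a residual covariance matrix, but the same moment identities fill in the new rows and columns.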
Once every column has been processed, the entries of the mean vector and the rows/cols of the covariance matrix are re-arranged into their original order.
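The pre-sorting by NA count and the final restoration to the original column order can be sketched in Python (0-indexed, with None standing in for NA; R's order is 1-indexed):

```python
def na_count(col):
    """Number of missing entries in one column."""
    return sum(1 for v in col if v is None)

# toy data: 3 columns with 2, 0, and 1 missing entries respectively
y = [[None, 1.0, 2.0],
     [None, 3.0, None],
     [ 5.0, 4.0, 6.0]]
cols = list(zip(*y))                                 # column-major view
na = [na_count(c) for c in cols]                     # [2, 0, 1]
o = sorted(range(len(cols)), key=lambda j: na[j])    # order(na) -> [1, 2, 0]

sorted_cols = [cols[j] for j in o]                   # non-decreasing NA counts

# after estimation, results are mapped back to the original order
restored = [None] * len(cols)
for pos, j in enumerate(o):
    restored[j] = sorted_cols[pos]
print(restored == cols)                              # True
```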
Roderick J.A. Little and Donald B. Rubin (2002). Statistical Analysis with Missing Data, Second Edition. Wiley.
Bjorn-Helge Mevik and Ron Wehrens (2007). The pls Package: Principal Component and Partial Least Squares Regression in R. Journal of Statistical Software 18(2).
Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004). Least Angle Regression (with discussion). Annals of Statistics 32(2): 407-499.
monomvn, and its subroutines, was inspired by code found on the world wide web, written by Daniel Heitjan.

See also bmonomvn, em.norm in the norm package, and mlest in the mvnmle package.

## standard usage, duplicating the results in
## Little and Rubin, section 7.4.3 -- try adding
## verb=3 argument for a step-by-step breakdown
data(cement.miss)
out <- monomvn(cement.miss)
out
out$mu
out$S
##
## A bigger example, comparing the various methods
##
## generate N=1000 samples from a random MVN
xmuS <- randmvn(1000, 100)
## randomly impose monotone missingness
xmiss <- rmono(xmuS$x)
## plsr
out.plsr <- monomvn(xmiss)
out.plsr
kl.norm(out.plsr$mu, out.plsr$S, xmuS$mu, xmuS$S)
## pcr
out.pcr <- monomvn(xmiss, method="pcr")
kl.norm(out.pcr$mu, out.pcr$S, xmuS$mu, xmuS$S)
## ridge regression
out.ridge <- monomvn(xmiss, method="ridge")
kl.norm(out.ridge$mu, out.ridge$S, xmuS$mu, xmuS$S)
## lasso
out.lasso <- monomvn(xmiss, method="lasso")
kl.norm(out.lasso$mu, out.lasso$S, xmuS$mu, xmuS$S)
## lar
out.lar <- monomvn(xmiss, method="lar")
kl.norm(out.lar$mu, out.lar$S, xmuS$mu, xmuS$S)
## forward.stagewise
out.fs <- monomvn(xmiss, method="forward.stagewise")
kl.norm(out.fs$mu, out.fs$S, xmuS$mu, xmuS$S)
## stepwise
out.step <- monomvn(xmiss, method="stepwise")
kl.norm(out.step$mu, out.step$S, xmuS$mu, xmuS$S)