make_manifest: Make an object that inherits from class "manifest"

Description

This function is intended for users. It sets up the left-hand side of the factor analysis model and is a prerequisite for calling make_restrictions and Factanal.

Although it is possible to simply estimate and use the unbiased sample covariance matrix, there are many other ways to estimate a covariance that can be superior, particularly when the traditional maximum likelihood discrepancy function is not chosen in the call to make_restrictions.

In technical terms, make_manifest is the constructor for objects of manifest-class, which house the sample covariance estimate and some ancillary information in their slots. The three arguments in the signature of the S4 generic function are x, data, and covmat.

Usage

"make_manifest"(covmat, n.obs = NA_integer_, shrink = FALSE)
"make_manifest"(covmat, shrink = FALSE)
"make_manifest"(covmat, n.obs = NA_integer_, shrink = FALSE, sds = NULL)
"make_manifest"(covmat)
# Use the methods above when only the covariance matrix is available
# Use the methods below when the raw data are available (preferable)
"make_manifest"(x, subset, shrink = FALSE, 
bootstrap = 0, how = "default", seed = 12345, wt = NULL, ...)
"make_manifest"(data, subset, shrink = FALSE,
bootstrap = 0, how = "default", seed = 12345, wt = NULL, ...)
"make_manifest"(data, subset, shrink = FALSE,
bootstrap = 0, how = "default", seed = 12345, wt = NULL, ...)
"make_manifest"(x, subset, shrink = FALSE,
bootstrap = 0, how = "default", seed = 12345, wt = NULL, ...)
"make_manifest"(x, data, subset, shrink = FALSE, na.action = "na.pass", 
bootstrap = 0, how = "default", seed = 12345, wt = NULL, ...)

Arguments

x

a formula, data.frame, nonsquare matrix of observations by variables, or missing. If a formula, then data must be a data.frame and the formula should not have a response. If a data.frame or a matrix of data, then all its columns are used.

data

a data.frame, nonsquare matrix of observations by variables, or missing. If data is a data.frame and no formula is specified via x, then all its columns are used; the same holds if data is a matrix.

covmat

A covariance matrix, a list, an object of CovMcd-class, an object of S4 class "hetcor" from the polycor package, or missing. If a list, it must contain an element named "cov" and may contain the following named elements:

n.obs: the number of observations used in calculating the "cov" element
W: a positive definite matrix to be used as a weight matrix in the ADF discrepancy function. However, the make_restrictions-methods can calculate various weight matrices if the raw data are passed to make_manifest, so this mechanism should only be used if those options are inadequate
sds: a numeric vector of standard deviations to be used if "cov" is really a correlation matrix

n.obs

The number of observations, which is used if covmat is a covariance matrix or if covmat is a list with no element named n.obs. Maximum likelihood point estimates can be obtained without knowing the number of observations, but nothing else can.

shrink

A logical indicating whether to use a “shrinkage” estimator of the covariance matrix. If TRUE, then the “minimax shrinkage” estimator discussed in Theorem 3.1 of Dey and Srinivasan (1985) is applied to the sample covariance matrix as calculated according to the other arguments. In some circumstances, shrink is inappropriate and is ignored with a warning.

sds

Either NULL or a numeric vector that contains the standard deviations of the manifest variables, which is used when covmat is a correlation matrix

subset

A specification of the cases to be used

bootstrap

A nonnegative integer (defaulting to zero) indicating how many bootstrap replications to perform when estimating the uncertainty of the sample covariance estimates.

how

A character string indicating how the covariance matrix should be estimated; see the Details section

seed

A vector of length at most one to be used as the random number generator seed if how = "mcd" or bootstrap > 0. If NULL, then the current seed is used. This argument defaults to 12345.

wt

An optional numeric vector of weights, with one element per observation, that gives the weight for each observation when x is specified. By default, the observations are weighted equally. The wt argument can be used in two ways. First, it is passed to the corresponding argument of cov.wt if appropriate (see below). Second, it is passed to the prob argument of sample when bootstrap > 0.

na.action

The na.action to be used if x is a formula.

...

Further arguments that are passed to downstream functions when covmat is unspecified, implying that the raw data are being used to estimate the sample covariance.
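
The list form of covmat, with a correlation matrix in "cov" and standard deviations in "sds", can be illustrated with base R alone. This is only a sketch: the object names (S, R, covmat_list) and the use of the built-in mtcars data are assumptions made for illustration, and the final call is left commented out because it requires the package that provides make_manifest.

```r
# Start from an ordinary covariance matrix of four numeric variables
S   <- cov(mtcars[, c("mpg", "disp", "hp", "wt")])
sds <- sqrt(diag(S))   # standard deviations of the manifest variables
R   <- cov2cor(S)      # the corresponding correlation matrix

# A list with the element names described above:
# "cov" (here really a correlation matrix), "n.obs", and "sds"
covmat_list <- list(cov = R, n.obs = nrow(mtcars), sds = sds)

# Rescaling the correlations by the standard deviations recovers S,
# which is why "sds" is needed when "cov" is a correlation matrix
S_back <- diag(sds) %*% covmat_list$cov %*% diag(sds)
dimnames(S_back) <- dimnames(S)
all.equal(S_back, S)

# With the package attached, the list could then be passed on:
# man <- make_manifest(covmat = covmat_list)
```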

Value

An object that inherits from manifest-class.

Details

The rules governing the calculation of the sample covariance matrix are as follows and primarily depend on whether any of the manifest variables are ordered factors. First, consider the case where all manifest variables are numeric. If any of these manifest variables contain missing values, then the covariance matrix is estimated via maximum likelihood under multivariate normality assumptions, which requires the suggested mvnmle package. Otherwise, the how argument dictates how the covariance matrix is estimated. There is much to be said in favor of the Minimum Covariance Determinant (CovMcd) estimator (see Pison et al. 2003), and it is used as the default when there are no missing data, although it can subtly affect the sampling distributions of estimates that are subsequently derived from it. The same could probably be said for the shrinkage estimators (either via how = "lambda" or shrink = TRUE). The Dey and Srinivasan (1985) shrinkage estimator preserves the eigenvectors of the preliminarily calculated covariance matrix but deterministically compresses the eigenvalues. The cov.shrink estimator in the corpcor package is based on the idea that the amount of shrinkage should be proportional to the variance of the covariance estimates. Use how = "mle" or how = "unbiased" to obtain either the maximum likelihood or the unbiased sample covariance estimator, the latter of which is the one used in virtually all factor analysis applications, whether appropriate or not.
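
The difference between the unbiased and maximum likelihood estimators, and the idea of keeping eigenvectors while compressing eigenvalues, can be sketched in base R. This is an illustration under stated assumptions, not the package's internal code: the eigenvalue weights 1 / (df + p + 1 - 2i) are the form commonly attributed to Stein-type minimax estimators and are assumed here, and all object names are hypothetical.

```r
X <- as.matrix(mtcars[, c("mpg", "disp", "hp", "wt")])
n <- nrow(X)
p <- ncol(X)

# Unbiased estimator divides the scatter matrix by n - 1;
# the maximum likelihood estimator divides it by n
S_unbiased <- cov(X)
S_mle      <- cov.wt(X, method = "ML")$cov
all.equal(S_mle, S_unbiased * (n - 1) / n)

# Eigenvalue compression that leaves the eigenvectors untouched
df  <- n - 1
A   <- S_unbiased * df                   # scatter matrix
eig <- eigen(A, symmetric = TRUE)        # eigenvalues in decreasing order

# ASSUMPTION: weights of the form 1 / (df + p + 1 - 2i); the exact
# weights used by shrink = TRUE are not confirmed by this help page
d <- 1 / (df + p + 1 - 2 * seq_len(p))

S_shrunk <- eig$vectors %*% diag(eig$values * d) %*% t(eig$vectors)
dimnames(S_shrunk) <- dimnames(S_unbiased)

# Compare eigenvalues: the largest is pulled down, the smallest pushed up
rbind(unbiased = eig$values / df, shrunk = eig$values * d)
```

Because only the eigenvalues are rescaled, S_shrunk and S_unbiased share the same eigenvectors, so the two matrices commute.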

Next, consider the case where at least one manifest variable is an ordered factor. If how = "ranks", Spearman correlations are estimated from the integer codes underlying the ordered factors. This mechanism is recommended only if there are at least five levels of each ordered factor and no missing data. In that case, one would presumably want to specify method = "ADF" in the subsequent call to make_restrictions. If how != "ranks", all pairwise correlations are estimated under bivariate normality assumptions via hetcor in the suggested polycor package, which allows pairwise deletion when there are missing data. If how != "ranks" and bootstrap > 0 (recommended), then there must not be any missing data, because the bootstrapping uses fast Spearman correlations and then tries to correct the bias by rescaling the bootstrapped means to equal the point estimates calculated by the call to hetcor.
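
What "Spearman correlations from the integer codes underlying the ordered factors" amounts to can be shown with base R. The simulated data, the helper cut5, and the item names below are hypothetical and serve only as an illustration; this is not the code that make_manifest runs.

```r
set.seed(12345)
n      <- 200
latent <- rnorm(n)

# Hypothetical helper: discretize a numeric vector into five ordered levels
cut5 <- function(z) {
  cut(z, breaks = c(-Inf, -1, -0.3, 0.3, 1, Inf),
      labels = 1:5, ordered_result = TRUE)
}

# Three ordered factors that share a common latent variable
dat <- data.frame(
  item1 = cut5(latent + rnorm(n, sd = 0.5)),
  item2 = cut5(latent + rnorm(n, sd = 0.5)),
  item3 = cut5(latent + rnorm(n, sd = 0.5))
)

# data.matrix() replaces each factor by its underlying integer codes
codes <- data.matrix(dat)

# Spearman (rank) correlations computed on those codes
R_spearman <- cor(codes, method = "spearman")
R_spearman
```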

In general, bootstrapping is good for estimating the uncertainty of the estimated sample covariances and this uncertainty estimate is needed for the ADF discrepancy function and its special cases. In some cases, bootstrapping is the only way to obtain such an uncertainty estimate.
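
The kind of uncertainty estimate that bootstrapping provides can be sketched as follows. This is a minimal base R illustration, assuming a plain nonparametric bootstrap of the unbiased covariance estimator; the object names and the choice of 500 replications are hypothetical, and the package's own procedure may differ in its details.

```r
X <- as.matrix(mtcars[, c("mpg", "disp", "hp", "wt")])
n <- nrow(X)
B <- 500                    # number of bootstrap replications
obs_wt <- rep(1 / n, n)     # equal resampling weights, analogous to wt

set.seed(12345)             # fixed seed so the resampling is reproducible

# Positions of the unique elements of a symmetric matrix
idx_lower <- lower.tri(cov(X), diag = TRUE)

# Each replication resamples rows with replacement and stores the
# unique elements of the resulting covariance matrix
boot <- t(replicate(B, {
  rows <- sample(n, size = n, replace = TRUE, prob = obs_wt)
  cov(X[rows, , drop = FALSE])[idx_lower]
}))

# Covariance matrix of the bootstrapped covariance elements: an estimate
# of the sampling uncertainty of the sample covariances
acov <- cov(boot)
dim(acov)
```

With four variables there are ten unique covariance elements, so acov is 10 by 10. An uncertainty estimate of this kind is what the ADF discrepancy function needs.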

References

Dey, D. K. and Srinivasan, K. (1985) Estimation of a covariance matrix under Stein's loss. The Annals of Statistics, 13, 1581--1591.

Pison, G., Rousseeuw, P. J., Filzmoser, P. and Croux, C. (2003) Robust factor analysis. Journal of Multivariate Analysis, 84, 145--172.

Examples


man <- make_manifest(covmat = Harman23.cor)
show(man)      # some basic info
if(require(nFactors)) screeplot(man) # advanced Scree plot
cormat(man)    # sample correlation matrix
