MCMCmixfactanal: Markov chain Monte Carlo for Mixed Data Factor Analysis Model

Description

This function generates a posterior density sample from a mixed data (both continuous and ordinal) factor analysis model. Normal priors are assumed on the factor loadings and factor scores, improper uniform priors are assumed on the cutpoints, and inverse gamma priors are assumed for the error variances (uniquenesses). The user supplies data and parameters for the prior distributions, and a sample from the posterior density is returned as an mcmc object, which can be subsequently analyzed with functions provided in the coda package.

Usage

MCMCmixfactanal(x, factors, lambda.constraints=list(),
                data=list(), burnin = 1000, mcmc = 10000,
                thin=5, tune=NA, verbose = FALSE, seed = 0,
                lambda.start = NA, psi.start=NA,
                l0=0, L0=0, a0=0.001, b0=0.001,
                store.lambda=TRUE, store.scores=FALSE,
                std.mean=TRUE, std.var=TRUE, ... )

Arguments

x

A one-sided formula containing the manifest variables. Ordinal (including dichotomous) variables must be coded as ordered factors. NOTE: data input is different in MCMCmixfactanal than in either MCMCfactanal or

factors

The number of factors to be fitted.

lambda.constraints

List of lists specifying possible equality or simple inequality constraints on the factor loadings. A typical entry in the list has one of three forms: varname=list(d,c) which will constrain the dth loading for the variable named

data

A data frame.

burnin

The number of burn-in iterations for the sampler.

mcmc

The number of iterations for the sampler.

thin

The thinning interval used in the simulation. The number of iterations must be divisible by this value.

tune

The tuning parameter for the Metropolis-Hastings sampling. Can be either a scalar or a $k$-vector (where $k$ is the number of manifest variables). tune must be strictly positive.

verbose

A switch which determines whether or not the progress of the sampler is printed to the screen. If TRUE, the iteration number and the Metropolis-Hastings acceptance rate are printed to the screen.

seed

The seed for the random number generator. The code uses the Mersenne Twister, which requires an integer as an input. If nothing is provided, the Scythe default seed is used.

lambda.start

Starting values for the factor loading matrix Lambda. If lambda.start is set to a scalar the starting value for all unconstrained loadings will be set to that scalar. If lambda.start is a matrix of the same dimensions

psi.start

Starting values for the error variance (uniqueness) matrix. If psi.start is set to a scalar then the starting value for all diagonal elements of Psi that represent error variances for continuous variables are set to

l0

The means of the independent Normal prior on the factor loadings. Can be either a scalar or a matrix with the same dimensions as Lambda.

L0

The precisions (inverse variances) of the independent Normal prior on the factor loadings. Can be either a scalar or a matrix with the same dimensions as Lambda.

a0

Controls the shape of the inverse Gamma prior on the uniquenesses. The actual shape parameter is set to a0/2. Can be either a scalar or a $k$-vector.

b0

Controls the scale of the inverse Gamma prior on the uniquenesses. The actual scale parameter is set to b0/2. Can be either a scalar or a $k$-vector.

store.lambda

A switch that determines whether or not to store the factor loadings for posterior analysis. By default, the factor loadings are all stored.

store.scores

A switch that determines whether or not to store the factor scores for posterior analysis. NOTE: This takes an enormous amount of memory, so should only be used if the chain is thinned heavily, or for applications with a small num

std.mean

If TRUE (the default) the continuous manifest variables are rescaled to have zero mean.

std.var

If TRUE (the default) the continuous manifest variables are rescaled to have unit variance.

...

further arguments to be passed

Value

An mcmc object that contains the posterior density sample. This object can be summarized by functions provided by the coda package.
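As a rough illustration of what working with such an object looks like, the sketch below builds a stand-in mcmc object from random numbers rather than from the sampler itself. The parameter names `param1` and `param2` and the variable `n_iter` are placeholders, and the assumption that `n_iter / thin` draws are stored follows from the divisibility requirement on thin rather than from an explicit statement in this page.

```r
library(coda)
set.seed(3)

burnin <- 1000    # default burn-in from the Usage section
n_iter <- 10000   # plays the role of the mcmc argument
thin   <- 5       # default thinning interval

# Assumed: one draw is kept every `thin` iterations after burn-in
n_stored <- n_iter / thin

# Placeholder draws standing in for real sampler output
draws <- matrix(rnorm(n_stored * 2), nrow = n_stored, ncol = 2,
                dimnames = list(NULL, c("param1", "param2")))
post <- coda::mcmc(draws, start = burnin + 1, thin = thin)

summary(post)        # posterior means, SDs, and quantiles
effectiveSize(post)  # effective sample size per parameter
```

The same coda functions apply unchanged to the object returned by MCMCmixfactanal.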

Details

The model takes the following form:

Let $i=1,\ldots,n$ index observations and $j=1,\ldots,k$ index response variables within an observation. An observed variable $x_{ij}$ can be either ordinal with a total of $C_j$ categories or continuous. The distribution of $X$ is governed by an $n \times k$ matrix of latent variables $X^*$ and a series of cutpoints $\gamma$. $X^*$ is assumed to be generated according to:

$$x^*_i = \Lambda \phi_i + \epsilon_i$$

$$\epsilon_i \sim \mathcal{N}(0,\Psi)$$

where $x^*_i$ is the $k$-vector of latent variables specific to observation $i$, $\Lambda$ is the $k \times d$ matrix of factor loadings, and $\phi_i$ is the $d$-vector of latent factor scores. It is assumed that the first element of $\phi_i$ is equal to 1 for all $i$.
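The generating equation can be made concrete with a small simulation. This is only a sketch: the loadings and error variances are made-up values, and $\Psi$ is assumed to be diagonal, as the description of psi.start suggests.

```r
set.seed(1)
n <- 200   # observations
k <- 4     # manifest variables
d <- 3     # columns of Lambda: one for the constant plus two factors

# Hypothetical loadings (k x d); the first column multiplies the constant 1
Lambda <- cbind(c(0.0, 0.5, -0.5, 0.2),
                c(1.0, 0.8,  0.0, 0.3),
                c(0.0, 0.4,  0.9, -0.6))

# Hypothetical diagonal error covariance (assumed diagonal)
Psi <- diag(c(0.5, 0.3, 1.0, 1.0))

# Factor scores: first element fixed at 1, the rest standard normal
phi <- cbind(1, matrix(rnorm(n * (d - 1)), nrow = n, ncol = d - 1))

# Errors: independent normals scaled by the square roots of diag(Psi)
eps <- matrix(rnorm(n * k), nrow = n, ncol = k) %*% sqrt(Psi)

# Row i of x_star is (Lambda %*% phi_i + eps_i)'
x_star <- phi %*% t(Lambda) + eps
```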

If the $j$th variable is ordinal, the probability that it takes the value $c$ in observation $i$ is:

$$\pi_{ijc} = \Phi(\gamma_{jc} - \Lambda'_j\phi_i) - \Phi(\gamma_{j(c-1)} - \Lambda'_j\phi_i)$$
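A minimal sketch of this probability calculation in R follows. The helper name `ordinal_probs` and all numeric values are hypothetical, and the boundary convention $\gamma_{j0} = -\infty$, $\gamma_{jC_j} = \infty$ is an assumption commonly made for ordinal probit models; it is not stated on this page.

```r
# Category probabilities for one ordinal variable j and one observation i
ordinal_probs <- function(gamma, eta) {
  # gamma: increasing interior cutpoints for variable j
  # eta:   linear predictor Lambda_j' phi_i
  cuts <- c(-Inf, gamma, Inf)   # assumed outer cutpoints
  diff(pnorm(cuts - eta))       # Phi(upper - eta) - Phi(lower - eta)
}

gamma_j  <- c(0, 0.8, 1.7)     # first cutpoint fixed at zero; 4 categories
lambda_j <- c(-0.3, 1.2, 0.5)  # hypothetical row of Lambda
phi_i    <- c(1, 0.4, -1.1)    # first element is the constant 1

p <- ordinal_probs(gamma_j, sum(lambda_j * phi_i))
p        # one probability per category
sum(p)   # the probabilities sum to 1
```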

If the $j$th variable is continuous, it is assumed that $x^*_{ij} = x_{ij}$ for all $i$. The implementation used here assumes independent conjugate priors for each element of $\Lambda$ and each $\phi_i$. More specifically we assume:

$$\Lambda_{ij} \sim \mathcal{N}(l_{0_{ij}}, L_{0_{ij}}^{-1}), i=1,\ldots,k, j=1,\ldots,d$$

$$\phi_{i(2:d)} \sim \mathcal{N}(0, I), i=1,\dots,n$$
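To show how the prior arguments map onto these distributions, the sketch below draws once from each prior. The values of l0, L0, a0, and b0 are illustrative only. Note that L0 is a precision, so the prior standard deviation is 1/sqrt(L0), and that the inverse Gamma draw for the uniquenesses uses shape a0/2 and scale b0/2 as described under Arguments.

```r
set.seed(2)
k <- 4   # manifest variables
d <- 3   # columns of Lambda

l0 <- 0      # prior mean of each loading
L0 <- 0.25   # prior precision, i.e. variance 1 / 0.25 = 4

# Independent Normal prior on every element of Lambda
Lambda_draw <- matrix(rnorm(k * d, mean = l0, sd = 1 / sqrt(L0)),
                      nrow = k, ncol = d)

# Factor scores for one observation: constant 1, then N(0, I)
phi_draw <- c(1, rnorm(d - 1))

# Inverse Gamma prior on the uniquenesses (illustrative a0, b0):
# if G ~ Gamma(shape = a0/2, rate = b0/2), then 1/G is inverse Gamma
# with shape a0/2 and scale b0/2
a0 <- 2
b0 <- 2
psi_draw <- 1 / rgamma(k, shape = a0 / 2, rate = b0 / 2)
```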

MCMCmixfactanal simulates from the posterior density using a Metropolis-Hastings within Gibbs sampling algorithm. The algorithm employed is based on work by Cowles (1996).

Note that the first element of $\phi_i$ is a 1. As a result, the first column of $\Lambda$ can be interpreted as negative item difficulty parameters. Further, the first element $\gamma_1$ is normalized to zero, and thus not returned in the mcmc object.

The simulation proper is done in compiled C++ code to maximize efficiency. Please consult the coda documentation for a comprehensive list of functions that can be used to analyze the posterior density sample.

References

Kevin M. Quinn. N.D. "Bayesian Factor Analysis for Mixed Ordinal and Continuous Responses." Typescript. Harvard University.

M. K. Cowles. 1996. "Accelerating Monte Carlo Markov Chain Convergence for Cumulative-link Generalized Linear Models." Statistics and Computing. 6: 101-110.

Valen E. Johnson and James H. Albert. 1999. "Ordinal Data Modeling." Springer: New York.

Andrew D. Martin, Kevin M. Quinn, and Daniel Pemstein. 2003. Scythe Statistical Library 0.4. http://scythe.wustl.edu.

Martyn Plummer, Nicky Best, Kate Cowles, and Karen Vines. 2002. Output Analysis and Diagnostics for MCMC (CODA). http://www-fis.iarc.fr/coda/.

Examples

library(MCMCpack)  # provides MCMCmixfactanal
library(MASS)      # provides the Cars93 data set
data(Cars93)
attach(Cars93)
new.cars <- data.frame(Price, MPG.city, MPG.highway,
                 Cylinders, EngineSize, Horsepower,
                 RPM, Length, Wheelbase, Width, Weight, Origin)
rownames(new.cars) <- paste(Manufacturer, Model)
detach(Cars93)

# drop obs 57 (Mazda RX 7) b/c it has a rotary engine
new.cars <- new.cars[-57,]
# drop 3 cylinder cars
new.cars <- new.cars[new.cars$Cylinders!=3,]
# drop 5 cylinder cars
new.cars <- new.cars[new.cars$Cylinders!=5,]

new.cars$log.Price <- log(new.cars$Price)
new.cars$log.MPG.city <- log(new.cars$MPG.city)
new.cars$log.MPG.highway <- log(new.cars$MPG.highway)
new.cars$log.EngineSize <- log(new.cars$EngineSize)
new.cars$log.Horsepower <- log(new.cars$Horsepower)

new.cars$Cylinders <- ordered(new.cars$Cylinders)
new.cars$Origin    <- ordered(new.cars$Origin)



posterior <- MCMCmixfactanal(~log.Price+log.MPG.city+
                 log.MPG.highway+Cylinders+log.EngineSize+
                 log.Horsepower+RPM+Length+
                 Wheelbase+Width+Weight+Origin, data=new.cars,
                 lambda.constraints=list(log.Horsepower=list(2,"+"),
                 log.Horsepower=list(3,0), Weight=list(3,"+")),
                 factors=2,
                 burnin=5000, mcmc=500000, thin=100, verbose=TRUE,
                 L0=.25, tune=1.5)
