mxGREMLDataHandler: Helper Function for Structuring GREML Data

Description

This function takes a dataframe or matrix and uses it to setup the 'y' and 'X' matrices for a GREML analysis; this includes trimming out NAs from 'X' and 'y.' The result is a matrix the first column of which is the 'y' vector, and the remaining columns of which constitute 'X.'

Usage

mxGREMLDataHandler(data, yvars=character(0), Xvars=list(), addOnes=TRUE,  blockByPheno=TRUE, staggerZeroes=TRUE)

Arguments

data

Either a dataframe or matrix, with column names, containing the variables to be used as phenotypes and covariates in 'y' and 'X,' respectively.

yvars

Character vector. Each string names a column of the raw dataset, to be used as a phenotype.

Xvars

A list of data column names, specifying the covariates to be used with each phenotype. The list should have the same length as argument yvars.

addOnes

Logical; should lead columns of ones (for the regression intercepts) be adhered to the covariates when assembling the 'X' matrix? Defaults to TRUE.

blockByPheno

Logical; relevant to polyphenotype analyses. If TRUE (default), then the resulting 'y' will contain phenotype #1 for individuals 1 thru n, phenotype #2 for individuals 1 thru n, ... If FALSE, then observations are "blocked by individual", and the resulting 'y' will contain individual #1's scores on phenotypes 1 thru p, individual #2's scores on phenotypes 1 thru p, ... Note that in either case, 'X' will be structured appropriately for 'y.'

staggerZeroes

Logical; relevant to polyphenotype analyses. If TRUE (default), then each phenotype's covariates in 'X' are "staggered," and 'X' is padded out with zeroes. If FALSE, then 'X' is formed simply by stacking the phenotypes' covariates; this requires each phenotype to have the same number of covariates (i.e., each character vector in Xvars must be of the same length). The default (TRUE) is intended for instances where the multiple phenotypes truly are different variables, whereas staggerZeroes=FALSE is intended for instances where the multiple "phenotypes" actually represent multiple observations on the same variable. One example of the latter case is longitudinal data where the multiple "phenotypes" are repeated measures on a single phenotype.

Value

A list with these two components:

Details

For a monophenotype analysis (only), argument Xdata can be a character vector. In a polyphenotype analysis, if the same covariates are to be used with all phenotypes, then Xdata can be a list of length 1.

Note the synergy between the output of mxGREMLDataHandler() and arguments dataset.is.yX and casesToDropFromV to mxExpectationGREML().

If the dataframe or matrix supplied for argument data has n rows, and argument yvars is of length p, then the resulting 'y' and 'X' matrices will have np rows. Then, if either matrix contains any NA's, the rows containing the NA's are trimmed from both 'X' and 'y' before being returned in the output (in which case they will obviously have fewer than np rows). Function mxGREMLDataHandler() reports which rows of the full-size 'X' and 'y' were trimmed out due to missing observations. These row indices can be provided as argument casesToDropFromV to mxExpectationGREML().

References

The OpenMx User's guide can be found at http://openmx.psyc.virginia.edu/documentation.

Examples

Run this code

dat <- cbind(rnorm(100),rep(1,100))
colnames(dat) <- c("y","x")
dat[42,1] <- NA
dat[57,2] <- NA
dat2 <- mxGREMLDataHandler(data=dat, yvars="y", Xvars=list("x"),
  addOnes = FALSE)
str(dat2)

Run the code above in your browser using DataLab