mxComputeLoadData: Load columns into an MxData object

Description

THIS INTERFACE IS EXPERIMENTAL AND SUBJECT TO CHANGE.

Usage

mxComputeLoadData(
  dest,
  column,
  method = c("csv", "data.frame"),
  ...,
  path = c(),
  originalDataIsIndexOne = FALSE,
  byrow = TRUE,
  row.names = c(),
  col.names = c(),
  skip.rows = 0,
  skip.cols = 0,
  verbose = 0L,
  cacheSize = 100L,
  checkpointMetadata = TRUE,
  na.strings = c("NA"),
  observed = NULL
)

Arguments

dest

the name of the model where the columns will be loaded

column

a character vector. The column names to replace.

method

name of the conduit used to load the columns.

...

Not used. Forces remaining arguments to be specified by name.

path

the path to the file containing the data

originalDataIsIndexOne

logical. Whether to use the initial data for index 1.

byrow

logical. Whether the data columns are stored in rows.

row.names

optional integer. Column containing the row names.

col.names

optional integer. Row containing the column names.

skip.rows

integer. Number of rows to skip before reading data.

skip.cols

integer. Number of columns to skip before reading data.

verbose

integer. Level of run-time diagnostic output. Set to zero to disable.

cacheSize

integer. How many columns to cache per scan through the data. Only used when byrow=FALSE.

checkpointMetadata

logical. Whether to add per-record metadata to the checkpoint.

na.strings

character vector. A vector of strings that denote a missing value.

observed

data frame. The reservoir of data for method='data.frame'.

Details

The purpose of this compute step is to help quickly perform many similar analyses. For example, given a sample of people with a few million single-nucleotide polymorphisms (SNPs) per person, we could fit a separate model for each SNP by iterating over the SNP data.

The column names given in the column parameter must already exist in the model's MxData object. Pre-existing data is assumed to be a placeholder and is not used unless originalDataIsIndexOne is set to TRUE.
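The iteration described above might be set up along the following lines. This is a minimal sketch, not a reference implementation: the model name snpDemo, the column names snp and pheno, the simulated data, and the file snps.txt are all hypothetical, and it assumes that the 'csv' loader accepts whitespace-delimited text with no header. The OpenMx part is skipped when the package is not installed.

```r
# Sketch under stated assumptions; all object and file names are hypothetical.
set.seed(42)
nPeople <- 100
nSnps <- 5

# One phenotype plus nSnps genotype columns coded 0/1/2 (simulated).
pheno <- rnorm(nPeople)
snps <- matrix(rbinom(nPeople * nSnps, size = 2, prob = 0.3), nrow = nPeople)

# byrow=TRUE layout: each line of the file holds one SNP for all people.
# ASSUMPTION: method="csv" accepts whitespace-delimited text without a header.
snpFile <- file.path(tempdir(), "snps.txt")
write.table(t(snps), file = snpFile, quote = FALSE,
            row.names = FALSE, col.names = FALSE)

if (requireNamespace("OpenMx", quietly = TRUE)) {
  library(OpenMx)

  # The 'snp' column must already exist in the MxData object. Its values
  # are only a placeholder because originalDataIsIndexOne is left at FALSE.
  placeholder <- data.frame(snp = as.numeric(snps[, 1]), pheno = pheno)

  # One loop iteration per SNP: load the column, reset starts, fit, record.
  plan <- mxComputeLoop(list(
    mxComputeLoadData("snpDemo", column = "snp",
                      path = snpFile, byrow = TRUE),
    mxComputeSetOriginalStarts(),
    mxComputeGradientDescent(),
    mxComputeCheckpoint(toReturn = TRUE)
  ), i = 1:nSnps)

  model <- mxModel(
    "snpDemo", type = "RAM",
    manifestVars = c("snp", "pheno"),
    mxData(placeholder, type = "raw"),
    mxPath(from = "snp", to = "pheno", labels = "beta"),       # regression
    mxPath(from = c("snp", "pheno"), arrows = 2, values = 1),  # variances
    mxPath(from = "one", to = c("snp", "pheno")),              # means
    plan
  )

  fit <- mxRun(model)
}
```

Resetting the starting values before each fit keeps the analyses independent of one another. Whether that is desirable depends on the application, since reusing the previous solution as a starting point may converge faster when successive datasets are similar.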

For method='csv', the highest performance arrangement is byrow=TRUE because each data column is stored as a single chunk (a row) on the disk and can be loaded directly. For byrow=FALSE, the data requires transposition: to load a single column of observed data, it is necessary to read through the whole file, which can be slow for large files. To amortize the cost of transposition, cacheSize columns are loaded on every pass through the file.
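The difference between the two layouts can be illustrated in base R without OpenMx. The sketch below writes the same toy data both ways and shows why the row layout is cheap to read. The data, the file names, and the helper passesNeeded are hypothetical, the whitespace delimiter is an assumption, and the pass count is only a rough estimate that assumes columns are requested in sequence and every pass serves cacheSize of them.

```r
# Hypothetical toy data: 4 observations (rows) of 3 variables (columns).
dat <- data.frame(snp1 = c(0, 1, 2, 1),
                  snp2 = c(1, 1, 0, 2),
                  snp3 = c(2, 0, 0, 1))

byRowFile <- file.path(tempdir(), "byrow.txt")
byColFile <- file.path(tempdir(), "bycol.txt")

# byrow=TRUE layout: transpose so that each line holds one data column.
write.table(t(dat), file = byRowFile, quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# byrow=FALSE layout: the conventional one observation per line.
write.table(dat, file = byColFile, quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# In the row layout, one data column is one line of the file, so it can
# be read without parsing the other columns.
secondCol <- scan(byRowFile, skip = 1, nlines = 1, quiet = TRUE)

# In the column layout, every line must be visited to collect one column.
# Rough estimate of full passes needed to visit all columns in sequence.
passesNeeded <- function(nColumns, cacheSize) ceiling(nColumns / cacheSize)
passesNeeded(1e6, 100L)
```

Under these assumptions, a larger cacheSize trades memory for fewer passes through the file, which is the amortization the paragraph above refers to.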

After mxRun returns, the MxData object in dest will contain the most recently loaded data. Hence, any single analysis of a series can be reproduced by issuing mxComputeLoadData with the single index associated with a particular dataset, replacing the compute plan with something like omxDefaultComputePlan, and then passing the model back through mxRun. This can be a helpful approach when investigating unexpected results.
