hierarchicalConsensusTOM: Calculation of hierarchical consensus topological overlap matrix

Description

This function calculates consensus topological overlap in a hierarchical manner.

Usage

hierarchicalConsensusTOM(
      # ... information needed to calculate individual TOMs
      multiExpr,
      multiWeights = NULL,
      # Data checking options
      checkMissingData = TRUE,
      # Blocking options
      blocks = NULL,
      maxBlockSize = 20000,
      blockSizePenaltyPower = 5,
      nPreclusteringCenters = NULL,
      randomSeed = 12345,
      # Network construction options
      networkOptions,
      # Save individual TOMs?
      keepIndividualTOMs = TRUE,
      individualTOMFileNames = "individualTOM-Set%s-Block%b.RData",
      # ... or information about individual (more precisely, input) TOMs
      individualTOMInfo = NULL,
      # Consensus calculation options 
      consensusTree,
      useBlocks = NULL,
      # Save calibrated TOMs?
      saveCalibratedIndividualTOMs = FALSE,
      calibratedIndividualTOMFilePattern = "calibratedIndividualTOM-Set%s-Block%b.RData",
      # Return options
      saveConsensusTOM = TRUE,
      consensusTOMFilePattern = "consensusTOM-%a-Block%b.RData",
      getCalibrationSamples = FALSE,
      # Return the intermediate results as well?  
      keepIntermediateResults = saveConsensusTOM,
      # Internal handling of TOMs
      useDiskCache = NULL, 
      chunkSize = NULL,
      cacheDir = ".",
      cacheBase = ".blockConsModsCache",
      # Behavior
      collectGarbage = TRUE,
      verbose = 1,
      indent = 0)

Arguments

multiExpr

Expression data in the multi-set format (see checkSets). A vector of lists, one per set. Each set must contain a component data that contains the expression data, with rows corresponding to samples and columns to genes or probes.
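As an illustration of the multi-set format described above, the following base R sketch builds a small multiExpr structure by hand. The set names, dimensions, and random values are made up for the example, and the assumption that all sets must share the same genes (columns) reflects the usual expectation for multi-set data rather than anything stated on this page.

```r
# Build a toy multi-set expression structure by hand (base R only).
# Set names, sizes and values are invented for illustration.
set.seed(1)
nGenes <- 5
multiExpr <- list(
  SetA = list(data = matrix(rnorm(8 * nGenes), nrow = 8, ncol = nGenes)),
  SetB = list(data = matrix(rnorm(6 * nGenes), nrow = 6, ncol = nGenes))
)

# Give the genes (columns) the same names in every set.
for (set in names(multiExpr)) {
  colnames(multiExpr[[set]]$data) <- paste0("Gene", seq_len(nGenes))
}

# Rows are samples and may differ in number between sets;
# columns are genes and are assumed here to match across sets.
nSamples <- sapply(multiExpr, function(x) nrow(x$data))
nGenesPerSet <- sapply(multiExpr, function(x) ncol(x$data))
```

Each element of the outer list is one set, and the expression matrix sits in its data component.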

multiWeights

optional observation weights in the same format (and dimensions) as multiExpr. These weights are used for correlation calculations with data in multiExpr.

checkMissingData

Logical: should data be checked for excessive numbers of missing entries in genes and samples, and for genes with zero variance? See details.

blocks

Optional specification of blocks in which hierarchical clustering and module detection should be performed. If given, must be a numeric vector with one entry per gene of multiExpr giving the number of the block to which the corresponding gene belongs.

maxBlockSize

Integer giving maximum block size for module detection. Ignored if blocks above is non-NULL. Otherwise, if the number of genes in multiExpr exceeds maxBlockSize, genes will be pre-clustered into blocks whose size should not exceed maxBlockSize.

blockSizePenaltyPower

Number specifying how strongly blocks should be penalized for exceeding the maximum size. Set to a large number or Inf if staying within the maximum block size is very important.

nPreclusteringCenters

Number of centers to be used in the preclustering. Defaults to the smaller of nGenes/20 and 100*nGenes/maxBlockSize, where nGenes is the number of genes (variables) in multiExpr.
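The default stated above can be written out as a short calculation. The helper name defaultPreclusteringCenters is hypothetical, and truncating the result to an integer is an assumption; the page only gives the two candidate values.

```r
# Hypothetical helper reproducing the documented default for
# nPreclusteringCenters. Truncation to an integer is an assumption.
defaultPreclusteringCenters <- function(nGenes, maxBlockSize = 20000) {
  as.integer(min(nGenes / 20, 100 * nGenes / maxBlockSize))
}

# With 30000 genes and the default maxBlockSize, the candidates are
# 30000/20 = 1500 and 100*30000/20000 = 150; the smaller one is used.
nCenters <- defaultPreclusteringCenters(30000)
```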

randomSeed

Integer to be used as seed for the random number generator before the function starts. If a current seed exists, it is saved and restored upon exit. If NULL is given, the function will not save and restore the seed.
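The save-and-restore behavior described above can be sketched in base R as follows. The wrapper withTemporarySeed is a hypothetical name used only for illustration and is not how the function is necessarily implemented internally.

```r
# Hypothetical wrapper: set a seed for the duration of a computation and
# restore the caller's random number generator state afterwards.
withTemporarySeed <- function(randomSeed, expr) {
  if (!is.null(randomSeed)) {
    # Save the current seed only if one exists.
    if (exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)) {
      savedSeed <- get(".Random.seed", envir = .GlobalEnv)
      on.exit(assign(".Random.seed", savedSeed, envir = .GlobalEnv), add = TRUE)
    }
    set.seed(randomSeed)
  }
  # expr is evaluated lazily, i.e. only here, after the seed has been set.
  force(expr)
}

set.seed(42)
stateBefore <- get(".Random.seed", envir = .GlobalEnv)
x1 <- withTemporarySeed(12345, runif(3))
x2 <- withTemporarySeed(12345, runif(3))
stateAfter <- get(".Random.seed", envir = .GlobalEnv)
```

Both calls produce the same random numbers, and the generator state seen by the caller is unchanged. With randomSeed = NULL the wrapper neither sets nor restores anything.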

networkOptions

A single list of class NetworkOptions giving options for network calculation for all of the networks, or a multiData structure containing one such list for each input data set.

keepIndividualTOMs

Logical: should individual TOMs be retained after the calculation is finished?

individualTOMFileNames

Character string giving the file names to save individual TOMs into. The following tags should be used to make the file names unique for each set and block: %s will be replaced by the set number; %N will be replaced by the set name (taken from names(multiExpr)) if it exists, otherwise by set number; %b will be replaced by the block number. If the file names turn out to be non-unique, an error will be generated.
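The tag replacement and the uniqueness check described above can be sketched in base R. The helper expandTOMFileName is hypothetical and only mimics the documented behavior of the %s, %N and %b tags.

```r
# Hypothetical helper mimicking the documented tag substitution.
expandTOMFileName <- function(pattern, setNumber, blockNumber, setNames = NULL) {
  # %N falls back to the set number when no set names are available.
  setName <- if (is.null(setNames)) as.character(setNumber) else setNames[setNumber]
  out <- gsub("%s", as.character(setNumber), pattern, fixed = TRUE)
  out <- gsub("%N", setName, out, fixed = TRUE)
  gsub("%b", as.character(blockNumber), out, fixed = TRUE)
}

# Expand the default pattern for 2 sets and 3 blocks.
grid <- expand.grid(set = 1:2, block = 1:3)
fileNames <- mapply(expandTOMFileName,
                    setNumber = grid$set, blockNumber = grid$block,
                    MoreArgs = list(pattern = "individualTOM-Set%s-Block%b.RData"))

# A pattern without %s (or %N) and %b would yield duplicates here.
if (anyDuplicated(fileNames)) stop("File names are not unique.")
```

Omitting a tag, for example using "individualTOM-Block%b.RData", makes the names collide across sets, which is the situation the error guards against.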

individualTOMInfo

A list, typically returned by individualTOMs, containing information about the topological overlap matrices in the individual data sets in multiExpr. See the output of individualTOMs for details on the content of the list.

consensusTree

A list specifying the consensus calculation. See details.

useBlocks

Optional vector giving the blocks that should be used for the calculations. If NULL, all blocks will be used.

saveCalibratedIndividualTOMs

Logical: should the calibrated individual TOMs be saved?

calibratedIndividualTOMFilePattern

Specification of file names in which calibrated individual TOMs should be saved.

saveConsensusTOM

Logical: should the consensus TOM be saved to disk?

consensusTOMFilePattern

Character string giving the file names to save consensus TOMs into. The following tags should be used to make the file names unique for each set and block: %s will be replaced by the set number; %N will be replaced by the set name (taken from names(multiExpr)) if it exists, otherwise by set number; %b will be replaced by the block number. If the file names turn out to be non-unique, an error will be generated. Note that the default pattern also contains the tag %a, which is not described here.

getCalibrationSamples

Logical: should the sampled values used for network calibration be returned?

keepIntermediateResults

Logical: should intermediate consensus TOMs be saved as well?

useDiskCache

Logical: should disk cache be used for consensus calculations? The disk cache can be used to store chunks of calibrated data that are small enough to fit one chunk from each set into memory (blocks may be small enough to fit one block of one set into memory, but not small enough to fit one block from all sets in a consensus calculation into memory at the same time). Using disk cache is slower but lessens the memory footprint of the calculation. As a general guide, if individual data are split into blocks, we recommend setting this argument to TRUE. If this argument is NULL, the function will decide whether to use disk cache based on the number of sets and block sizes.

chunkSize

Network similarities are saved in smaller chunks of size chunkSize. If NULL, an appropriate chunk size will be determined from an estimate of available memory. Note that if the chunk size is greater than the memory required for storing intermediate results, disk cache use will automatically be disabled.

cacheDir

character string containing the directory into which cache files should be written. The user should make sure that the filesystem has enough free space to hold the cache files which can get quite large.

cacheBase

Character string containing the desired name for the cache files. The actual file names will consist of cacheBase and a suffix to make the file names unique.

collectGarbage

Logical: should garbage be collected after memory-intensive operations?

verbose

integer level of verbosity. Zero means silent, higher values make the output progressively more and more verbose.

indent

indentation for diagnostic messages. Zero means no indentation, each unit adds two spaces.

Value

A list that contains the output of hierarchicalConsensusCalculation and two extra components:

individualTOMInfo

A copy of the input individualTOMInfo if it was non-NULL, or the result of individualTOMs.

consensusTree

A copy of the input consensusTree.

Details

This function is essentially a wrapper for hierarchicalConsensusCalculation, with a few additional operations specific to calculations of topological overlaps.
