gapsMapRun: `gapsMapRun` calls the C++ MCMC code and performs Bayesian matrix factorization returning the two matrices that reconstruct the data matrix; as opposed to gapsRun, this method takes an additional input specifying set patterns in the P matrix

Description

gapsMapRun calls the C++ MCMC code and performs Bayesian matrix factorization returning the two matrices that reconstruct the data matrix; as opposed to gapsRun, this method takes an additional input specifying set patterns in the P matrix

Usage

gapsMapRun(D, S, FP, ABins = data.frame(), PBins = data.frame(),
  nFactor = 5, simulation_id = "simulation", nEquil = 1000,
  nSample = 1000, nOutR = 1000, output_atomic = "FALSE",
  fixedMatrix = "P", fixedBinProbs = "FALSE", fixedDomain = "N",
  sampleSnapshots = "TRUE", numSnapshots = 100, alphaA = 0.01,
  nMaxA = 1e+05, max_gibbmass_paraA = 100, alphaP = 0.01, nMaxP = 1e+05,
  max_gibbmass_paraP = 100)

Arguments

data matrix

uncertainty matrix (std devs for chi-squared of Log Likelihood)

data.frame with rows giving fixed patterns for P

ABins

a matrix of same size as A which gives relative probability of that element being non-zero

PBins

a matrix of same size as P which gives relative probability of that element being non-zero

nFactor

number of patterns (basis vectors, metagenes), which must be greater than or equal to the number of rows of FP

simulation_id

name to attach to atoms files if created

nEquil

number of iterations for burn-in

nSample

number of iterations for sampling

nOutR

how often to print status into R by iterations

output_atomic

whether to write atom files (large)

fixedMatrix

character indicating whether A or P matrix has fixed columns or rows respectively

fixedBinProbs

Boolean for using relative probabilities given in Abins and Pbins

fixedDomain

character to indicate whether A or P is domain for relative probabilities

sampleSnapshots

Boolean to indicate whether to capture individual samples from Markov chain during sampling

numSnapshots

the number of individual samples to capture

alphaA

sparsity parameter for A domain

nMaxA

PRESENTLY UNUSED, future = limit number of atoms

max_gibbmass_paraA

limit truncated normal to max size

alphaP

sparsity parameter for P domain

nMaxP

PRESENTLY UNUSED, future = limit number of atoms

max_gibbmass_paraP

limit truncated normal to max size

Value

A list containing:
AmeanSampled mean value of the amplitude matrix ${\bf{A}}$.
AsdSampled standard deviation of the amplitude matrix ${\bf{A}}$.
PmeanSampled mean value of the amplitude matrix ${\bf{P}}$.
PsdSampled standard deviation of the amplitude matrix ${\bf{P}}$.
atomsAEquilNumber of atoms in ${\bf{A}}$ during each iteration of the equilibration phase.
atomsASampNumber of atoms in ${\bf{A}}$ during each iteration of the sampling phase.
atomsPEquilNumber of atoms in ${\bf{P}}$ during each iteration of the equilibration phase.
atomsPSampNumber of atoms in ${\bf{P}}$ during each iteration of the sampling phase.
chiSqValuesValue of $chi^2$ at each step during equilibration and sampling.
meanChi2Value of $chi^2$ for Amean and Pmean.
ASnapshotsSamples of A matrices taken during sampling.
PSnapshotsSamples of P matrices taken during sampling.

Details

The decomposition in GAPS is achieved by finding amplitude and pattern matrices (${\bf{A}}$ and ${\bf{P}}$, respectively) for which $${\bf{D}} = {\bf{A}}{\bf{P}} + \Sigma$$, where $\Sigma$ is the matrix of uncertainties given by S. The matrices $\bf{A}$ and $\bf{P}$ are assumed to have the atomic prior described in Sibisi and Skilling (1997) and are found with MCMC sampling. However, some rows of $\bf{P}$ are fixed to be the values specified in the input argument FP after rescaling to have norm 1.

Examples

Run this code

## Load data
data('SimpSim')

## Specify the fixed pattern
mapP <- matrix(0,nrow=2,ncol=20)
mapP[1,1:10] <- 1
mapP[2,11:20] <- 1

## Run the GAPS matrix decomposition
testmap <- gapsMapRun(SimpSim.D, SimpSim.S, FP=mapP, 
                      nFactor=3,nEquil = 1000,nSample = 1000)


## Compare fixed patterns to input patterns (after scaling)
summary(t(testmap$Pmean[2:3,] - sweep(mapP,1,apply(mapP,1,sum),FUN="/")))