Performs an interactive connectivity score (CS) analysis with sMFA (Sparse Multiple Factor Analysis). This analysis is intended for multiple queries. Either spca or arrayspc is used, depending on lambda.
# S4 method for signature 'matrix,matrix,CSsmfa'
CSanalysis(querMat, refMat, type = "CSsmfa",
K = 15, para, lambda = 1e-06, sparse.dim = 2, sparse = "penalty",
max.iter = 200, eps.conv = 0.001, which = c(2, 3, 4, 5),
component.plot = NULL, CSrank.queryplot = FALSE, column.interest = NULL,
row.interest = NULL, profile.type = "gene", color.columns = NULL,
gene.highlight = NULL, gene.thresP = 1, gene.thresN = -1,
thresP.col = "blue", thresN.col = "red", grouploadings.labels = NULL,
grouploadings.cutoff = NULL, legend.names = NULL, legend.cols = NULL,
legend.pos = "topright", labels = TRUE, result.available = NULL,
result.available.update = FALSE, plot.type = "device",
basefilename = NULL)
querMat: Query matrix (rows = genes, columns = compounds).
refMat: Reference matrix.
type: "CSsmfa"
K: sMFA Parameters: Number of components.
para: sMFA Parameters: A vector of length K. All elements should be positive. If sparse="varnum", the elements should be integers.
lambda: sMFA Parameters: Quadratic penalty parameter. Default value is 1e-6. If the dimension targeted for sparsity is larger than the other dimension (p > n), it is advised to set lambda to Inf, which uses the arrayspc algorithm optimized for this case. In the other case (p < n), a zero or positive lambda is sufficient and the normal spca algorithm is used.
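The rule of thumb for lambda can be written as a small helper. This is a minimal sketch in base R and not part of the package: choose_lambda is a hypothetical name, and it assumes that p is the size of the dimension selected by sparse.dim in the matrix passed in (assumed here to be the data being decomposed) and n is the size of the other dimension.

```r
# Hypothetical helper (not a package function): pick lambda from the data shape.
choose_lambda <- function(mat, sparse.dim = 2, default = 1e-6) {
  stopifnot(is.matrix(mat), sparse.dim %in% c(1, 2))
  p <- dim(mat)[sparse.dim]      # size of the dimension that is made sparse
  n <- dim(mat)[3 - sparse.dim]  # size of the other dimension
  # p > n: Inf selects the arrayspc route; otherwise keep a small finite penalty
  if (p > n) Inf else default
}

# 10 rows, 50 columns, sparsity on the columns: p = 50, n = 10
lambda_wide <- choose_lambda(matrix(0, nrow = 10, ncol = 50), sparse.dim = 2)

# 100 rows, 20 columns, sparsity on the columns: p = 20, n = 100
lambda_tall <- choose_lambda(matrix(0, nrow = 100, ncol = 20), sparse.dim = 2)
```

The returned value could then be passed on as the lambda argument; the case p = n is not covered by the advice above and falls back to the finite default in this sketch.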
sparse.dim: sMFA Parameters: Which dimension should be sparse? 1: rows, 2: columns (default). Note: for connectivity scores it is advised to apply sparsity to the compounds/columns.
sparse: sMFA Parameters (lambda < Inf only): If sparse="penalty", para is a vector of 1-norm penalty parameters. If sparse="varnum", para defines the number of sparse loadings to be obtained.
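The constraints on para stated for the two sparse modes can be checked before starting an analysis. The following is a minimal sketch in base R; check_para is a hypothetical helper and the numeric values are purely illustrative, not recommended settings.

```r
# Hypothetical validator (not a package function) for the para vector.
check_para <- function(para, K = 15, sparse = c("penalty", "varnum")) {
  sparse <- match.arg(sparse)
  # One positive entry per component
  stopifnot(is.numeric(para), length(para) == K, all(para > 0))
  if (sparse == "varnum") {
    # "varnum": each entry is a count, so it must be a whole number
    stopifnot(all(para == round(para)))
  }
  invisible(TRUE)
}

# Illustrative values only
ok_penalty <- check_para(rep(0.5, 15), K = 15, sparse = "penalty")
ok_varnum  <- check_para(rep(20, 15), K = 15, sparse = "varnum")
```

A vector that violates one of the conditions makes stopifnot() raise an error, so the problem surfaces before any computation is started.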
max.iter: sMFA Parameters: Maximum number of iterations.
eps.conv: sMFA Parameters: Convergence criterion.
which: Choose one or more plots to draw:
1. Information Content for Bicluster (only available for "CSfabia")
2. Loadings for query compounds
3. Loadings for Component (Factor/Bicluster) component.plot
4. Gene Scores for Component (Factor/Bicluster) component.plot
5. Connectivity Ranking Scores for Component component.plot
6. Component component.plot VS Other Component: Loadings & Genes
7. Profile plot (see profile.type)
8. Group Loadings Plots for all components (see grouploadings.labels)
component.plot: Which components (Factor/Bicluster) should be investigated? Can be a vector of multiple components (e.g. c(1,3,5)). If NULL, you can choose components of interest interactively from the query loadings plot.
CSrank.queryplot: Logical value deciding if the CS Rank Scores (which=5) should also be plotted per query (instead of only the weighted mean).
column.interest: Numeric vector of indices of reference columns which should be in the profiles plots (which=7). If NULL, you can interactively select these columns on the Compound Loadings plot (which=3).
row.interest: Numeric vector of gene indices to be plotted in the gene profiles plot (which=7, profile.type="gene"). If NULL, you can interactively select them in the gene scores plot (which=4).
profile.type: Type of which=7 plot:
"gene": Gene profiles plot of the genes selected in row.interest, with the query compounds and those selected in column.interest ordered first on the x-axis. The other compounds are ordered by decreasing CScore.
"cmpd": Compound profiles plot of the query and selected compounds (column.interest), with only those genes on the x-axis that pass the thresholds (gene.thresP, gene.thresN).
color.columns: Vector of colors for the query and reference columns (compounds). If NULL, blue will be used for query and black for reference. Use this option to highlight query columns and reference columns of interest.
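A color vector of this kind can be built in base R as follows. This is a sketch under assumptions that the text does not confirm: the vector is assumed to list the query columns first and the reference columns after them, and the column counts and highlighted positions are made up for illustration.

```r
# Assumed sizes, for illustration only
n_query <- 6
n_ref   <- 20

# Start from the defaults described above: blue for query, black for reference
color.columns <- c(rep("blue", n_query), rep("black", n_ref))

# Highlight the 3rd and 7th reference compounds (hypothetical choice)
color.columns[n_query + c(3, 7)] <- "red"
```

The resulting vector has one entry per compound and could be supplied as the color.columns argument.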
gene.highlight: Single numeric vector or list of at most 5 numeric vectors. This highlights genes of interest in the gene scores plot (which=4) in up to 5 different colors (e.g. to highlight genes you know to be differentially expressed).
gene.thresP: Threshold for genes with a high score (which=4).
gene.thresN: Threshold for genes with a low score (which=4).
thresP.col: Color of genes above gene.thresP.
thresN.col: Color of genes below gene.thresN.
grouploadings.labels: This parameter is used for the Group Loadings Plots (which=8). In general this plot contains the loadings of all factors, grouped and colored by the labels given in this parameter.
If grouploadings.labels!=NULL: Provide a vector for all samples (query + ref) containing the labels on which the plot will be based.
If grouploadings.labels=NULL: If no labels are provided when choosing which=8, automatic labels ("Top Samples of Component 1, 2....") will be created. These labels are given to the top grouploadings.cutoff samples, based on the absolute values of the loadings.
Plot which=8 serves two purposes. The first is to check whether your provided labels coincide with the structure discovered in the analysis. The second is to find new interesting structures (of samples) that appear strongly in one or multiple components. A subsequent step could be to take some strong samples/compounds of these components and use them as a new query set in a new CS analysis, to check its validity or to find newly connected compounds.
Please note that even when grouploadings.labels!=NULL, the labels based on the absolute loadings of all the factors (the top grouploadings.cutoff) will always be generated and saved in samplefactorlabels in the extra slot of the CSresult object. These can later be passed to the CSlabelscompare function to compare them with your true labels.
grouploadings.cutoff: Parameter used in plot which=8. See the grouploadings.labels=NULL case for more information. If this parameter is not provided, it will be automatically set to 10% of the total number of loadings.
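The idea behind the automatic labels can be illustrated in base R. This is only a sketch of the principle, not the package's internal code: the loadings matrix is simulated, "total number of loadings" is assumed to mean the number of samples, and rounding the 10% default with round() is an assumption.

```r
set.seed(1)

# Simulated loadings: 40 samples (rows) x 3 components (columns)
loadings <- matrix(rnorm(40 * 3), nrow = 40,
                   dimnames = list(paste0("cmpd", 1:40), paste0("Comp", 1:3)))

# Default cutoff: 10% of the number of loadings per component (assumed rounding)
cutoff <- round(0.10 * nrow(loadings))

# For each component, the samples with the largest absolute loadings
top_samples <- lapply(seq_len(ncol(loadings)), function(k) {
  ord <- order(abs(loadings[, k]), decreasing = TRUE)
  rownames(loadings)[ord[seq_len(cutoff)]]
})
names(top_samples) <- colnames(loadings)
```

Each element of top_samples then corresponds to one "Top Samples of Component k" group; a sample can appear in more than one group when it loads strongly on several components.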
legend.names: Option to draw a legend, for example for the colored columns in the Compound Loadings plot (which=3). If NULL, only "References" will be in the legend.
legend.cols: Colors to be used in legends. If NULL, only blue for "Queries" is used.
legend.pos: Position of the legend in all requested plots. Can be "topright", "topleft", "bottomleft", "bottomright", "bottom", "top", "left", "right" or "center".
labels: Boolean value (default=TRUE) to use row and/or column text labels in the score plots (which=c(3,4,5,6)).
result.available: You can provide an object previously returned by CSanalysis in order to only draw graphs, without recomputing the scores.
result.available.update: Logical value. If TRUE, the CS and GS will be overwritten depending on the new component.plot choice. This would also delete the p-values if permutation.object was available.
plot.type: How should the plots be output? "pdf" to save them in pdf files, "device" to draw them in a graphics device (default), "sweave" to use them in a sweave or knitr file.
basefilename: Directory and base filename of the graphs if they are saved in pdf files.
Returns an object of the S4 Class CSresult-class.
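A call with the arguments documented above might look like the following sketch. Several things here are assumptions: the package is taken to be CSFA, the data are random numbers standing in for real expression matrices, the para values are placeholders rather than tuned penalties, and run_smfa is a hypothetical wrapper that is only defined, not executed, so the block itself runs without the package installed.

```r
set.seed(123)
n_genes <- 200
gene_names <- paste0("gene", seq_len(n_genes))

# Synthetic stand-in data: rows = genes, columns = compounds
querMat <- matrix(rnorm(n_genes * 4), nrow = n_genes,
                  dimnames = list(gene_names, paste0("query", 1:4)))
refMat  <- matrix(rnorm(n_genes * 30), nrow = n_genes,
                  dimnames = list(gene_names, paste0("ref", 1:30)))

# Hypothetical wrapper; calling it requires the CSFA package (assumed name)
run_smfa <- function(querMat, refMat, K = 5) {
  CSFA::CSanalysis(querMat, refMat, type = "CSsmfa",
                   K = K,
                   para = rep(0.5, K),       # placeholder 1-norm penalties
                   lambda = 1e-06,           # finite lambda: spca route
                   sparse.dim = 2,           # sparsity on the compounds/columns
                   sparse = "penalty",
                   which = c(2, 3),          # query loadings and component loadings
                   component.plot = 1,       # fixed choice, no interactive selection
                   plot.type = "pdf",
                   basefilename = file.path(tempdir(), "smfa"))
}
```

With the package installed, run_smfa(querMat, refMat) would be expected to return a CSresult object, which could later be handed back through result.available to redraw plots without recomputing the scores.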