emba (version 0.1.1)

biomarker_mcc_analysis: Biomarker analysis based on MCC model classification

Description

Use this function to perform a full biomarker analysis on an ensemble boolean model dataset, where the models are classified based on their Matthews correlation coefficient (MCC) score. This analysis enables the discovery of performance biomarkers: nodes whose activity and/or boolean model parameterization (link operator) affects the prediction performance of the models, as measured by the MCC score.

Usage

biomarker_mcc_analysis(model.predictions, models.stable.state,
  models.link.operator = NULL, observed.synergies, threshold,
  num.of.mcc.classes, include.NaN.mcc.class = TRUE)

Arguments

model.predictions

a data.frame with the models as rows and the drug combinations as columns. Each model-drug combination element is either 0 (no synergy predicted), 1 (synergy predicted) or NA (no stable states were found in either the drug combination-inhibited model or in any of the two single drug-inhibited models).
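A minimal sketch of an input with this shape, using hypothetical model and drug combination names. It checks that only 0, 1 and NA occur and counts, per model, the predicted synergies and the NA entries.

```r
# Toy model.predictions (hypothetical names): 3 models x 3 drug combinations.
model.predictions <- data.frame(
  `A-B` = c(1, 0, NA),
  `A-C` = c(0, 0, NA),
  `B-C` = c(1, 0, 1),
  row.names = c("model1", "model2", "model3"),
  check.names = FALSE  # keep the "A-B" style column names as-is
)

# Only 0 (no synergy), 1 (synergy) and NA (no stable states) are allowed.
stopifnot(all(as.matrix(model.predictions) %in% c(0, 1, NA)))

# Number of predicted synergies and of NA entries per model.
predicted.per.model <- rowSums(model.predictions == 1, na.rm = TRUE)
na.per.model <- rowSums(is.na(model.predictions))
predicted.per.model  # model1 = 2, model2 = 0, model3 = 1
```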

models.stable.state

an n x m matrix with n models (rows) and m nodes (columns). The row names of the matrix specify the model names, whereas the column names specify the names of the network nodes (genes, proteins, etc.). Each model-node element is either 0 (inactive node) or 1 (active node). Note that the rows (models) must be in the same order as in the model.predictions parameter.
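Because the rows must follow the model.predictions row order, a stable-state matrix that arrives in a different order can be realigned by row name before the analysis. The sketch below assumes both inputs carry the same model names as row names; all names are hypothetical.

```r
# Toy inputs (hypothetical names): the stable-state rows are not in the
# same order as the model.predictions rows.
model.predictions <- data.frame(`A-B` = c(1, 0, NA), check.names = FALSE,
                                row.names = c("model1", "model2", "model3"))
models.stable.state <- matrix(
  c(1, 0, 1,    # NODE1 for model3, model1, model2
    0, 1, 1),   # NODE2 for model3, model1, model2
  nrow = 3,
  dimnames = list(c("model3", "model1", "model2"), c("NODE1", "NODE2")))

# Only 0 (inactive) and 1 (active) are valid node states.
stopifnot(all(models.stable.state %in% c(0, 1)))

# Reorder the rows by name so they follow the model.predictions row order.
models.stable.state <- models.stable.state[rownames(model.predictions), , drop = FALSE]
stopifnot(identical(rownames(models.stable.state), rownames(model.predictions)))
```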

models.link.operator

an n x m matrix with n models (rows) and m nodes (columns). The row names of the matrix specify the model names, whereas the column names specify the names of the network nodes (genes, proteins, etc.). Each model-node element is either 0 (AND NOT link operator), 1 (OR NOT link operator) or 0.5 if the node is not targeted by both activating and inhibiting regulators (no link operator). Default value: NULL, in which case no analysis of the models' parameterization with respect to the boolean equation link operator is performed.
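With this 0/1/0.5 encoding, the column mean of a node that has a link operator is the fraction of models using OR NOT in that node's equation. A small sketch on a hypothetical matrix, dropping nodes that are 0.5 (no link operator) in every model:

```r
# Toy models.link.operator (hypothetical names): 4 models x 3 nodes.
models.link.operator <- matrix(
  c(1, 1, 0, 1,          # NODE1: OR NOT in 3 of 4 models
    0, 0, 1, 0,          # NODE2: AND NOT in 3 of 4 models
    0.5, 0.5, 0.5, 0.5), # NODE3: no link operator in any model
  nrow = 4,
  dimnames = list(paste0("model", 1:4), c("NODE1", "NODE2", "NODE3")))

# Valid values: 0 (AND NOT), 1 (OR NOT), 0.5 (no link operator).
stopifnot(all(models.link.operator %in% c(0, 0.5, 1)))

# Keep only the nodes that have a link operator in at least one model.
has.link.operator <- apply(models.link.operator != 0.5, 2, any)

# Fraction of models with the OR NOT link operator, per node.
or.fraction <- colMeans(models.link.operator[, has.link.operator, drop = FALSE])
or.fraction  # NODE1 = 0.75, NODE2 = 0.25
```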

observed.synergies

a character vector with the names of the drug combinations that were observed as synergistic. This should be a subset of the tested drug combinations, i.e. of the column names of the model.predictions parameter.
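The subset requirement can be checked directly, and the same logical mask splits the predictions into the observed and unobserved parts that the function returns (observed.model.predictions and unobserved.model.predictions). This is a sketch on hypothetical data, not the package's own code.

```r
# Toy inputs (hypothetical names).
model.predictions <- data.frame(
  `A-B` = c(1, 0), `A-C` = c(0, 1), `B-C` = c(1, 1),
  row.names = c("model1", "model2"), check.names = FALSE)
observed.synergies <- c("A-B", "B-C")

# Every observed synergy must be one of the tested drug combinations.
stopifnot(all(observed.synergies %in% colnames(model.predictions)))

# Split the prediction columns into observed and unobserved drug combinations.
is.observed <- colnames(model.predictions) %in% observed.synergies
observed.part   <- model.predictions[, is.observed, drop = FALSE]
unobserved.part <- model.predictions[, !is.observed, drop = FALSE]
colnames(unobserved.part)  # "A-C"
```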

threshold

numeric. A number in the [0,1] interval: a node is registered as a biomarker in the returned result when its average difference value (see diff.state.mcc.mat and diff.link.mcc.mat below) is above the threshold or below its negative value. Values closer to 1 make the threshold stricter, so fewer biomarkers are found.
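A minimal sketch of the threshold rule for a single row of differences, assuming that positive values mean "more active in the better-performing class" and that the comparisons are strict. The helper threshold_biomarkers is hypothetical and only illustrates why a stricter threshold yields fewer biomarkers.

```r
# Hypothetical helper: split one vector of node differences (in [-1, 1])
# into active and inhibited biomarkers using a symmetric threshold.
# Assumption: strict comparisons, positive = higher in the better class.
threshold_biomarkers <- function(node.diff, threshold) {
  stopifnot(threshold >= 0, threshold <= 1)
  list(active    = names(node.diff)[node.diff > threshold],
       inhibited = names(node.diff)[node.diff < -threshold])
}

node.diff <- c(A = 0.8, B = -0.75, C = 0.3, D = -0.1)
strict <- threshold_biomarkers(node.diff, 0.7)  # active "A", inhibited "B"
loose  <- threshold_biomarkers(node.diff, 0.2)  # active "A", "C", inhibited "B"
```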

num.of.mcc.classes

numeric. A positive integer larger than 2 that specifies the number of MCC classes (groups) into which the models' MCC values (excluding the NaN values) are split.
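The classes are found with optimal univariate k-means (Ckmeans.1d.dp). The sketch below swaps in a deliberately naive O(k n^2) dynamic program, written in base R, that minimizes the same within-class sum of squares. It is a stand-in for illustration, not the Ckmeans.1d.dp implementation, and tie-breaking may differ. The helper optimal_1d_kmeans and the MCC values are hypothetical; NaN values are left unclassified here.

```r
# Hypothetical helper: optimal 1-D k-means by dynamic programming.
# Returns class labels 1..k, numbered in increasing order of the values.
optimal_1d_kmeans <- function(x, k) {
  stopifnot(is.numeric(x), !anyNA(x), k >= 1, k <= length(x))
  ord <- order(x)
  s <- x[ord]
  n <- length(s)
  cs  <- c(0, cumsum(s))    # prefix sums
  cs2 <- c(0, cumsum(s^2))  # prefix sums of squares
  # Sum of squared deviations from the mean for s[i..j].
  sse <- function(i, j) {
    tot <- cs[j + 1] - cs[i]
    (cs2[j + 1] - cs2[i]) - tot^2 / (j - i + 1)
  }
  D <- matrix(Inf, nrow = k, ncol = n)  # D[q, j]: best cost of s[1..j] in q classes
  B <- matrix(0L, nrow = k, ncol = n)   # B[q, j]: start index of the last class
  for (j in 1:n) { D[1, j] <- sse(1, j); B[1, j] <- 1L }
  if (k >= 2) {
    for (q in 2:k) {
      for (j in q:n) {
        for (i in q:j) {
          cost <- D[q - 1, i - 1] + sse(i, j)
          if (cost < D[q, j]) { D[q, j] <- cost; B[q, j] <- i }
        }
      }
    }
  }
  # Backtrack the class boundaries on the sorted values.
  cls.sorted <- integer(n)
  j <- n
  for (q in k:1) {
    i <- B[q, j]
    cls.sorted[i:j] <- q
    j <- i - 1
  }
  cls <- integer(n)
  cls[ord] <- cls.sorted  # map back to the original order
  cls
}

mcc <- c(m1 = 0.10, m2 = NaN, m3 = 0.52, m4 = 0.12, m5 = 0.90, m6 = 0.50)
finite <- !is.nan(mcc)
mcc.class <- rep(NA_integer_, length(mcc))
mcc.class[finite] <- optimal_1d_kmeans(mcc[finite], k = 3)
mcc.class  # 1 NA 2 1 3 2
```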

include.NaN.mcc.class

logical. Should the models that have a NaN MCC value (e.g. when TP+FP = 0, i.e. models that predicted no synergies at all) be grouped together in one class, the 'NaN MCC Class', and compared with the other model classes in the analysis? If TRUE (default), the total number of MCC classes will be num.of.mcc.classes + 1.
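The NaN case follows from the MCC formula, MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)): when a model predicts no synergies, TP+FP = 0 and R evaluates 0/0 to NaN. A minimal sketch for one model, treating the observed synergies as positives and the other tested combinations as negatives. The helper model_mcc is hypothetical, and dropping NA predictions is an assumption, not necessarily how the package counts them.

```r
# Hypothetical helper: MCC of one model's predictions vs. the observed synergies.
# Assumption: drug combinations with an NA prediction are left out.
model_mcc <- function(pred, observed.synergies) {
  pred <- pred[!is.na(pred)]
  is.obs <- names(pred) %in% observed.synergies
  # Confusion counts as doubles, so the product below cannot overflow integers.
  tp <- as.numeric(sum(pred == 1 & is.obs))
  tn <- as.numeric(sum(pred == 0 & !is.obs))
  fp <- as.numeric(sum(pred == 1 & !is.obs))
  fn <- as.numeric(sum(pred == 0 & is.obs))
  # If TP + FP = 0 the denominator is 0 and the result is 0/0 = NaN.
  (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
}

obs <- c("A-B", "C-D")
model_mcc(c(`A-B` = 1, `A-C` = 0, `B-C` = 0, `C-D` = 1), obs)  # 1
model_mcc(c(`A-B` = 1, `A-C` = 0, `B-C` = 1, `C-D` = 0), obs)  # 0
model_mcc(c(`A-B` = 0, `A-C` = 0, `B-C` = 0, `C-D` = 0), obs)  # NaN
```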

Value

a list with the following elements:

  • observed.model.predictions: the part of the model.predictions data that includes the observed.synergies.

  • unobserved.model.predictions: the complementary part of the model.predictions data, i.e. the drug combinations that are not in observed.synergies.

  • predicted.synergies: a character vector of the synergies (drug combination names) that were predicted by at least one of the models in the dataset.

  • synergy.subset.stats: an integer vector with the number of models that predicted each observed synergy subset.

  • models.mcc: a numeric vector of MCC values, one for each model (NaN values can be included).

  • diff.state.mcc.mat: a matrix whose rows are vectors of average node activity state differences between two groups of models. The models are classified into groups based on their MCC score, using an optimal univariate k-means clustering method (Ckmeans.1d.dp). Rows represent the different class pairings, e.g. (1,2) compares the models classified into the first MCC class with the models classified into the second class (higher class numbers mean better performance). The columns represent the network's node names. Values are in the [-1,1] interval.

  • biomarkers.mcc.active: a character vector with the names of the active state biomarkers. These nodes appear more active in the better-performing models.

  • biomarkers.mcc.inhibited: a character vector with the names of the inhibited state biomarkers. These nodes appear more inhibited in the better-performing models.

  • diff.link.mcc.mat: a matrix whose rows are vectors of average node link operator differences between two groups of models. The models are classified into groups based on their MCC score, using an optimal univariate k-means clustering method (Ckmeans.1d.dp). Rows represent the different class pairings, e.g. (1,2) compares the models classified into the first MCC class with the models classified into the second class (higher class numbers mean better performance). The columns represent the network's node names. Values are in the [-1,1] interval.

  • biomarkers.mcc.or: a character vector with the names of the OR link operator biomarkers. In the better-performing models, these nodes mostly have the OR link operator in their boolean equations.

  • biomarkers.mcc.and: a character vector with the names of the AND link operator biomarkers. In the better-performing models, these nodes mostly have the AND link operator in their boolean equations.
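A sketch of how one might compute a matrix shaped like diff.state.mcc.mat from a stable-state matrix and per-model MCC class labels. It assumes that row (i,j), with i < j, holds the class j average minus the class i average, so positive values mean "higher in the better class". The helper class_pair_diffs is hypothetical, not the package's internal code. The same computation applied to a link operator matrix gives values shaped like diff.link.mcc.mat, and the threshold argument then decides which nodes count as biomarkers.

```r
# Hypothetical helper: average node-state differences for every MCC class pair.
# Assumption: row "(i,j)" = mean(class j) - mean(class i), with i < j.
class_pair_diffs <- function(states, classes) {
  k <- max(classes, na.rm = TRUE)
  pairs <- combn(k, 2)  # each column is a pair (i, j) with i < j
  diffs <- apply(pairs, 2, function(p) {
    better <- states[which(classes == p[2]), , drop = FALSE]  # which() skips NA classes
    worse  <- states[which(classes == p[1]), , drop = FALSE]
    colMeans(better) - colMeans(worse)
  })
  diffs <- t(matrix(diffs, nrow = ncol(states)))  # one row per class pair
  dimnames(diffs) <- list(
    apply(pairs, 2, function(p) paste0("(", p[1], ",", p[2], ")")),
    colnames(states))
  diffs
}

states <- matrix(c(1, 1, 0, 0,   # node A
                   0, 0, 1, 1,   # node B
                   1, 0, 1, 0),  # node C
                 nrow = 4,
                 dimnames = list(paste0("model", 1:4), c("A", "B", "C")))
classes <- c(2L, 2L, 1L, 1L)  # models 1-2 are in the better MCC class
d <- class_pair_diffs(states, classes)
d  # row "(1,2)": A = 1, B = -1, C = 0
```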

See Also

Other general analysis functions: biomarker_synergy_analysis, biomarker_tp_analysis