Usage
RGBM(E = matrix(rnorm(100), 10, 10), K = matrix(0, nrow(E), ncol(E)),
     g_M = matrix(1, 10, 10), tfs = paste0("G", c(1:10)),
     targets = paste0("G", c(1:10)), lf = 1, M = 5000, nu = 0.001, s_f = 0.3,
     no_iterations = 2, mink = 0, experimentid = 1, outputpath = "DEFAULT",
     sample_type = "Exp1_", real = 0)
Arguments
E
N-by-p expression matrix. Columns correspond to genes, rows correspond to experiments. E is expected to be already normalized using standard methods, for example RMA. The column names of E are the names of all p genes. Ntfs denotes the number of transcription factors and Ntargets the number of target genes.
K
N-by-p initial perturbation matrix. It corresponds directly to the E matrix: if K[i,j] is equal to 1, gene j was knocked out in experiment i. Single-gene knock-out experiments are rows of K that contain exactly one value of 1. The column names of K are the names of all genes. By default it's a matrix of zeros of the same size as E, i.e. the initial perturbation state of the genes is unknown.
g_M
Initial mechanistic network in the form of an adjacency matrix (Ntfs-by-Ntargets). Each column is a binary vector for one target gene: an element is 1 only when the corresponding transcription factor has a connection with that target gene. The column names of g_M should be the same as the names of the targets, and the row names of g_M should be the same as the names of the TFs. By default it's a matrix of ones of size Ntfs x Ntargets.
tfs
Vector of names of the transcription factors.
targets
Vector of names of the target genes.
lf
Loss function: 1 for least squares, 2 for least absolute deviation. By default it's 1.
M
Number of extensions in the boosting model, i.e. the number of iterations of the main loop of the RGBM algorithm. By default it's 5000.
nu
Shrinkage factor, learning rate, 0<nu<=1. Each extension to boosting model will be multiplied by the learning rate. By default it's 0.001.
s_f
Sampling rate of transcription factors, 0<s_f<=1. Fraction of the transcription factors from E, as indicated by the tfs vector, which will be sampled without replacement to calculate each extension in the boosting model. By default it's 0.3.
no_iterations
Number of times the initial GRN is constructed. The resulting networks are averaged to generate smooth edge weights for the initial GRN, as shown in first_GBM_step. By default it's 2.
mink
Threshold specifying the minimum number of TFs to be considered while optimizing the L-curve criterion. By default it's 0.
experimentid
The id of the experiment being conducted. It takes a natural number (1, 2, 3, etc.). By default it's 1.
outputpath
Location where the intermediate Adjacency_Matrix and Images folder will be created. By default it's a temp directory (e.g. /tmp/Rtmp...)
sample_type
String argument giving a label for the experiment, e.g. sample_type="DREAM3" in the case of the DREAM3 challenge.
real
Numeric value 0 or 1 corresponding to simulated or real experiment respectively.
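
The shape and naming requirements described for E, K and g_M can be sketched in base R as follows. This is only an illustration: the matrix sizes, the gene names, the seed, and the choice of the first three genes as transcription factors are assumptions made for the example, and rounding the sampled TF count with ceiling() is a guess rather than documented package behaviour.

```r
# Illustrative inputs only: sizes, gene names, and the seed are assumptions.
set.seed(1)

n_exp   <- 6                   # N experiments (rows of E and K)
genes   <- paste0("G", 1:5)    # all p genes (columns of E and K)
tfs     <- genes[1:3]          # transcription factors (assumed subset of genes)
targets <- genes               # target genes

# E: N-by-p expression matrix, experiments in rows, genes in columns.
E <- matrix(rnorm(n_exp * length(genes)), nrow = n_exp, ncol = length(genes),
            dimnames = list(NULL, genes))

# K: same shape as E; K[i, j] == 1 marks gene j as knocked out in experiment i.
K <- matrix(0, nrow = nrow(E), ncol = ncol(E), dimnames = list(NULL, genes))
K[2, "G4"] <- 1                # experiment 2 is a single knock-out of G4

# g_M: Ntfs-by-Ntargets binary adjacency matrix, all ones by default.
g_M <- matrix(1, nrow = length(tfs), ncol = length(targets),
              dimnames = list(tfs, targets))
g_M["G1", "G5"] <- 0           # mark G1 as having no connection to G5

# Sanity checks that mirror the argument descriptions.
stopifnot(
  identical(dim(K), dim(E)),
  identical(colnames(K), colnames(E)),
  identical(rownames(g_M), tfs),
  identical(colnames(g_M), targets),
  all(g_M %in% c(0, 1))
)

# Sampling a fraction s_f of the TFs without replacement, as described for s_f.
# Rounding up with ceiling() is an assumption, not taken from the package.
s_f <- 0.3
sampled_tfs <- sample(tfs, size = ceiling(s_f * length(tfs)))
```

With the RGBM package loaded, objects built this way would be passed as RGBM(E = E, K = K, g_M = g_M, tfs = tfs, targets = targets), leaving the remaining arguments at the defaults shown under Usage.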