The R package SLEMI is designed to estimate channel capacity between a finite-state input and a multidimensional output from experimental data. For efficient computation, it uses an iterative algorithm based on logistic regression. The core function capacity_logreg_main() is the basic interface to all functionalities provided in the package. Comprehensive documentation is available in the directory old_vignettes/SLEMI_vignette.pdf.
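For reference, the quantity being estimated is the standard Shannon channel capacity. The channel has finite input states $x_1, \dots, x_m$ and a continuous, possibly multidimensional output $Y$. The last line below is a textbook identity. It is included only to illustrate why a classifier that estimates $P(X \mid Y)$, such as logistic regression, is a natural building block. It is not a description of SLEMI's exact iterative algorithm.

```latex
% Channel capacity: maximum of mutual information over input distributions P(X)
C^{*} = \max_{P(X)} I(X;Y)

% Mutual information for a finite-state input X and continuous output Y
I(X;Y) = \sum_{i=1}^{m} P(X = x_i) \int p(y \mid x_i) \,
         \log_2 \frac{p(y \mid x_i)}{p(y)} \, dy,
\qquad p(y) = \sum_{j=1}^{m} P(X = x_j) \, p(y \mid x_j)

% Bayes' rule gives p(y | x_i) / p(y) = P(X = x_i | Y = y) / P(X = x_i), hence
I(X;Y) = \mathbb{E}_{X,Y}\left[ \log_2 \frac{P(X \mid Y)}{P(X)} \right]
```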
The main software requirement is an installation of the R environment (version >= 3.6). R can be downloaded from the R project website and is distributed for all common operating systems. We tested the package in R on Windows 7 and 10, Mac OS X 10.11 to 10.13, and Ubuntu 18.04, with no significant differences in performance. Using a dedicated integrated development environment (IDE), e.g. RStudio (developed by Posit), is recommended. Apart from the base installation of R, SLEMI requires the following packages:
Each of the above packages can be installed by running install.packages("name_of_a_package") in the R console.
SLEMI is now available on CRAN. Install it with
install.packages("SLEMI")
To install directly from GitHub, run the following commands in the R console:
# install.packages("devtools") # run if not installed
library(devtools)
install_github("TJetka/SLEMI")
The package is built around a main wrapper function, capacity_logreg_main(), which calculates channel capacity by calling specific methods implemented within the package. Similarly, the function mi_logreg_main() can be used to estimate mutual information, while prob_discr_pairwise() computes probabilities of discrimination between two different input states.
To calculate channel capacity between X and Y, structure the experimental data into a single data.frame object with:
- observations in rows;
- one column with the values of the input (X), preferably of factor type;
- columns with the measured output (Y), of numeric type.
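The following is a minimal sketch of the data layout described above, using simulated values. The input levels, the column names output_1 and output_2, and the helper check_input_format() are all hypothetical. The helper is not part of SLEMI; it only checks the three requirements listed above.

```r
# Hypothetical example data: three input levels and a two-dimensional output.
set.seed(1)
doses <- c("0", "1", "10")          # hypothetical input levels (X)
n_per_state <- 100                   # hypothetical number of cells per input level
signal <- factor(rep(doses, each = n_per_state), levels = doses)
idx <- as.integer(signal)            # 1, 2, 3 for the three input levels

example_data <- data.frame(
  signal   = signal,                                       # input X, a factor
  output_1 = rnorm(length(idx), mean = c(0, 1, 2)[idx]),   # numeric output Y_1
  output_2 = rnorm(length(idx), mean = c(0, 0.5, 1)[idx])  # numeric output Y_2
)

# Hypothetical helper (not part of SLEMI): checks the layout described above.
check_input_format <- function(df, signal, response) {
  stopifnot(
    is.data.frame(df),
    signal %in% names(df),                              # input column exists
    all(response %in% names(df)),                       # output columns exist
    all(vapply(df[response], is.numeric, logical(1)))   # outputs are numeric
  )
  invisible(TRUE)
}

check_input_format(example_data, "signal", c("output_1", "output_2"))
str(example_data)
```

Keeping the input as a factor with explicit levels fixes the order of the input states. Numeric output columns can then be passed by name through the response argument.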
To estimate channel capacity using the basic logistic regression model, call
capacity_logreg_main(dataRaw, signal, response, output_path)
where:
dataRaw - a data.frame with experimental data as described above
signal - a character string giving the name of the column in dataRaw with the input (X)
response - a character vector giving the names of the columns in dataRaw with the output (Y) variables
output_path - a character string with the directory to which the results of the estimation should be saved

The function capacity_logreg_main returns a list whose main elements include an nnet object describing the fitted logistic regression model. For convenience of further analysis, this list is saved in the output_path directory in a file output.rds. In addition, a set of exploratory graphs is created to visualise the obtained estimates.
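The following is a minimal end-to-end sketch on simulated data. It uses only the arguments documented in this README, plus graphs = FALSE to skip plotting. The simulated values and the temporary directory are hypothetical. We assume output_path is safest passed with a trailing "/", in case the package builds file names by pasting onto it; this is not verified here. The SLEMI call is guarded so the script still runs where the package is not installed.

```r
# Simulated input/output data in the required format (hypothetical values).
set.seed(2)
signal <- factor(rep(c("low", "high"), each = 200), levels = c("low", "high"))
sim_data <- data.frame(
  signal   = signal,                                          # input X
  output_1 = rnorm(400, mean = c(0, 2)[as.integer(signal)])   # output Y
)

# Temporary output directory (hypothetical location).
out_dir <- file.path(tempdir(), "slemi_example")
dir.create(out_dir, showWarnings = FALSE)

if (requireNamespace("SLEMI", quietly = TRUE)) {
  library(SLEMI)
  result <- capacity_logreg_main(dataRaw = sim_data,
                                 signal = "signal",
                                 response = "output_1",
                                 output_path = paste0(out_dir, "/"),  # assumed trailing "/"
                                 graphs = FALSE)   # skip diagnostic plots
  print(names(result))                             # inspect the returned list

  # The same list should also be stored on disk as output.rds.
  rds_file <- file.path(out_dir, "output.rds")
  if (file.exists(rds_file)) saved <- readRDS(rds_file)
}
```

Reading output.rds back with readRDS() lets later analyses reuse the fitted results without re-running the estimation.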
Additional examples of using the package, with some background on information theory, are given in paper/TestingProcedures.pdf and implemented in the script paper/testing_procedures.R. The code used in the publication is available in paper/paper_MP.R and paper/paper_SI.R.
In the manuscript describing the methodological aspects of our algorithm, we present an analysis of information transmission in the NfkB pathway upon stimulation with TNF-$\alpha$. Experimental data from this experiment, in the form of single-cell time series, are attached to the package as a data.frame object and can be accessed through the data_nfkb variable.
Each row of data_nfkb represents a single observation of a cell. The column 'signal' indicates the level of TNF-$\alpha$ stimulation for a given cell. The columns 'response_T' give the normalised ratio of nuclear to cytoplasmic transcription factor at time point T, as described in the Supplementary Methods of the corresponding publication.
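The following sketch applies capacity_logreg_main to data_nfkb. It does not assume any particular time point: the first column matching the 'response_T' pattern is picked arbitrarily. Rows with missing values in the selected columns are dropped as a precaution. The choice of a single time point, the NA filtering and the temporary output directory are our own illustrative assumptions, not a recommended analysis.

```r
if (requireNamespace("SLEMI", quietly = TRUE)) {
  library(SLEMI)
  data("data_nfkb", package = "SLEMI")   # makes data_nfkb available in the session

  # Columns following the 'response_T' naming pattern described above.
  response_cols <- grep("^response_", names(data_nfkb), value = TRUE)
  stopifnot(length(response_cols) > 0)
  chosen <- response_cols[1]              # a single time point, picked arbitrarily

  # Keep the input column and the chosen output, dropping incomplete rows.
  nfkb_subset <- data_nfkb[, c("signal", chosen)]
  nfkb_subset <- nfkb_subset[complete.cases(nfkb_subset), ]

  # Temporary output directory (hypothetical location).
  out_dir <- file.path(tempdir(), "nfkb_example")
  dir.create(out_dir, showWarnings = FALSE)

  nfkb_result <- capacity_logreg_main(dataRaw = nfkb_subset,
                                      signal = "signal",
                                      response = chosen,
                                      output_path = paste0(out_dir, "/"),
                                      graphs = FALSE)   # skip diagnostic plots
}
```

Passing several response columns at once, e.g. response = response_cols, would treat the time series as a multidimensional output. In that case, lr_maxit or MaxNWts may need to be increased.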
Apart from the required arguments, the function capacity_logreg_main has other parameters that can be used to tune the behaviour of the algorithm. These are:
model_out (default = TRUE) - logical, specifies whether the nnet model object should be saved into the output file
graphs (default = TRUE) - logical, controls the creation of diagnostic plots in the output directory
plot_width (default = 6) - numeric, the basic width of created plots
plot_height (default = 4) - numeric, the basic height of created plots
scale (default = TRUE) - logical, indicates whether the columns of dataRaw are to be centered and scaled, which is usually recommended for the numerical stability of computations. From a purely theoretical perspective, such a transformation does not influence the value of channel capacity.
lr_maxit (default = 1000) - (argument of the nnet package) the maximum number of iterations of the optimisation step in the logistic regression algorithm. Set it to a higher value if your data are more complex or of high dimension.
MaxNWts (default = 5000) - (argument of the nnet package) the maximum number of parameters in the logistic regression model. Set it to a higher value if your data have many dimensions or the input has many states.

We implemented two diagnostic procedures to control the performance of channel capacity estimation and to measure uncertainty due to finite sample size and model over-fitting. These include:
To use those procedures, the user must provide additional arguments to the function capacity_logreg_main(), i.e.
doParallel package) in parallel computing,

Please email t.jetka at gmail.com with any bugs, problems or questions regarding the package, or with inquiries about information theory.
Please cite
Jetka T, Nienałtowski K, Winarski T, Błoński S, Komorowski M (2019) Information-theoretic analysis of multivariate single-cell signaling responses. PLOS Computational Biology 15(7): e1007132. https://doi.org/10.1371/journal.pcbi.1007132
SLEMI is released under the GNU license and is freely available.