
RNA-Seq Generation/Modification for Simulation

This package takes real RNA-seq data (either single-cell or bulk) and alters it by adding signal. The signal takes the form of a generalized linear model with a log (base-2) link function under a Poisson, negative binomial, or mixture of negative binomials distribution. The advantage of simulating data this way is that you can see how your method behaves when the simulated data exhibit common (and annoying) features of real data, without having to specify these features a priori. We call the way we add signal “binomial thinning”.
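The thinning idea can be sketched in base R for a two-group design. This is a minimal illustration under stated assumptions, not the seqgendiff implementation: the names `mat`, `x`, and `beta` are hypothetical, the counts are simulated Poisson draws standing in for real data, and each count is thinned with probability 2 raised to the gene's signal minus that gene's maximum signal, so every probability stays at or below 1.

```r
# Minimal base-R sketch of binomial thinning (not the seqgendiff API).
# Assumptions: `mat` is a genes-by-samples count matrix, `x` is a 0/1 group
# indicator, and `beta` holds a hypothetical log2 fold change for each gene.
set.seed(1)

ngene <- 5
nsamp <- 10
mat <- matrix(rpois(ngene * nsamp, lambda = 100), nrow = ngene, ncol = nsamp)
x <- rep(c(0, 1), each = nsamp / 2)
beta <- c(0, 0, 1, -1, 2)

# Signal on the log2 scale: entry [i, j] is beta[i] * x[j]
signal <- outer(beta, x)

# Subtract each gene's maximum signal so all thinning probabilities are <= 1
qmat <- 2^(signal - apply(signal, 1, max))

# Thin each count independently: Z_ij ~ Binomial(Y_ij, q_ij)
thinned <- matrix(rbinom(length(mat), size = mat, prob = qmat),
                  nrow = ngene, ncol = nsamp)
```

Because E[Z_ij] = q_ij E[Y_ij] for any count distribution, thinning adds beta_i x_j to each gene's log2 mean up to a gene-specific constant, which is absorbed into the intercept. Genes with a zero coefficient are left exactly as they were, since their thinning probability is 1.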

The main functions are:

  • select_counts(): Subsample the columns and rows of a real RNA-seq count matrix. You would then feed this sub-matrix into one of the thinning functions below.
  • thin_diff(): The function most users should be using for general-purpose binomial thinning. For the special applications of the two-group model or library/gene thinning, see the functions listed below.
  • thin_2group(): The specific application of thinning in the two-group model.
  • thin_lib(): The specific application of library size thinning.
  • thin_gene(): The specific application of total gene expression thinning.
  • thin_all(): The specific application of thinning all counts.
  • effective_cor(): Returns an estimate of the actual correlation between the surrogate variables and a user-specified design matrix.
  • ThinDataToSummarizedExperiment(): Converts a ThinData object to a SummarizedExperiment object.
  • ThinDataToDESeqDataSet(): Converts a ThinData object to a DESeqDataSet object.
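Library-size thinning, as described for thin_lib(), applies one thinning probability per sample rather than per gene. The sketch below assumes a hypothetical vector `lib_log2` of per-sample log2 scaling factors and uses simulated counts; it is meant to show the idea in base R, not to reproduce the package's function.

```r
# Minimal sketch of library-size thinning in base R (illustrates the idea
# behind thin_lib(), not its implementation). `lib_log2` is hypothetical.
set.seed(2)

mat <- matrix(rpois(20 * 4, lambda = 50), nrow = 20, ncol = 4)
lib_log2 <- c(0, -1, -2, 0)

# Probabilities are relative to the largest factor, so they never exceed 1
qvec <- 2^(lib_log2 - max(lib_log2))

# Every count in column j is thinned with probability qvec[j]
qmat <- matrix(qvec, nrow = nrow(mat), ncol = ncol(mat), byrow = TRUE)
thinned <- matrix(rbinom(length(mat), size = mat, prob = qmat),
                  nrow = nrow(mat), ncol = ncol(mat))

# Ratio of new to old library sizes; close to qvec in expectation
colSums(thinned) / colSums(mat)
```

Here the second and third samples end up with roughly one half and one quarter of their original library sizes, while the first and fourth are untouched.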

If you find a bug or want a new feature, please submit an issue.

Check out NEWS for updates.

Installation

To install from CRAN, run the following code in R:

install.packages("seqgendiff")

To install the development version of seqgendiff from GitHub, run the following code in R:

install.packages("devtools")
devtools::install_github("dcgerard/seqgendiff")

To get started, check out the vignettes by running the following in R:

library(seqgendiff)
browseVignettes(package = "seqgendiff")

Or you can check out the vignettes I post online: https://dcgerard.github.io/seqgendiff/.

Citation

If you use this package, please cite:

Gerard, D (2020). “Data-based RNA-seq simulations by binomial thinning.” BMC Bioinformatics. 21(1), 206. doi: 10.1186/s12859-020-3450-9.

A BibTeX entry for LaTeX users is

@article{gerard2020data,
    author = {Gerard, David},
    title = {Data-based {RNA}-seq simulations by binomial thinning},
    year = {2020},
    volume = {21},
    number = {1},
    pages = {206},
    doi = {10.1186/s12859-020-3450-9},
    publisher = {BioMed Central Ltd},
    journal = {BMC Bioinformatics}
}

Code of Conduct

Please note that the ‘seqgendiff’ project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.


Monthly Downloads: 213
Version: 1.2.4
License: GPL-3


Maintainer: David Gerard
Last Published: May 15th, 2024

Functions in seqgendiff (1.2.4)

  • thin_gene: Binomial thinning for altering total gene expression levels.
  • summary.ThinData: Provide summary output of a ThinData S3 object.
  • thin_lib: Binomial thinning for altering library size.
  • thin_base: Base binomial thinning function.
  • thin_all: Binomial thinning for altering read depth.
  • thin_diff: Binomial thinning for differential expression analysis.
  • uncorassign: Group assignment independent of anything.
  • thin_2group: Binomial thinning in the two-group model.
  • seqgendiff-package: seqgendiff: RNA-Seq Generation/Modification for Simulation.
  • corassign: Group assignment that is correlated with latent factors.
  • poisthin: Apply Poisson thinning to a matrix of count data.
  • est_sv: Estimate the surrogate variables.
  • select_counts: Subsample the rows and columns of a count matrix.
  • ThinDataToDESeqDataSet: Converts a ThinData S3 object into a DESeqDataSet S4 object.
  • effective_cor: Estimates the effective correlation.
  • ThinDataToSummarizedExperiment: Converts a ThinData S3 object into a SummarizedExperiment S4 object.
  • permute_design: Permute the design matrix so that it is approximately correlated with the surrogate variables.
  • fix_cor: Fixes an invalid target correlation.
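Several of these functions (corassign, permute_design, effective_cor, fix_cor) concern how strongly the design is correlated with latent factors. The hypothetical base-R sketch below, which is not the corassign() algorithm, builds a 0/1 group assignment correlated with a simulated latent factor `z` and shows why the realized correlation can fall short of a target `rho`: dichotomizing a variable attenuates its correlation (for a median split of jointly normal variables, by a factor of sqrt(2/pi), about 0.8).

```r
# Hypothetical sketch of a group assignment correlated with a latent factor.
# Not the corassign() algorithm; `rho` and `z` are illustrative assumptions.
set.seed(3)

n <- 1000
rho <- 0.5                                   # target correlation
z <- rnorm(n)                                # latent factor (e.g., a surrogate variable)
w <- rho * z + sqrt(1 - rho^2) * rnorm(n)    # cor(w, z) is rho in expectation

# Put the half of the samples with the largest w into group 1
x <- as.numeric(rank(w) > n / 2)

# Realized correlation between the 0/1 assignment and the latent factor
cor(x, z)
```

With these settings the printed correlation should land near 0.4 rather than 0.5, which is the kind of gap between a target and an actual correlation that makes a separate estimate of the effective correlation useful.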