filterT: FILTERING GENES BEFORE STATISTICAL ANALYSIS

Description

Filters out lowly expressed genes (or transcripts) using a data-driven threshold, before any statistical analysis. This step is not mandatory but strongly recommended.

Usage

filterT(rawASRcounts, normASRcounts, target, tol_filter = 0, bias)

Arguments

rawASRcounts

the data.frame containing raw counts (obtained with the readRawInput function, or any data.frame following the rawASRcounts format specifications). The raw count data.frame is required when filtering on raw counts, and also when filtering on normalized counts that contain no 0 counts. (For simplicity, we call '0 count' any zero value in a count file.)

normASRcounts

the data.frame containing normalized counts (obtained with the readNormInput function, or any data.frame following the normASRcounts format specifications). We strongly recommend filtering on normalized ASR counts.

target

the data.frame containing the target metadata (obtained with the readTarget function, or any data.frame following the target format specifications).

tol_filter

a value between 0 and 100 that introduces a tolerance rate into the filtering step: with tol_filter = 25, a gene is selected if, for at least one parental (or strain) origin, at most 25% of its counts are below the threshold. The default value 0 means that all counts from at least one parental (or strain) origin must be above the threshold; 100 means that no filtering is applied.

bias

the type of allelic expression bias to study. It must be either "parental" or "strain".
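To make the tol_filter rule concrete, here is a minimal sketch of how the tolerance could be applied to a single gene once a threshold is known. The helper name passes_tolerance, the list-of-origins input and the threshold passed in as a plain argument are all assumptions for illustration: filterT derives its own threshold from the data, and its internal code may differ.

```r
# Hypothetical helper, not part of ISoLDE: applies the tol_filter rule to
# one gene, given a threshold that is simply passed in (filterT derives
# its own threshold from the data).
passes_tolerance <- function(counts_by_origin, threshold, tol_filter = 0) {
  stopifnot(tol_filter >= 0, tol_filter <= 100)
  # Percentage of each origin's counts that fall below the threshold
  pct_below <- vapply(counts_by_origin,
                      function(x) 100 * mean(x < threshold),
                      numeric(1))
  # Keep the gene if at least one origin has at most tol_filter% of its
  # counts below the threshold
  any(pct_below <= tol_filter)
}

# Example gene with 4 replicates per parental origin
gene <- list(paternal = c(0, 0, 1, 0), maternal = c(12, 30, 2, 25))
passes_tolerance(gene, threshold = 5, tol_filter = 0)   # FALSE
passes_tolerance(gene, threshold = 5, tol_filter = 25)  # TRUE
```

Here the maternal origin has 1 of 4 counts (25%) below the threshold, so the gene is removed with the default tolerance of 0 but kept with tol_filter = 25. With tol_filter = 100 the condition always holds, which matches "no filtering".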

Value

A list of two data.frames:
filteredASRcounts: the ASR counts that passed the filtering step.
removedASRcounts: the ASR counts that were removed by the filtering step.
In both data.frames, each row represents a feature (e.g. a gene or transcript) and each column gives the number of allele-specific sense reads from either the paternal or the maternal parent for a given biological replicate, so you should expect two columns per biological replicate.
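The shape of the returned list can be illustrated with a toy table. In this sketch the column names (rep1_pat, rep1_mat, ...) and the keep vector are hypothetical stand-ins for a real ASR count data.frame and a real filtering decision; the point is only that each feature lands in exactly one of the two data.frames, and that both keep the original columns.

```r
# Toy ASR count table: one row per feature and two columns (paternal,
# maternal) per biological replicate. Column names are hypothetical.
counts <- data.frame(
  rep1_pat = c(10, 0, 7), rep1_mat = c(12, 0, 0),
  rep2_pat = c(8, 1, 9),  rep2_mat = c(15, 0, 1),
  row.names = c("geneA", "geneB", "geneC")
)

# Hypothetical filtering decision (TRUE = feature kept)
keep <- c(TRUE, FALSE, TRUE)

# Same structure as the list described above
res <- list(filteredASRcounts = counts[keep, , drop = FALSE],
            removedASRcounts  = counts[!keep, , drop = FALSE])

# Two columns per biological replicate
n_replicates <- ncol(counts) / 2
```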

Details

Filtering is recommended before statistical analysis to exclude genes (or transcripts) that carry too little information, and thus to limit the penalty imposed by multiple testing correction.

The aim of our filtering method is to remove insufficiently quantified genes from the analysis, that is, genes whose counts are mostly 0 or near 0 across the replicates of at least one condition (parental or strain origin). To this end, the filterT function estimates the distribution of a gene's counts in a condition when most of the read counts for that condition are 0. This distribution is used to define a threshold, and genes with fewer counts than this threshold are removed.
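The idea of deriving a threshold from "mostly zero" genes can be sketched as follows. This is one simple interpretation under stated assumptions, not the procedure implemented in filterT: a gene is treated as mostly zero in a condition when more than half of its replicates are 0, and the threshold is taken as the 95% quantile of the pooled counts of those genes. The function name estimate_threshold and both cut-offs are hypothetical.

```r
# Illustrative only: the "mostly zero" rule (more than zero_share of the
# replicates equal to 0) and the quantile level prob are assumptions,
# not the rule used inside filterT.
estimate_threshold <- function(cond_counts, zero_share = 0.5, prob = 0.95) {
  # cond_counts: numeric matrix with one row per gene and one column per
  # replicate of a single condition (parental or strain origin)
  mostly_zero <- rowMeans(cond_counts == 0) > zero_share
  # Pool the counts of the mostly-zero genes: they approximate the counts
  # expected from a gene that is not really expressed in this condition
  background <- as.vector(cond_counts[mostly_zero, , drop = FALSE])
  if (length(background) == 0) return(0)  # no mostly-zero gene found
  unname(quantile(background, probs = prob))
}

# Three genes, four replicates: the first two are mostly zero
m <- rbind(c(0, 0, 0, 1),
           c(0, 0, 0, 3),
           c(40, 50, 60, 70))
thr <- estimate_threshold(m)   # about 2.3

# Genes whose counts all fall below thr in this condition
below <- rowSums(m >= thr) == 0
```

Pooling the mostly-zero genes gives a rough picture of background noise; any count above a high quantile of that noise is treated as genuine expression. Other choices of cut-off or quantile would shift the threshold.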

The filtering step is not mandatory but strongly recommended.

References

Reynès, C. et al. (2016): ISoLDE: a new method for identification of allelic imbalance. Submitted

Examples


# Load the package and the example data.frames
library(ISoLDE)
data(rawASRcounts)
data(normASRcounts)
data(target)

# Filter genes from the ASR count data.frames for a parental bias study
# (tol_filter is left at its default value of 0)
res_filterT <- filterT(rawASRcounts = rawASRcounts,
                       normASRcounts = normASRcounts,
                       target = target, bias = "parental")
# Features that passed the filter, and features that were removed
filteredASRcounts <- res_filterT$filteredASRcounts
removedASRcounts <- res_filterT$removedASRcounts
