isolde_test: Statistical analysis of allele-specific read (ASR) counts

Description

The main function of the ISoLDE package. Performs a statistical test to identify genes with allelic bias and produces both graphical and textual outputs.

Usage

isolde_test(bias, method = "default", asr_counts, target,
             nboot = 5000, pcore = 75, graph = TRUE, ext = "pdf",
             text = TRUE, split_files = FALSE, prefix =
             "ISoLDE_result", outdir = "")

Arguments

bias

The kind of bias you want to study. It must be one of parental or strain.

method

specifies the statistical method to use for testing. It must be one of default or threshold. The default behaviour is to adapt to the number of replicates: when at least three biological replicates are available for each reciprocal cross, the bootstrap resampling method is used; otherwise the threshold method is applied. It is possible to force isolde_test to use the threshold method even when three or more replicates are available, by setting method to threshold. It is *not possible* to force the bootstrap method with fewer than three replicates.
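
The selection rule described above can be sketched in base R. This is only an illustration of the documented behaviour: the helper `choose_method` and its `n_rep` argument are hypothetical names, not part of ISoLDE, and the internal logic of isolde_test may differ.

```r
# Hypothetical helper (not part of ISoLDE): mimics the documented rule for
# choosing between the bootstrap and threshold methods.
# n_rep: number of biological replicates for each reciprocal cross.
choose_method <- function(n_rep, method = c("default", "threshold")) {
  method <- match.arg(method)
  min_rep <- min(n_rep)
  if (min_rep < 2) {
    # With a single replicate the test cannot be performed.
    stop("at least two replicates per cross are needed to run the test")
  }
  if (method == "threshold" || min_rep < 3) {
    return("threshold")
  }
  "bootstrap"
}

choose_method(c(3, 4))                        # "bootstrap"
choose_method(c(2, 3))                        # "threshold"
choose_method(c(5, 5), method = "threshold")  # "threshold"
```

The minimum across crosses drives the choice, so a single cross with only two replicates is enough to fall back to the threshold method.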

asr_counts

the data.frame containing the ASR counts to be tested. These data should be normalized and filtered (see the filterT function), although the function can run with non-normalized and non-filtered data (not recommended).

target

the target data.frame (obtained by the readTarget function).

nboot

specifies how many resampling steps to perform for the bootstrap method (default to 5000). This option is ignored if method is set to threshold. Low values of nboot lead to less relevant results.

pcore

a value between 0 and 100 (default to 75) which specifies the proportion of cores (in percent) to be used for the bootstrap method.
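
As a rough illustration of what a percentage of cores means in practice, the sketch below converts pcore into a worker count. The helper `cores_from_pcore` is hypothetical, and rounding down with a minimum of one worker is an assumption; how isolde_test actually rounds is not stated here.

```r
library(parallel)

# Hypothetical helper (not part of ISoLDE): converts a percentage of cores
# into a worker count, keeping at least one worker.
cores_from_pcore <- function(pcore, total = parallel::detectCores()) {
  stopifnot(is.numeric(pcore), pcore >= 0, pcore <= 100)
  if (is.na(total)) total <- 1L   # detectCores() can return NA
  max(1L, as.integer(floor(total * pcore / 100)))
}

cores_from_pcore(75, total = 8)   # 6
cores_from_pcore(10, total = 4)   # 1
```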

graph

if TRUE (default) graphical outputs are produced (both on device and file).

ext

specifies the extension of the graphical output file (ignored if graph = FALSE). It must be one of pdf (default), png or eps.

text

if TRUE (default) textual output files are produced.

split_files

if text = TRUE, specifies whether you want all genes in a single output file (FALSE, default) or in four separate files according to the result: ASE, biallelic, undetermined or filtered (TRUE).

prefix

specifies the prefix for all output file names (default to "ISoLDE_result").

outdir

specifies the path where to write the output file(s) (default to current directory).

Value

listASE

a data.frame with one row per gene (or transcript) identified as having an allelic bias and five columns:
- names contains gene (or transcript) names, matching the asr_counts row names,
- criterion contains the criterion value (see vignette or Reynès et al. (2016)),
- diff_prop the criterion numerator, which contains the difference between proportions of either parental or strain origins,
- variability the criterion denominator, which quantifies the gene (or transcript) variability between replicates,
- origin specifies the bias direction: either "P" or "M" for parental bias, or one of the specified strain names for strain bias.

listBA

a data.frame with one row per gene (or transcript) identified as biallelically expressed and four columns corresponding to the first four of listASE.

listUN

a data.frame with one row per gene (or transcript) with undetermined status and six columns. The first five columns are the same as in listASE; the last one may take three values:
- FLAG_consistency for genes with no statistical evidence of either bias or biallelic expression but whose parental or strain bias is always in the same direction across replicates,
- FLAG_significance for genes with statistical evidence of bias but with discrepancies in bias direction across replicates,
- NO_FLAG for other undetermined genes.

listFILT

a data.frame containing the names of genes that failed the minimal filtering step and were therefore not considered during the statistical test.

ASE, BA and UN lists are sorted according to their criterion value.
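
To make the structure of the returned value more concrete, here is a small sketch that works on a toy stand-in for the result. All gene names and numbers are made up for illustration, the name `flag` for the last column of listUN is an assumption (the column is therefore accessed by position), and the plain ratio diff_prop / variability is inferred only from the numerator/denominator description above.

```r
# Toy stand-in for the list returned by isolde_test(); all values are made up.
toy_res <- list(
  listASE = data.frame(
    names       = c("geneA", "geneB", "geneC"),
    criterion   = c(2, 3, 4),
    diff_prop   = c(0.5, 0.75, 0.5),
    variability = c(0.25, 0.25, 0.125),
    origin      = c("P", "M", "P"),
    stringsAsFactors = FALSE
  ),
  listUN = data.frame(
    names       = c("geneD", "geneE", "geneF"),
    criterion   = c(1, 0.5, 1.5),
    diff_prop   = c(0.25, 0.125, 0.375),
    variability = c(0.25, 0.25, 0.25),
    origin      = c("P", "M", "M"),
    flag        = c("FLAG_consistency", "NO_FLAG", "FLAG_significance"),
    stringsAsFactors = FALSE
  )
)

# Recompute the criterion from its numerator and denominator.
ase <- toy_res$listASE
ratio <- ase$diff_prop / ase$variability

# Count biased genes per direction of bias.
origin_counts <- table(ase$origin)

# Keep undetermined genes that carry a flag (last column, whatever its name).
un <- toy_res$listUN
flagged <- un[un[[ncol(un)]] != "NO_FLAG", "names"]
```

Flagged undetermined genes are often the ones worth a second look, since they show either a consistent direction or statistical evidence of bias without meeting both conditions.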

Details

Before using this function, your data should be normalized and filtered (see the filterT function for filtering), although the function can run with non-normalized and/or non-filtered data. The method depends on the minimum number of replicates for each reciprocal cross. If only one replicate is found, the test cannot be performed and the function exits.

method = "default": With more than two replicates per cross, the method takes advantage of the available information by using bootstrap resampling to identify genes with allelic bias. If only two replicates are found in at least one cross, there is too little information to obtain reliable distributions from resampling, and genes with allelic bias are identified using empirically defined thresholds.

method = "threshold": The empirical threshold method is used instead of the bootstrap one, even if more than two replicates per cross are found.

Note that in differential RNA-seq analysis, at least three replicates are strongly recommended, as the quality of variability estimation is a key factor in statistical analysis.

More details in Reynès, C. et al. (2016) ISoLDE: a new method for identification of allelic imbalance. Submitted.

References

Reynès, C. et al. (2016): ISoLDE: a new method for identification of allelic imbalance. Submitted

Examples


# Loading the package and all required data.frames
library(ISoLDE)
data(filteredASRcounts)
data(target)
# Statistical analysis (forcing the threshold option)
isolde_res <- isolde_test(bias = "parental", method = "threshold",
                          asr_counts = filteredASRcounts, target = target,
                          ext = "pdf", prefix = "ISoLDE_test")
