CONDOP (version 1.0)

pre.proc: Prepare data inputs for the main function run.CONDOP().

Description

Load the annotation files and a list of count tables (or coverage vectors). Each count table corresponds to a specific experimental condition and must contain two columns: fwd (coverage depth on the forward strand) and rev (coverage depth on the reverse strand). The annotation files are:
- GFF-like file: can be downloaded from the NCBI genomes FTP directory, ftp://ftp.ncbi.nih.gov/genomes.
- DOOR-like file: can be downloaded from http://csbl.bmb.uga.edu/DOOR/displayspecies.php.
- FASTA-like file: can be downloaded from www.ncbi.nlm.nih.gov.
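
The count-table layout described above can be sketched as follows. This is a minimal illustration, assuming each table is a data frame with one row per genome position; the depth values and the names fwd_depth, rev_depth, ct_example and list_cov_example are hypothetical and not part of the package.

```r
# Hypothetical per-position coverage depths for a 10-bp toy genome
# (assumption: one row per genome position).
fwd_depth <- c(0, 0, 3, 5, 8, 8, 6, 2, 0, 0)
rev_depth <- c(1, 1, 0, 0, 0, 2, 4, 4, 3, 1)

# One count table with exactly the two required columns, fwd and rev.
ct_example <- data.frame(fwd = fwd_depth, rev = rev_depth)

# list.cov.dat takes a list with one count table per experimental condition.
list_cov_example <- list(cond1 = ct_example)

# Check the column names before passing the list on.
stopifnot(identical(colnames(ct_example), c("fwd", "rev")))
```

A list built this way would be passed as the list.cov.dat argument, one named element per condition.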

Usage

pre.proc(gff.file, door.op.file, fasta.file, list.cov.dat, remove.cov = list("rRNA"), log2.expr = TRUE, sw = 100, save.data.file = NULL, verbose = TRUE)

Arguments

gff.file
A full local path to the GFF-like file to load.
door.op.file
A full local path indicating the DOOR-like file to load (DOOR-operon annotations).
fasta.file
A full local path indicating the FASTA-like file to load or a character string representing the accession number of the genome sequence to download.
list.cov.dat
List of count tables.
remove.cov
List of character values. Each character value names a type of annotated feature whose coverage depth will be removed. The default list contains "rRNA", so the coverage depth of "rRNA" features is removed.
log2.expr
Logical value indicating whether CONDOP will use log2-transformed expression values. Expression values are computed as RPKM. Defaults to TRUE.
sw
Numeric value specifying the sliding window size. Default value is 100.
save.data.file
Character string naming a file. The file will contain the input for the CONDOP main process. Defaults to NULL.
verbose
Logical value indicating whether information about the process should be reported. Defaults to TRUE.
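
To illustrate what log2.expr refers to, the sketch below applies the standard RPKM definition (reads per kilobase of feature per million mapped reads) followed by a log2 transform. It is a generic illustration only: the helper rpkm() and all numbers are hypothetical, and how CONDOP derives RPKM internally from coverage depth (including whether it adds a pseudocount before taking logs) is not stated here.

```r
# Standard RPKM definition (not CONDOP's internal code):
# reads per kilobase of feature per million mapped reads.
rpkm <- function(read_count, feature_length_bp, total_mapped_reads) {
  read_count * 1e9 / (feature_length_bp * total_mapped_reads)
}

# Hypothetical feature: 500 reads on a 1000-bp gene, 1e6 mapped reads in total.
x <- rpkm(read_count = 500, feature_length_bp = 1000, total_mapped_reads = 1e6)

# Log2-transformed expression, as selected by log2.expr = TRUE
# (assumption: plain log2 with no pseudocount).
log2_x <- log2(x)
```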

Value

A list of data inputs for the main process run.CONDOP.
genes.and.ops
A merged dataframe containing information about genes/features and operons.
gseq
A character vector representing the genome sequence of the target organism.
igr.pos
A dataframe containing information about intergenic regions (IGRs) - forward (+) strand.
igr.neg
A dataframe containing information about intergenic regions (IGRs) - reverse (-) strand.
tl.cds
A list of dataframes containing the expression levels of annotated coding sequences (CDS regions). One dataframe for each count table.
tl.igr.pos
A list of dataframes containing the expression levels of intergenic sequences (IGR regions) - forward (+) strand. One dataframe for each count table.
tl.igr.neg
A list of dataframes containing the expression levels of intergenic sequences (IGR regions) - reverse (-) strand. One dataframe for each count table.
sid.points
A list of dataframes containing information about boundaries of transcriptionally active regions.
cut.lhe
A list of numeric vectors giving the cut-off values that separate low-expression from high-expression RNA-seq data on the forward and reverse strands. One vector for each count table.

Examples

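## Not run:
#     # file_genome_annot must hold the path to a GFF-like annotation file;
#     # it is not defined in this example, so set it before running.
#     file_operon_annot <- system.file("extdata", "1944.opr", package="CONDOP")
#     file_genome_seq   <- system.file("extdata", "EC-k12-MG1655.fasta", package="CONDOP")
#     data(ct1)
#     # fasta.file is given here as an accession number ("NC_000913");
#     # the local path in file_genome_seq could be passed instead.
#     data.in <- pre.proc(file_genome_annot, file_operon_annot, "NC_000913",
#                         list.cov.dat = list(ct1 = ct1))
# ## End(Not run)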