seqinr (version 1.0-1)

oriloc: Prediction of origin and terminus of replication in bacteria

Description

This program predicts the putative origin and terminus of replication in prokaryotic genomes. It works with unannotated sequences and therefore relies on glimmer2 output to discriminate between codon positions.

Usage

oriloc(seq.fasta = system.file("sequences/ct.fasta", package = "seqinr"),
       g2.coord = system.file("sequences/ct.coord", package = "seqinr"),
       oldoriloc = FALSE, gbk = NULL, clean.tmp.files = TRUE, rot = 0)

Arguments

seq.fasta
the name of a file that contains the DNA sequence of a bacterial chromosome in FASTA format
g2.coord
the name of the file that contains the output of the glimmer2 program
oldoriloc
Logical; set to TRUE to reproduce the (deprecated) outputs of the previous version of the oriloc program (published in 2000)
gbk
the URL of a file in GenBank format
clean.tmp.files
Logical; if TRUE, temporary files are removed
rot
Integer, with a default value of zero, used to circularly permute the genome.
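As an illustration of what a circular permutation of a genome means, here is a minimal sketch in base R. The helper `rotate()` is hypothetical and not part of seqinr; the direction of the shift and the way oriloc applies rot internally are assumptions, not taken from the package source.

```r
# Hypothetical helper (not part of seqinr): circularly permute a vector
# so that element rot + 1 becomes the first element.
rotate <- function(x, rot = 0) {
  n <- length(x)
  if (n == 0) return(x)
  k <- rot %% n                 # wrap shifts larger than the sequence
  if (k == 0) return(x)
  c(x[(k + 1):n], x[1:k])
}

# Toy "genome" of six bases, split into single characters
genome <- strsplit("ACGTTG", "")[[1]]
rotated <- paste(rotate(genome, 2), collapse = "")
print(rotated)
```

With a circular chromosome the choice of the first base is arbitrary, so a rotation of this kind only changes where the linear representation starts.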

Value

A data.frame with seven columns:

  • g2num: the CDS number in the g2.coord file.
  • start.kb: the start position of the CDS, expressed in Kb (this is the position of the first occurrence of a nucleotide in a CDS, regardless of its orientation).
  • end.kb: the last position of the CDS.
  • CDS.excess: the DNA walk for gene orientation (+1 for a CDS in the direct strand, -1 for a CDS in the reverse strand), cumulated over genes.
  • skew: the cumulated composite skew in third codon positions.
  • x: the cumulated T - A skew in third codon positions.
  • y: the cumulated C - G skew in third codon positions.
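To make the notion of a walk "cumulated over genes" concrete, the following base R sketch builds such a walk from a toy vector of CDS orientations. The vector `strand` is invented for illustration; this is not the code used inside oriloc.

```r
# Toy example: orientation of six hypothetical CDS
# (+1 for the direct strand, -1 for the reverse strand).
strand <- c(1, 1, -1, 1, -1, -1)

# Cumulating the steps gives a DNA walk analogous to the CDS.excess column.
cds_excess <- cumsum(strand)
print(cds_excess)
```

The same cumulative-sum idea applies to the per-CDS skew values that underlie the skew, x and y columns.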

Details

The method builds on the fact that there are compositional asymmetries between the leading and the lagging strand for replication. The program works with unannotated sequences in fasta format and therefore uses glimmer2.0 outputs to discriminate between codon positions so as to increase the signal/noise ratio.
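The restriction to third codon positions can be sketched in base R as follows. The helper `third_pos_skew()` is hypothetical and not part of seqinr; it assumes a CDS given in its reading frame on the direct strand and ignores the handling of reverse-strand genes, so it should be read as an illustration of the idea rather than as the oriloc implementation.

```r
# Hypothetical helper (not part of seqinr): T - A and C - G skews
# in the third codon positions of a single in-frame CDS.
third_pos_skew <- function(cds) {
  bases <- strsplit(toupper(cds), "")[[1]]
  third <- bases[seq(3, length(bases), by = 3)]  # every third base
  c(x = sum(third == "T") - sum(third == "A"),
    y = sum(third == "C") - sum(third == "G"))
}

# Toy CDS with codons ATG GCT AAA TTC GGG TAA
toy <- "ATGGCTAAATTCGGGTAA"
res <- third_pos_skew(toy)
print(res)
```

Summing such per-CDS values along the chromosome with `cumsum()` yields walks of the kind reported in the x and y columns.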

References

The original paper for oriloc: Frank, A.C., Lobry, J.R. (2000) Oriloc: prediction of replication boundaries in unannotated bacterial chromosomes. Bioinformatics, 16:566-567. http://bioinformatics.oupjournals.org/cgi/reprint/16/6/560

A simple informal introduction to DNA-walks: Lobry, J.R. (1999) Genomic landscapes. Microbiology Today, 26:164-165. http://www.socgenmicrobiol.org.uk/QUA/049906.pdf

An early and somewhat historical application of DNA-walks: Lobry, J.R. (1996) A simple vectorial representation of DNA sequences for the detection of replication origins in bacteria. Biochimie, 78:323-326.

For an overview of seqinR's functionality, please consult this vignette: Charif, D., Lobry, J.R. (2005) SeqinR: a contributed package to the R project for statistical computing devoted to biological sequences retrieval and analysis. Springer Verlag, Biological and Medical Physics/Biomedical Series, in preparation.

Examples

out <- oriloc()
plot(out$start.kb, out$skew, type = "l", xlab = "Map position in Kb",
     ylab = "Cumulated composite skew",
     main = expression(italic(Chlamydia~~trachomatis)~~complete~~genome))
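Once such a curve is plotted, its extremes are natural candidates for the replication boundaries. The sketch below locates them on a toy data frame that mimics two columns of the oriloc() output; the values are invented for illustration, and which extreme corresponds to the origin and which to the terminus is left to the reader to check against the references.

```r
# Toy stand-in for the oriloc() output (values invented for illustration).
out_toy <- data.frame(
  start.kb = c(0, 100, 200, 300, 400, 500),
  skew     = c(0, -40, -90, -30, 50, 10)
)

# Rows where the cumulated skew reaches its minimum and maximum
i_min <- which.min(out_toy$skew)
i_max <- which.max(out_toy$skew)

cat("Minimum of the skew near", out_toy$start.kb[i_min], "Kb\n")
cat("Maximum of the skew near", out_toy$start.kb[i_max], "Kb\n")
```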
