extract.lib: Functions for SAGE library extraction

Description

Functions to extract the tags in a library from sequences or base-caller output.

Usage

extract.lib.from.zip(zipfile, libname=sub(".zip","",basename(zipfile)), ...)
extract.lib.from.directory(dirname, libname=basename(dirname), pattern, ...)
extract.library.tags(filelist, base.caller.format="phd", remove.duplicate.ditags=TRUE,  remove.N=FALSE, remove.low.quality=10, taglength=10, min.ditag.length=(2*taglength-2), max.ditag.length=(2*taglength+4), cut.site="catg", default.quality=NA, verbose=TRUE, ...) 
reestimate.lib.from.tagcounts(tagcounts, libname, default.quality=20, ...) 
compute.unique.tags(lib)
combine.libs(..., artifacts=c("Linker", "Ribosomal", "Mitochondrial"))
remove.sage.artifacts(lib, artifacts=c("Linker","Ribosomal","Mitochondrial"), ...)
read.phd.file(file)
read.seq.qual.filepair(file, default.quality=NA)
extract.ditags(sequence, taglength=10, filename=NA, min.ditag.length=(2*taglength-2), max.ditag.length=(2*taglength+4), cut.site="catg")
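
Several defaults in the signatures above are computed from other arguments. As a small base R illustration (it does not call the package), the following evaluates those default expressions for a hypothetical file path and for tag lengths of 10 and 17. The helper name default.ditag.range is invented for this sketch.

```r
# Default library name, as in the signature of extract.lib.from.zip:
# the base name of the file with ".zip" removed.
zipfile <- "data/E15postHFI.zip"   # hypothetical relative path
libname <- sub(".zip", "", basename(zipfile))
libname                            # "E15postHFI"

# Default ditag length bounds, as in the signature of extract.library.tags.
# default.ditag.range is a helper made up for this illustration.
default.ditag.range <- function(taglength) {
  c(min.ditag.length = 2 * taglength - 2,
    max.ditag.length = 2 * taglength + 4)
}

range10 <- default.ditag.range(10)  # 18 and 24
range17 <- default.ditag.range(17)  # 32 and 38
```

If the defaults do not suit a data set, the same arguments can be given explicitly, as the example at the end of this page does with min.ditag.length=20.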

Arguments

zipfile,dirname

Name of a ZIP file or a directory that contains base-caller output files

libname

A character string to be assigned as the library name

pattern

Regular expression to specify pattern for the files that will be read

filelist

List of files to be read

base.caller.format

Either "phd" or "seq", or a character vector of the same length as filelist

remove.duplicate.ditags

Remove duplicate ditags. TRUE or FALSE

remove.N

Remove all tags that contain N. TRUE or FALSE

remove.low.quality

Remove all tags whose average quality score is below remove.low.quality. The filter is skipped if the value is less than 0

taglength

Length of tags. Usually 10 or 17

min.ditag.length,max.ditag.length

Minimum and maximum length for ditags

cut.site

Restriction enzyme cut site. Usually CATG

verbose

Display information during process

lib

Library object

file,filename

Character string indicating file name

default.quality

Quality value to use for sequences if quality files are missing

sequence

Construct containing sequence and quality values returned by read.phd.file or read.seq.qual.filepair

artifacts

Types of artificially generated tags to remove.

...

Arguments passed on to extraction functions.

tagcounts

Tag counts from a library. Integer vector with tag sequences as names.
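
A tagcounts argument of the shape described above can be built in base R as a named integer vector. The tag sequences below are invented for illustration, and the lower-case convention is an assumption. The call to reestimate.lib.from.tagcounts is left commented out because it requires the sagenhaft package, and the library name "demo" is a placeholder.

```r
# Invented example data: one entry per observed 10-base tag.
observed <- c("acgtacgtac", "aaaaaaaaaa", "acgtacgtac",
              "ttttggggcc", "acgtacgtac")

# Count occurrences, then convert the table to a plain named integer vector.
tab <- table(observed)
tagcounts <- setNames(as.integer(tab), names(tab))
tagcounts

# With sagenhaft installed, the vector could then be passed on
# (signature taken from the Usage section):
# lib <- reestimate.lib.from.tagcounts(tagcounts, libname = "demo")
```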

Value

Returns a SAGE library object.

Details

Use extract.lib.from.zip or extract.lib.from.directory to extract the SAGE tags from the sequences of a library. The sequences must be supplied as output files from the base-caller software, either in a ZIP archive or in a directory. These two functions are usually the only ones a user needs to call directly. The other functions are called by them and should be used directly only by experienced users who want finer control over the process. Most arguments are passed on and can therefore be specified in the high-level functions. ZIP file names must be specified using relative path names!
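
To make the extraction step more concrete, here is a deliberately simplified base R sketch of the general SAGE idea: locate the cut sites in a sequence, keep the fragments between them whose length falls inside the ditag bounds, and read one tag from each end of a ditag (the second as a reverse complement). This is not sagenhaft's implementation. The function names revcomp, sketch.ditags and tags.from.ditags are invented, the input is a plain character string rather than the object returned by read.phd.file, quality scores and duplicate removal are ignored, and measuring ditag length without the flanking cut sites is an assumption.

```r
# Reverse complement of lower-case DNA strings (base R only).
revcomp <- function(x) {
  vapply(strsplit(x, "", fixed = TRUE),
         function(chars) chartr("acgt", "tgca", paste(rev(chars), collapse = "")),
         character(1), USE.NAMES = FALSE)
}

# Fragments between consecutive cut sites whose length is within the bounds.
sketch.ditags <- function(sequence, taglength = 10, cut.site = "catg",
                          min.ditag.length = 2 * taglength - 2,
                          max.ditag.length = 2 * taglength + 4) {
  hits <- gregexpr(cut.site, sequence, fixed = TRUE)[[1]]
  if (length(hits) < 2) return(character(0))  # two cut sites are needed
  starts <- hits[-length(hits)] + nchar(cut.site)
  ends <- hits[-1] - 1
  fragments <- substring(sequence, starts, ends)
  len <- nchar(fragments)
  fragments[len >= min.ditag.length & len <= max.ditag.length]
}

# One tag from the left end, one (reverse complemented) from the right end.
tags.from.ditags <- function(ditags, taglength = 10) {
  n <- nchar(ditags)
  c(substring(ditags, 1, taglength),
    revcomp(substring(ditags, n - taglength + 1, n)))
}

# Invented input: one 20-base ditag and one fragment that is too short.
ditag <- paste0("aaaaaaaaaa", revcomp("cccccccccc"))
sequence <- paste0("catg", ditag, "catg", "acgt", "catg")

ditags <- sketch.ditags(sequence)  # "aaaaaaaaaagggggggggg"
tags <- tags.from.ditags(ditags)   # "aaaaaaaaaa" "cccccccccc"
```

The 4-base fragment is discarded because it is shorter than the default minimum of 18, which mirrors the role of min.ditag.length and max.ditag.length in the real functions.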

References

http://tagcalling.mbgproject.org

Examples


#library(sagenhaft)
#file.copy(system.file("extdata", "E15postHFI.zip",package="sagenhaft"),
#          "E15postHFI.zip")
#E15post<-extract.lib.from.zip("E15postHFI.zip", taglength=10,
#                              min.ditag.length=20, max.ditag.length=24)
#E15post
