The import and export functions load and save objects from and to particular file formats. The rtracklayer package implements support for a number of annotation and sequence formats.
export(object, con, format, ...)
import(con, format, text, ...)
If con is an RTLFile derivative, the data is loaded from or saved to the underlying resource. If con is missing, export returns the output as a character vector, rather than writing to a connection.
If con is a filename, the format can be derived from the file extension. The format argument is unnecessary when con is a derivative of RTLFile.
If con is missing, text can be a character vector directly providing the string data to import.
If con is missing, export returns a character vector containing the string output. Otherwise, nothing is returned.
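The string-based behavior described above can be sketched as follows. This is a minimal illustration, assuming rtracklayer is installed; the BED record is made up for the example and is not taken from the package.

```r
library(rtracklayer)

# A single made-up BED record (tab-separated); BED start positions are 0-based
bed_lines <- "chr1\t0\t100\tfeatA\t0\t+"

# con is missing, so the data comes from 'text'; 'format' must be given
# because there is no file extension to derive it from
gr <- import(format = "bed", text = bed_lines)

# con is missing, so export() returns the output as a character vector
out <- export(gr, format = "bed")
```

Here gr holds the parsed ranges and out holds the serialized lines, so no file is written in either direction.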
File formats are represented by derivatives of RTLFile. Below, we list the major supported formats, with some advice for when a particular file format is appropriate:
export.ucsc(subformat = "gff1"). The BED format is typically preferred over GFF for interaction with UCSC. GFF files can be indexed with the tabix utility for fast range-based queries via rtracklayer and Rsamtools.
bedGraph. For large data, consider BigWig.
WIG (which are now somewhat obsolete). A BigWig file contains a spatial index for fast range-based queries and also embeds summary statistics of the scores at several zoom levels. Thus, it is ideal for visualization of and parallel computing on genome-scale vectors, like the coverage from a high-throughput sequencing experiment.
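To illustrate the range-based queries that the BigWig index enables, here is a hedged sketch. It assumes rtracklayer is installed on a platform where BigWig export is available; the coordinates, scores, and sequence length are invented for the example.

```r
library(rtracklayer)

# Toy score track; coordinates and values are made up for illustration
gr <- GRanges("chr1",
              IRanges(start = c(1, 101, 201), width = 100),
              score = c(0.5, 1.5, 2.5))
seqlengths(gr) <- c(chr1 = 1000)  # BigWig export needs sequence lengths

bw_path <- tempfile(fileext = ".bw")
export(gr, bw_path)  # format is derived from the ".bw" extension

# Range-based query: read only the scores overlapping chr1:150-250
roi <- GRanges("chr1", IRanges(150, 250))
hits <- import(bw_path, which = roi)
```

Passing which restricts the import to the requested region, which is what makes BigWig practical for genome-scale vectors.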
In summary, for the typical use case of combining gene models with
experimental data, GFF is preferred for gene models and
BigWig is preferred for quantitative score vectors. Note that
the Rsamtools package provides support for the
BAM file format (for representing
read alignments), among others. Based on this, the rtracklayer package
provides an export method for writing
GappedReads objects as
BAM. For variants, consider
VCF, supported by the VariantAnnotation package.
There is also support for reading and writing biological sequences,
including the UCSC
TwoBit format for
compactly storing a genome sequence along with a mask. The files are
binary, so they are efficiently queried for particular ranges. A
similar format is
FA, supported by the Rsamtools package.
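As a rough sketch of querying a TwoBit file for a particular range, the following assumes rtracklayer is installed and that its bundled test file tests/test.2bit is present; the queried coordinates are arbitrary.

```r
library(rtracklayer)

# Example 2bit file shipped with rtracklayer's test data (assumed present)
tb <- TwoBitFile(system.file("tests", "test.2bit", package = "rtracklayer"))

# Sequence names and lengths are available without loading the sequence
si <- seqinfo(tb)

# Query bases 10-30 of the first sequence only
roi <- GRanges(seqnames(si)[1], IRanges(10, 30))
dna <- import(tb, which = roi)
```

Because tb is an RTLFile derivative, no format argument is needed, and only the requested range is read from the binary file.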
track <- import(system.file("tests", "v1.gff", package = "rtracklayer"))

## Not run: export(track, "my.gff", version = "3")
## equivalently,
## Not run: export(track, "my.gff3")
## or
## Not run:
# con <- file("my.gff3")
# export(track, con, "gff3")
# close(con)
## End(Not run)

## or as a string
export(track, format = "gff3")