makeTxDb is a low-level constructor for making
a TxDb object from user-supplied transcript annotations.
See ?makeTxDbFromUCSC and
?makeTxDbFromBiomart for higher-level
functions that feed data from the UCSC or BioMart sources
to makeTxDb.

makeTxDb(transcripts, splicings,
         genes=NULL, chrominfo=NULL, metadata=NULL,
         reassign.ids=FALSE)

metadata: The column names must be "name" and "value"
and their type must be character.

reassign.ids: If reassign.ids is FALSE and if the ids are supplied, then
they are used as the internal ids, otherwise the internal ids are assigned
in a way that is compatible with the order defined by ordering the
features first by chromosome, then by strand, then by start, and finally
by end.

The transcripts (required), splicings (required)
and genes (optional) arguments must be data frames that
describe a set of transcripts and the genomic features related
to them (exons, cds and genes at the moment).
The chrominfo (optional) argument must be a data frame
containing chromosome information like the length of each chromosome.

transcripts must have 1 row per transcript and the following
columns:

tx_id: Transcript ID. Integer vector. No NAs. No duplicates.
tx_name: [optional] Transcript name. Character vector (or factor). NAs and/or duplicates are ok.
tx_type: [optional] Transcript type (e.g. mRNA, ncRNA, snoRNA, etc.). Character vector (or factor). NAs and/or duplicates are ok.
tx_chrom: Transcript chromosome. Character vector (or factor) with no NAs.
tx_strand: Transcript strand. Character vector (or factor) with no NAs where each element is either "+" or "-".
tx_start, tx_end: Transcript start and end. Integer vectors with no NAs.

splicings must have N rows per transcript, where N is the number
of exons in the transcript. Each row describes an exon plus, optionally,
the cds contained in this exon. Its columns must be:
tx_id: Foreign key that links each row in the splicings data frame to a unique row in the transcripts data frame. Note that more than 1 row in splicings can be linked to the same row in transcripts (many-to-one relationship). Same type as transcripts$tx_id (integer vector). No NAs. All the values in this column must be present in transcripts$tx_id.
exon_rank: The rank of the exon in the transcript. Integer vector with no NAs. (tx_id, exon_rank) pairs must be unique.
exon_id: [optional] Exon ID. Integer vector with no NAs.
exon_name: [optional] Exon name. Character vector (or factor). NAs and/or duplicates are ok.
exon_chrom: [optional] Exon chromosome. Character vector (or factor) with no NAs. If missing then transcripts$tx_chrom is used. If present then exon_strand must also be present.
exon_strand: [optional] Exon strand. Character vector (or factor) with no NAs. If missing then transcripts$tx_strand is used and exon_chrom must also be missing.
exon_start, exon_end: Exon start and end. Integer vectors with no NAs.
cds_id: [optional] cds ID. Integer vector. If present then cds_start and cds_end must also be present. NAs are allowed and must match NAs in cds_start and cds_end.
cds_name: [optional] cds name. Character vector (or factor). If present then cds_start and cds_end must also be present. NAs and/or duplicates are ok. Must be NA if corresponding cds_start and cds_end are NAs.
cds_start, cds_end: [optional] cds start and end. Integer vectors. If one of the 2 columns is missing then all cds_* columns must be missing. NAs are allowed and must occur at the same positions in cds_start and cds_end.

genes must have N rows per transcript, where N is the number
of genes linked to the transcript (N will be 1 most of the time).
Its columns must be:
tx_id: [optional] genes must have either a tx_id or a tx_name column but not both. Like splicings$tx_id, this is a foreign key that links each row in the genes data frame to a unique row in the transcripts data frame.
tx_name: [optional] Can be used as an alternative to the genes$tx_id foreign key.
gene_id: Gene ID. Character vector (or factor). No NAs.

chrominfo must have 1 row per chromosome and the following
columns:
chrom: Chromosome name. Character vector (or factor) with no NAs and no duplicates.
length: Chromosome length. Integer vector with either all NAs or no NAs.
is_circular: [optional] Chromosome circularity flag. Logical vector. NAs are ok.

See makeTxDbFromUCSC, makeTxDbFromBiomart, makeTxDbFromGRanges, and makeTxDbFromGFF
for convenient ways to make a TxDb object from UCSC or BioMart
online resources, or from a GRanges object,
or from a GFF or GTF file.

saveDb and loadDb in the

transcripts <- data.frame(
  tx_id=1:3,
  tx_chrom="chr1",
  tx_strand=c("-", "+", "+"),
  tx_start=c(1, 2001, 2001),
  tx_end=c(999, 2199, 2199))
# One row per exon; transcript 3 has no CDS (NA in cds_start and cds_end).
splicings <- data.frame(
  tx_id=c(1L, 2L, 2L, 2L, 3L, 3L),
  exon_rank=c(1, 1, 2, 3, 1, 2),
  exon_start=c(1, 2001, 2101, 2131, 2001, 2131),
  exon_end=c(999, 2085, 2144, 2199, 2085, 2199),
  cds_start=c(1, 2022, 2101, 2131, NA, NA),
  cds_end=c(999, 2085, 2144, 2193, NA, NA))
txdb <- makeTxDb(transcripts, splicings)
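
Before calling makeTxDb, it can help to check the input data frames against the column rules described above. The following is a minimal sketch in base R only; check_tx_tables is a hypothetical helper written for illustration (it is not part of any package) and it covers only a few of the documented constraints: uniqueness of transcripts$tx_id, the allowed strand values, the splicings$tx_id foreign key, uniqueness of (tx_id, exon_rank) pairs, and matching NAs in cds_start and cds_end.

```r
# Hypothetical helper (not part of any package): returns a character vector
# describing violations of some of the documented input constraints.
check_tx_tables <- function(transcripts, splicings) {
  problems <- character(0)

  # transcripts$tx_id: no NAs, no duplicates
  if (anyNA(transcripts$tx_id) || anyDuplicated(transcripts$tx_id) > 0)
    problems <- c(problems, "transcripts$tx_id has NAs or duplicates")

  # transcripts$tx_strand: no NAs, each element is "+" or "-"
  if (!all(as.character(transcripts$tx_strand) %in% c("+", "-")))
    problems <- c(problems, "transcripts$tx_strand must be '+' or '-'")

  # splicings$tx_id: no NAs, every value present in transcripts$tx_id
  if (anyNA(splicings$tx_id) || !all(splicings$tx_id %in% transcripts$tx_id))
    problems <- c(problems, "splicings$tx_id not found in transcripts$tx_id")

  # (tx_id, exon_rank) pairs must be unique
  if (anyDuplicated(splicings[c("tx_id", "exon_rank")]) > 0)
    problems <- c(problems, "duplicated (tx_id, exon_rank) pairs")

  # cds_start/cds_end: both present or both missing, NAs at the same positions
  has_cds <- c("cds_start", "cds_end") %in% names(splicings)
  if (xor(has_cds[1], has_cds[2])) {
    problems <- c(problems, "only one of cds_start/cds_end is present")
  } else if (all(has_cds) &&
             !identical(is.na(splicings$cds_start), is.na(splicings$cds_end))) {
    problems <- c(problems, "NAs in cds_start and cds_end do not match")
  }

  problems
}

# Same toy data as in the example above.
transcripts <- data.frame(
  tx_id=1:3,
  tx_chrom="chr1",
  tx_strand=c("-", "+", "+"),
  tx_start=c(1, 2001, 2001),
  tx_end=c(999, 2199, 2199))
splicings <- data.frame(
  tx_id=c(1L, 2L, 2L, 2L, 3L, 3L),
  exon_rank=c(1, 1, 2, 3, 1, 2),
  exon_start=c(1, 2001, 2101, 2131, 2001, 2131),
  exon_end=c(999, 2085, 2144, 2199, 2085, 2199),
  cds_start=c(1, 2022, 2101, 2131, NA, NA),
  cds_end=c(999, 2085, 2144, 2193, NA, NA))

problems <- check_tx_tables(transcripts, splicings)

# A deliberately broken copy: the last exon points to a transcript
# that does not exist in 'transcripts'.
bad_splicings <- splicings
bad_splicings$tx_id[6] <- 4L
problems_bad <- check_tx_tables(transcripts, bad_splicings)
print(problems_bad)
```

An empty result means none of the checked rules is violated; it does not guarantee that makeTxDb will accept the input, since the function performs its own validation.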