sgd: Yeast gene model sample data

Description

This data set contains a data frame describing a subset of the chromosome feature data represented in the Fall 2007 version of saccharomyces_cerevisiae.gff, available for download from the Saccharomyces Genome Database (http://www.yeastgenome.org).

Usage

data(sgd)

Format

A data frame with 14080 observations on the following 8 variables.

SGDID: SGD feature ID.
type: Only four feature types have been retained: "CDS", "five_prime_UTR_intron", "intron", and "ORF". Note that "ORF" corresponds to a whole gene, while "CDS" corresponds to an exon. S. cerevisiae does not, however, have many multi-exonic genes.
feature_name: The feature's name, stored as a character vector.
parent_feature_name: The feature_name of the larger element to which the current feature belongs. All retained "CDS" entries, for example, belong to an "ORF" entry.
chr: The chromosome on which the feature occurs.
start: Feature start base.
stop: Feature stop base.
strand: The strand on which the feature occurs, Watson or Crick. Watson-strand features are coded "W".
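The parent relationship described above (each "CDS" pointing to its "ORF" through parent_feature_name) can be checked with base R alone. The sketch below is a minimal illustration, assuming only the column names listed in this section; the rows are hypothetical toy values invented for the example, not taken from the real data set, and the variable names (toy, orfs, cds, parent, contained) are our own.

```r
# Hypothetical toy rows shaped like the sgd data frame: one ORF made of
# two CDS exons. Names and coordinates are invented for illustration.
toy <- data.frame(
  SGDID               = c("S_TOY1", "S_TOY2", "S_TOY3"),
  type                = c("ORF", "CDS", "CDS"),
  feature_name        = c("ORF_A", "ORF_A_CDS1", "ORF_A_CDS2"),
  parent_feature_name = c(NA, "ORF_A", "ORF_A"),
  chr                 = "chr01",
  start               = c(1000, 1000, 1300),
  stop                = c(1800, 1200, 1800),
  strand              = "W",
  stringsAsFactors    = FALSE
)

orfs <- toy[toy$type == "ORF", ]
cds  <- toy[toy$type == "CDS", ]

# Look up the row of each CDS's parent ORF via parent_feature_name
parent <- match(cds$parent_feature_name, orfs$feature_name)

# A CDS lies inside its ORF if it starts no earlier and stops no later;
# a CDS with no matching parent is reported as not contained
contained <- !is.na(parent) &
  cds$start >= orfs$start[parent] &
  cds$stop  <= orfs$stop[parent]

print(data.frame(cds = cds$feature_name, contained = contained))
```

On the real data, the same match() lookup could be applied to subset( sgd, type == "CDS" ) and subset( sgd, type == "ORF" ).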

Examples

library( intervals )

# An example to compute "promoters", defined to be the 500 bases
# upstream from an ORF annotation, provided these bases don't intersect
# another ORF. See the Format section above for details on the
# annotation set.

use_chr <- "chr01"

data( sgd )
sgd <- subset( sgd, chr == use_chr )

orf <- Intervals(
                 subset( sgd, type == "ORF", c( "start", "stop" ) ),
                 type = "Z"
                 )
rownames( orf ) <- subset( sgd, type == "ORF" )$feature_name

# Logical vector flagging the Watson-strand ORFs
W <- subset( sgd, type == "ORF" )$strand == "W"

# Candidate promoters: the 500 bases immediately upstream of each
# Watson-strand ORF start
promoters_W <- Intervals(
                         cbind( orf[W,1] - 500, orf[W,1] - 1 ),
                         type = "Z"
                         )

# Remove any bases covered by an ORF
promoters_W <- interval_intersection(
                                     promoters_W,
                                     interval_complement( orf )
                                     )

# Many Watson-strand genes have another ORF upstream at a distance of
# less than 500 bp

hist( size( promoters_W ) )

# All CDS entries are completely within their corresponding ORF entry,
# so the intersection below should be empty.

cds_W <- Intervals(
                  subset( sgd, type == "CDS" & strand == "W", c( "start", "stop" ) ),
                  type = "Z"
                  )
rownames( cds_W ) <- NULL

interval_intersection( cds_W, interval_complement( orf[W,] ) )
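The promoter computation in the example can also be sketched in base R, which makes the interval arithmetic explicit. This is a minimal sketch, assuming closed integer intervals (as with type = "Z") and hypothetical toy ORF coordinates; subtract_intervals() is a hypothetical helper written for this illustration, not a function of the intervals package. As with intersecting against interval_complement( orf ), a window that contains another ORF is split into fragments rather than simply truncated.

```r
# Hypothetical helper: return the parts of the closed integer interval
# [lo, hi] not covered by any closed interval [starts[i], stops[i]],
# as a two-column matrix (one row per remaining fragment).
subtract_intervals <- function(lo, hi, starts, stops) {
  keep   <- stops >= lo & starts <= hi   # only intervals touching the window
  starts <- starts[keep]
  stops  <- stops[keep]
  pieces <- list()
  cur <- lo                              # first base not yet accounted for
  for (i in order(starts)) {
    if (starts[i] > cur) pieces[[length(pieces) + 1]] <- c(cur, starts[i] - 1)
    cur <- max(cur, stops[i] + 1)
    if (cur > hi) break
  }
  if (cur <= hi) pieces[[length(pieces) + 1]] <- c(cur, hi)
  if (length(pieces) == 0) return(matrix(numeric(0), ncol = 2))
  do.call(rbind, pieces)
}

# Hypothetical toy ORFs on one chromosome ("C" marks Crick in this toy data)
orfs <- data.frame(
  name   = c("ORF_A", "ORF_B", "ORF_C"),
  start  = c(2000, 2300, 5000),
  stop   = c(2200, 3000, 6000),
  strand = c("W", "W", "C"),
  stringsAsFactors = FALSE
)

# Watson-strand promoters: the 500 bases before each start, minus ORF bases
w <- orfs$strand == "W"
promoters <- lapply(which(w), function(i) {
  subtract_intervals(orfs$start[i] - 500, orfs$start[i] - 1,
                     orfs$start, orfs$stop)
})
names(promoters) <- orfs$name[w]

# Bases remaining per promoter (closed intervals, so size = stop - start + 1)
sizes <- sapply(promoters, function(p) sum(p[, 2] - p[, 1] + 1))
print(sizes)  # ORF_A: 500, ORF_B: 299
```

Here the window for ORF_B is split into two fragments because ORF_A lies inside it. The Crick-strand ORF_C is skipped, mirroring the example, which only computes Watson-strand promoters.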

Run the code above in your browser using DataLab