library
call.
The matrices are obtained from eight public sources:
FlyFactorSurvey:  614
hPDI:             437
JASPAR_CORE:      459
jolma2013:        843
ScerTF:           196
stamlab:          683
UniPROBE:         380
cisbp 1.02:       874
Representing primarily five organisms (and 49 total):
Hsapiens:        2328
Dmelanogaster:   1008
Scerevisiae:      701
Mmusculus:        660
Athaliana:        160
Celegans:          44
other:            177
All the matrices are stored as position frequency matrices, in which each column (each position) sums to 1.0. When the number of sequences that contributed to a motif is known, that number is recorded in the matrix's metadata. With this information, one can transform the matrices into either PCMs (position count matrices) or PWMs (position weight matrices), also known as PSSMs (position-specific scoring matrices). The latter transformation requires that a model of the background distribution be known or assumed.
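The transformations described above can be sketched in base R. This is a minimal illustration, not the package's own method: the toy matrix, the sequence count of 20, the uniform background, and the pseudocount of 1 are all assumptions made for the example, and other smoothing conventions are equally common.

```r
# Toy position frequency matrix (hypothetical values, not taken from MotifDb).
# Rows are nucleotides, columns are positions; each column sums to 1.0.
pfm <- matrix(c(0.50, 0.25, 0.25, 0.00,
                0.00, 1.00, 0.00, 0.00,
                0.25, 0.25, 0.25, 0.25),
              nrow = 4,
              dimnames = list(c("A", "C", "G", "T"), as.character(1:3)))

# Assumed number of contributing sequences; for a real motif this would be
# read from the matrix's metadata, when the source provides it.
sequence.count <- 20

# PFM -> PCM: scale the frequencies back to counts.
pcm <- round(pfm * sequence.count)

# PCM -> PWM: log2 odds against a background model.
# A uniform background and a pseudocount of 1 are assumptions here; the
# pseudocount keeps zero counts from producing -Inf.
background  <- c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)
pseudocount <- 1
smoothed <- (pcm + pseudocount * background) / (sequence.count + pseudocount)
pwm <- log2(smoothed / background)

print(pcm)
print(round(pwm, 2))
```

Positive PWM entries mark nucleotides that are enriched relative to the background at that position, and negative entries mark depleted ones. When the sequence count is missing from the metadata, the PCM step cannot be recovered exactly, so any count used in its place is a modeling choice.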
The names of the matrices are the same as rownames of the metadata DataFrame, and have been chosen to balance the needs of concision and full description, including the organism in which the motif was discovered, the data source, and the name of the motif in the data source from which it was obtained. For example: "Hsapiens-JASPAR_CORE-SP1-MA0079.2" and "Scerevisiae-ScerTF-GSM1-badis".
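Because the names follow an organism-source-name pattern, they can be taken apart with a regular expression. The sketch below uses the two example names from the text and assumes that neither the organism nor the data source contains a hyphen, so only the first two hyphens are treated as separators; the column labels are chosen for illustration only.

```r
# The two example names given in the text.
motif.names <- c("Hsapiens-JASPAR_CORE-SP1-MA0079.2",
                 "Scerevisiae-ScerTF-GSM1-badis")

# Capture organism, data source, and the remainder (the name used by the
# source, which may itself contain hyphens).
pattern <- "^([^-]+)-([^-]+)-(.*)$"
parts <- regmatches(motif.names, regexec(pattern, motif.names))

# Each element of 'parts' holds the full match followed by the three groups.
parsed <- do.call(rbind, lapply(parts, function(p) {
  data.frame(organism   = p[2],
             dataSource = p[3],
             nativeName = p[4],
             stringsAsFactors = FALSE)
}))

print(parsed)
```

In practice the metadata is the more reliable place to look up the organism and the data source; splitting the name is mainly useful when only the names are at hand.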
Subsets of the matrices may be obtained in several ways:
MotifDb [[1]]
as.list (query (MotifDb, 'FBgn0000014'))
as.list (subset (MotifDb,
                 geneSymbol=='Abda' & !is.na (pubmedID)))
The matrices are stored in a SimpleList, which has semantics very similar to the familiar list of base R. To examine a matrix, however, you must sidestep the MotifDb show method. These three commands display quite different results:
> MotifDb [1]
MotifDb object of length 1
| Created from downloaded public sources: 2012-Jul6
| 1 position frequency matrices from 1 source:
| FlyFactorSurvey: 1
| 1 organism/s
| Dmelanogaster: 1
Dmelanogaster-FlyFactorSurvey-ab_SANGER_10_FBgn0259750

> MotifDb [[1]]
     1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21
A  0.0 0.50 0.20 0.35    0    0    1    0    0 0.55 0.35 0.05 0.20 0.45 0.20 0.10 0.40 0.40 0.25 0.50 0.30
C  0.3 0.15 0.25 0.00    1    1    0    0    0 0.10 0.65 0.70 0.45 0.25 0.10 0.25 0.25 0.10 0.10 0.25 0.25
G  0.4 0.05 0.50 0.65    0    0    0    1    1 0.00 0.00 0.05 0.05 0.15 0.05 0.20 0.05 0.15 0.55 0.15 0.45
T  0.3 0.30 0.05 0.00    0    0    0    0    0 0.35 0.00 0.20 0.30 0.15 0.65 0.45 0.30 0.35 0.10 0.10 0.00

> as.list (MotifDb [1])
$`Dmelanogaster-FlyFactorSurvey-ab_SANGER_10_FBgn0259750`
     1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21
A  0.0 0.50 0.20 0.35    0    0    1    0    0 0.55 0.35 0.05 0.20 0.45 0.20 0.10 0.40 0.40 0.25 0.50 0.30
C  0.3 0.15 0.25 0.00    1    1    0    0    0 0.10 0.65 0.70 0.45 0.25 0.10 0.25 0.25 0.10 0.10 0.25 0.25
G  0.4 0.05 0.50 0.65    0    0    0    1    1 0.00 0.00 0.05 0.05 0.15 0.05 0.20 0.05 0.15 0.55 0.15 0.45
T  0.3 0.30 0.05 0.00    0    0    0    0    0 0.35 0.00 0.20 0.30 0.15 0.65 0.45 0.30 0.35 0.10 0.10 0.00
There are fifteen kinds of metadata, though not all matrices have a full complement, because not all of the public sources are complete in this regard. The information falls into these categories, using the Dmelanogaster-FlyFactorSurvey-ab_SANGER_10_FBgn0259750 entry as an example (see above for the associated position frequency matrix):
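Since some sources leave metadata fields empty, it can help to count the gaps before filtering on a field. The sketch below works on a small stand-in data frame rather than the real metadata: every value in it is hypothetical, and of the column names only geneSymbol and pubmedID appear in the examples on this page (organism and sequenceCount are assumed for illustration).

```r
# Stand-in metadata table with deliberately missing entries (all values hypothetical).
metadata <- data.frame(
  geneSymbol    = c("Sox4", "ab", NA),
  organism      = c("Mmusculus", "Dmelanogaster", "Hsapiens"),
  sequenceCount = c(NA, 20, NA),
  pubmedID      = c("id-1", "id-2", NA),
  stringsAsFactors = FALSE
)

# Number of missing values in each metadata field.
missing.per.field <- colSums(is.na(metadata))

# Rows (matrices) for which every field is filled in.
complete.rows <- complete.cases(metadata)

print(missing.per.field)
print(metadata[complete.rows, ])
```

With MotifDb loaded, the same two calls could presumably be applied to as.data.frame (values (MotifDb)); that usage has not been checked here and should be treated as an assumption.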
library (MotifDb)
# are there any matrices for Sox4? we find two
mdb.sox4 <- MotifDb [grep ('sox4', values (MotifDb)$geneSymbol, ignore.case=TRUE)]
# the same two matrices can be obtained this way also
if (interactive ())
  mdb.sox4 <- subset (MotifDb, tolower (geneSymbol)=='sox4')
# and like this
mdb.sox4 <- query (MotifDb, 'sox4')   # matches against all fields in the metadata
# implicitly invoke the 'show' method
mdb.sox4
# get their full names
names (mdb.sox4)
# examine their metadata
values (mdb.sox4)
# examine the matrices with names included
as.list (mdb.sox4)
# export the matrices in meme format
destination.file <- tempfile ()
export (mdb.sox4, destination.file, 'meme')