Each selection criterion is written using the following syntax:
- c = criterion value
{where c indicates which criterion is used.
Many selection criteria are available. They correspond mainly to the
structured elements of the sequence documentation in the data banks,
and are detailed below. Criteria can be combined using three logical
operators:
criterion1 ET criterion2 : logical AND (sequences that fit criteria 1 and 2
simultaneously).
criterion1 OU criterion2 : logical OR (sequences that fit at least one of the two criteria).
NO criterion1 : logical negation (sequences that do not fit criterion 1).
Parentheses can be used to delimit the range of operations.
Lists of sequences can be reused at will, which is very convenient for
breaking complex requests into simple ones. For instance, here are
two equivalent ways to get all coding sequences from Escherichia coli
that are not partial:
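# First way: a single request combining all criteria
choosebank("genbank")
query("final", "sp=escherichia coli ET t=cds ET NO k=partial")
# Second way: step by step, reusing the intermediate lists
choosebank("genbank")
query("eco", "sp=escherichia coli")
query("ecocds", "eco ET t=cds")
query("final", "ecocds ET NO k=partial")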
}
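The composition rules above can be sketched in plain R. This is only a minimal illustration: the variable names and the second species pattern (`salmonella@`) are assumptions chosen for the example, and the strings are built locally, not sent to a server.

```r
# Elementary criteria, written as they would appear in a request.
species  <- "sp=escherichia coli"
coding   <- "t=cds"
complete <- "NO k=partial"

# Chain criteria with the ET (logical AND) operator.
request <- paste(species, coding, complete, sep = " ET ")
print(request)

# Parentheses delimit the scope of OU (logical OR) before ET is applied.
# The second species pattern is an illustrative assumption.
either_species <- paste0("(", species, " OU sp=salmonella@", ")")
grouped <- paste(either_species, coding, sep = " ET ")
print(grouped)
```

Once a bank has been opened with choosebank, either string could be given as the second argument of query, in the same way as the requests shown above.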
- SP = species name
{ sequences from a given species or group of species.
The special character @ can be used to match any group of characters in
the species name, e.g. SP=RATTUS@.
Spaces are allowed. Examples: ESCHERICHIA COLI, @COLI, E@COLI. Species names are tree-structured according to the biological classification
of species.}
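To get a feel for what an @ pattern selects, the wildcard can be imitated locally with a regular expression. The sketch below is not how the server performs the match: the helper `match_at_pattern` and the list of candidate names are hypothetical, and case-insensitive matching is an assumption.

```r
# Hypothetical helper: which candidate names would an @ pattern select?
# Assumes case-insensitive matching against the whole name.
match_at_pattern <- function(pattern, candidates) {
  glob <- gsub("@", "*", pattern, fixed = TRUE)  # @ plays the role of *
  rx <- utils::glob2rx(glob)                     # anchored regular expression
  candidates[grepl(rx, candidates, ignore.case = TRUE)]
}

# Illustrative list of species names (not taken from a real bank).
species_names <- c("ESCHERICHIA COLI", "RATTUS NORVEGICUS",
                   "RATTUS RATTUS", "ENTEROBACTER CLOACAE")

print(match_at_pattern("RATTUS@", species_names))
print(match_at_pattern("E@COLI", species_names))
```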
- K = keyword
{ sequences having a given keyword. Since keywords are
tree-structured, as are species, you also select all
sequences associated with keywords further down in the tree.
(@ can be used to match any group of characters.) }
- R = reference code
{sequences from a given reference. References are specified as follows depending on the type of document:}
rlll{
Document                   Format                         Example
Journal article            journal_code/volume/1st_page   jme/34/17
Book                       book/year/1st_author           book/1980/broker
Thesis                     thesis/year/1st_author         thesis/1984/wildgruber
Patent                     patent/patent_coded_number     patent/ep0238993
Unpublished or submitted   unpubl/year/1st_author         unpubl/1993/cho
}
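As a small illustration of the journal article format, a reference code can be assembled from its three parts. The helper `journal_reference` is hypothetical and only builds the string; the values are those of the example in the table.

```r
# Hypothetical helper: assemble a journal-article reference code
# in the form journal_code/volume/1st_page.
journal_reference <- function(journal_code, volume, first_page) {
  paste(journal_code, volume, first_page, sep = "/")
}

ref <- journal_reference("jme", 34, 17)

# Prefix with the criterion letter to obtain a complete selection criterion.
criterion <- paste0("r=", ref)
print(criterion)
```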
- J = journal name
{sequences published in a given journal.}
- Y = year
{sequences published in a given year (e.g. 1982).}
- Y > year
{sequences published after or during a given year.}
- Y < year
{sequences published before or during a given year.}
- AU = author
{sequences published by the given author(s). Use @ to specify
any letters in the name (e.g. @ORMOND@ for Van Ormondt).
Only last names are indexed; initials are ignored. All authors of journal articles are indexed. Only the first author of books, theses, patents and other documents is indexed.
}
- T = sequence type
{ sequences of a given type. You generally obtain
subsequences with this criterion because types are, for example, tRNA,
rRNA or protein gene.
Type should not be confused with molecule which denotes the chemical nature of the sequenced molecule (e.g., DNA, mRNA, tRNA). Type is defined only for the nucleotide sequence banks. Presently the existing types are:}
lll{
ID                 Locus entry (EMBL, SWISS-PROT, NRSub)
LOCUS              Locus entry (GenBank, Hovergen, EMGLib)
CDS        .PE     protein coding region (all)
RRNA       .RR     mature ribosomal RNA (all)
TRNA       .TR     mature transfer RNA (all)
MISC_RNA   .RN     other structural RNA coding region (EMBL, GenBank, Hovergen, NRSub, EMGLib)
SNRNA      .SN     small nuclear RNA (EMBL, GenBank, Hovergen, EMGLib)
SCRNA      .SC     small cytoplasmic RNA (EMBL, GenBank, Hovergen, NRSub, EMGLib)
3'INT      .3I     3' intron (Hovergen)
3'NCR      .3F     3' non-coding region (Hovergen)
5'INT      .5I     5' intron (Hovergen)
5'NCR      .5F     5' non-coding region (Hovergen)
CPG        .CG     CpGobs/CpGexp>0.5 (Hovergen)
INT_INT    .IN     internal intron (Hovergen)
}
Each entry of a FEATURE TABLE describing a coding region of a DNA fragment gives rise to a subsequence equal to the fragments described in the location of the feature. The type of the resulting subsequence equals the key of the corresponding feature table entry. The name of the resulting subsequence is built by adding to the parent sequence's name an extension uniquely identifying this particular feature.
Sequences of a given type are generally subsequences, i.e., fragments of parent sequences, unless the coding region covers the entire parent sequence, in which case ACNUC does not create a subsequence.
- O = organelle
{sequences from a given organelle.
Organelle (e.g., chloroplast, mitochondrion) denotes the nature of the genome that harbors a particular gene. By extension, ACNUC also sees the nucleus as an organelle. Also, a nuclear-encoded gene coding for a protein exported to an organelle is considered as a nuclear gene. The existing organelles are:}
lll{
CHLOROPLAST     Chloroplast genome     (EMBL, GenBank, NBRF, Hovergen)
MITOCHONDRION   Mitochondrial genome   (EMBL, GenBank, NBRF, Hovergen)
KINETOPLAST     Kinetoplast genome     (EMBL, GenBank, Hovergen)
NUCLEAR         Nuclear genome         (all)
}
- M = molecule name
{ sequences with a given chemical structure.
In ACNUC, molecule denotes the chemical nature of the sequenced molecule (e.g., DNA, mRNA, tRNA).
Molecule should not be confused with type which identifies the encoded molecule (e.g., protein, tRNA, rRNA). Thus the sequence of a tRNA gene has DNA for molecule because DNA rather than tRNA was sequenced. The subsequence covering the tRNA region has tRNA for type because this is the nature of the encoded product. Molecule is defined only for the nucleotide sequence banks (GenBank, EMBL, Hovergen, NRSub, and CGDB). Presently the existing molecules are:}
lll{
DNA    Sequenced molecule is DNA     (all)
RNA    Sequenced molecule is RNA     (all)
MRNA   Sequenced molecule is mRNA    (GenBank, Hovergen)
RRNA   Sequenced molecule is rRNA    (GenBank, Hovergen)
TRNA   Sequenced molecule is tRNA    (GenBank, Hovergen)
URNA   Sequenced molecule is snRNA   (GenBank, Hovergen)
}
- N = sequence name
{ the sequence with a given name.}
- AC = accession number
{ sequences with a given accession number.}
- F = file name
{ (not implemented) sequences whose names are in a specified file.
Use crelistfromclientdata with type = "SQ" for this purpose.
}
- FA = file name
{ (not implemented) sequences whose accession numbers are in a specified file.
Use crelistfromclientdata with type = "AC" for this purpose.
}
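The replacement for F and FA can be sketched as follows. This is a hedged example, not a reference usage: the accession numbers are placeholders, the list name `mylist` is arbitrary, and the file layout (one identifier per line) is an assumption. The server call needs the seqinr package and a network connection, so it is guarded and only runs in an interactive session.

```r
# Placeholder accession numbers, for illustration only.
accessions <- c("PLACEHOLDER1", "PLACEHOLDER2")

# Write them to a temporary file, one identifier per line (assumed layout).
acc_file <- tempfile(fileext = ".acc")
writeLines(accessions, acc_file)

# Create a list on the server from the local file.
# Guarded: requires seqinr, a network connection and an interactive session.
if (interactive() && requireNamespace("seqinr", quietly = TRUE)) {
  seqinr::choosebank("genbank")
  seqinr::crelistfromclientdata("mylist", file = acc_file, type = "AC")
  seqinr::closebank()
}
```

For a file of sequence names instead of accession numbers, the same call would use type = "SQ".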