VEPParam runtime options are organized into the following categories: basic, input, cache, output, identifier, colocatedVariants, dataformat, filterqc, database and advanced. Each category is a list of runtime options. logical options are turned on/off with TRUE/FALSE. character and numeric options are on when a value is provided and off when they contain an empty value (i.e., character() or numeric()). The identifier, colocatedVariants and dataformat categories are supported for VEPParam73 and later.
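The on/off rule described above can be illustrated with a minimal base R sketch. The helper `is_on` and the list `opts` are hypothetical and not part of ensemblVEP; the option names are borrowed from names mentioned on this page and the values are made up for the example.

```r
# Hypothetical helper (not part of ensemblVEP): decide whether a single
# option value counts as "on" under the rule described above.
is_on <- function(x) {
  if (is.logical(x)) isTRUE(x) else length(x) > 0
}

# Illustrative category list with made-up values.
opts <- list(
  regulatory = TRUE,          # logical: on
  canonical  = FALSE,         # logical: off
  sift       = "b",           # character: on, a string is provided
  polyphen   = character(),   # character: off, empty value
  freq_freq  = numeric()      # numeric: off, empty value
)

# Names of the options that are switched on.
on <- names(Filter(is_on, opts))
print(on)
```

In this sketch an option is switched off again by assigning FALSE (logical) or an empty vector such as character() or numeric().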
basic
list of the following options:
logical, default FALSE; output status messages
logical, default FALSE; suppress status/warnings
logical, default FALSE; don't show progress bars
character, default character(); name of config file
logical, default FALSE; shortcut to switch on 12 options (sift, polyphen, ccds, hgvs, hgnc, numbers, domains, regulatory, cell_type, canonical, protein and gmaf)
numeric, default numeric(); enable forking
input
list of the following options:
character, default 'homo_sapiens'; species for the data
character, default character(); select assembly version if more than one available
character, default character(); one of the following input file formats: 'ensembl', 'vcf', 'pileup', 'hgvs', 'id' or 'vep'. By default the script auto-detects the input file format.
character, default writes to temp file; path and file name of output file
logical, default FALSE; overwrite the output file if it currently exists
character, default character(); summary stats file name
logical, default FALSE; do not generate a stats file
logical, default FALSE; generate a plain text stats file instead of html
logical, default FALSE; generate html version of the output file
cache
list of the following options:
logical, default FALSE; enable use of cache
character, default '$HOME/.vep/'; cache/plugin to be used
character, default '$HOME/.vep/'; cache to be used
character, default '$HOME/.vep/'; plugin to be used
logical, default FALSE; enable offline mode, no database connections will be made
character, default character(); FASTA filename or directory to files to use for reference sequences
character, default character(); use a different cache version than the assumed default
logical, default FALSE; show source version information for selected cache and quit
output
list of the following options:
logical, default FALSE; output the sequence ontology variant class
character, default character(); output prediction, score or both, valid strings are 'p', 's' or 'b'
character, default character(); output prediction, score or both, valid strings are 'p', 's' or 'b'
logical, default FALSE; retrieve the humDiv PolyPhen prediction instead of humVar
logical, default FALSE; indicates if overlapped gene is associated with a phenotype, disease or trait
logical, default FALSE; identify overlaps with regulatory regions
character, default character(); only report regulatory regions found in the given cell type(s)
character, default character(); name of custom annotation file to add to output. Currently only a single annotation is supported.
character, default character(); name of plugin module. Currently only a single module is supported.
character, default character(); consider only alternate alleles present in the genotypes of 'all' or a character vector of specified individuals
logical, default FALSE; force VCF genotypes to be interpreted as phased
logical, default FALSE; identify allele number from VCF input (1=first ALT, 2=second ALT, etc.)
character, default character(); cDNA, CDS and protein positions as position/length
logical, default FALSE; output affected exon and intron numbering, format is Number/Total
logical, default FALSE; output names of overlapping protein domains
logical, default FALSE; don't URI escape HGVS string
logical, default FALSE; don't overwrite existing CSQ entry in VCF INFO field
character, default 'CSQ'; change the name of the INFO key that VEP writes the consequences to in the VCF output
character, default 'so'; type of consequence terms to output, valid strings are 'ensembl' or 'so'
identifier
list of the following options:
logical, default FALSE; add HGVS IDs
[0/1], default 1 (shift); enable or disable 3' shifting of HGVS notations
logical, default FALSE; add Ensembl protein IDs
logical, default FALSE; add gene symbol (e.g. HGNC) (where available) to the output
logical, default FALSE; add CCDS transcript IDs
logical, default FALSE; adds identifiers for translated protein products from three UniProt-related databases
logical, default FALSE; adds the transcript support level for this transcript
logical, default FALSE; indicate if transcript is the canonical transcript for the gene
logical, default FALSE; add biotype of transcript
logical, default FALSE; output aligned RefSeq mRNA ID
colocatedVariants
list of the following options:
logical, default FALSE; check for co-located variants
logical, default FALSE; when checking for co-located variants only report them if none of the alleles supplied are novel
logical, default FALSE; check for structural variants that overlap the input variants
logical, default FALSE; add global minor allele frequency (MAF) from 1000 Genomes Phase 1 data
logical, default FALSE; add MAF from continental populations of 1000 Genomes Phase 1 data; must be used with --cache
logical, default FALSE; add MAF from NHLBI-ESP populations; must be used with --cache
logical, default FALSE; for maf_1kg and maf_esp report only the frequency (no allele) and convert so it is always a minor frequency, i.e. < 0.5
logical, default FALSE; report PubMed IDs for publications that cite existing variant; must be used with --cache
logical, default FALSE; when checking for co-located variants include or exclude variants that have been flagged as failed
dataformat
list of the following options:
logical, default FALSE; write output in vcf format
logical, default FALSE; write output in json format
logical, default FALSE; write output in gvf format
character, default fields are 'Uploaded_variation', 'Location', 'Allele', 'Gene', 'Feature', 'Feature_type', 'Consequence', 'cDNA_position', 'CDS_position', 'Protein_position', 'Amino_acids', 'Codons' and 'Extra'. See http://www.ensembl.org/info/docs/variation/vep/vep_formats.html#sv for details.
character, default character(); converts input file to one of 'ensembl', 'vcf', or 'pileup'
logical, default FALSE; convert alleles to their most minimal representation before consequence calculation
filterqc
list of the following options:
logical, default FALSE; force check of supplied reference allele against the sequence stored in Ensembl Core database
logical, default FALSE; return consequences in coding regions only
character, default character(); select a subset of chromosomes to be analyzed
logical, default FALSE; do not include intergenic consequences
logical, default FALSE; pick one line of consequence data per variant
logical, default FALSE; pick one line of consequence data per variant allele
logical, default FALSE; as per --pick, but adds the PICK flag to the chosen block of consequence data and retains others
logical, default FALSE; as per --pick_allele, but adds the PICK flag to the chosen block of consequence data and retains others
logical, default FALSE; output only the most severe consequence per gene
character, see Ensembl web page for default order; customise the order of criteria applied when choosing a block of annotation data with e.g. --pick
logical, default FALSE; output only most severe consequence per variation
logical, default FALSE; output a comma-separated list of all observed consequences per variation, transcript-specific columns will be left blank
logical, default FALSE; shortcut flag to turn on filters; see web page for details
logical, default FALSE; turn on frequency filtering, must also specify all of the --freq_* flags; see web page for details
character, default character(); population to use in frequency filter
numeric, default numeric(); MAF to use in frequency filter
character, default character(); specify whether the frequency of the co-located variant must be greater than or less than the value specified in the freq_freq option. Values are 'gt' or 'lt'.
character, default character(); specify whether to exclude or include variants that pass the frequency filter. Values are 'exclude' or 'include'.
logical, default FALSE; when using VCF format as input and output, by default VEP will skip all non-variant lines of input (i.e., where the ALT is NULL). When this option is enabled, lines will be printed in the VCF output with no consequence data added.
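The frequency filter options in this category combine a threshold, a comparison direction ('gt' or 'lt') and an action ('exclude' or 'include'). The base R sketch below shows one reading of how these interact. The function `keep_variant` and its argument names are hypothetical, the frequencies are made up, and the use of strict comparisons at the threshold is an assumption; VEP's own behaviour is documented on the Ensembl web page.

```r
# Hypothetical helper, not part of ensemblVEP or the VEP script.
# freq:      MAF of the co-located variant(s)
# threshold: value playing the role of the freq_freq option
# gt_lt:     'gt' or 'lt', direction of the comparison
# action:    'exclude' or 'include' variants that pass the comparison
keep_variant <- function(freq, threshold, gt_lt = c("gt", "lt"),
                         action = c("exclude", "include")) {
  gt_lt  <- match.arg(gt_lt)
  action <- match.arg(action)
  # Assumption: strict comparison at the threshold.
  passes <- if (gt_lt == "gt") freq > threshold else freq < threshold
  if (action == "include") passes else !passes
}

freqs <- c(0.001, 0.02, 0.30)   # made-up frequencies

# Exclude variants more frequent than 1%: only the first one is kept.
kept <- keep_variant(freqs, threshold = 0.01, gt_lt = "gt", action = "exclude")
print(kept)
```

Swapping `action` to 'include' inverts the result, which is why the two character options need to be read together.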
database
list of the following options:
logical, default TRUE; enable the VEP to use local or remote databases
character, default 'useastdb.ensembl.org'; database host
character, default character(); database user
character, default character(); database password
numeric, default character(); database port
logical, default FALSE; override default connection settings with those for the Ensembl Genomes public MySQL server
logical, default FALSE; limit analysis to transcripts in GENCODE basic set
logical, default FALSE; use otherfeatures database to retrieve transcripts
logical, default FALSE; use the merged Ensembl and RefSeq cache
logical, default FALSE; include e.g. CCDS and Ensembl EST transcripts
logical, default FALSE; map input variants to LRG coordinates
numeric, default character(); force connection to specific version
character, default character(); provide file to override default connection settings
advanced
list of the following options:
logical, default FALSE; run in non-whole genome mode, variants analyzed one at a time, no caching
numeric, default 5000; internal buffer size corresponding to number of variations read into memory simultaneously
logical, default FALSE; enable writing to the cache
character, default character(); build cache for the selected species from the database (see --chr flag)
character, default character(); specify utility to decompress cached files (zcat is default)
logical, default FALSE; force the script to use a cache built from a different host than specified with --host
numeric, default numeric(); size in base-pairs of the region covered by one file in the cache, see full description of this flag on the web site for details
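To give a feel for the internal buffer size option in this category, the base R sketch below splits a made-up number of variants into batches of at most 5000, the default stated above. It assumes simple sequential batching purely for illustration; how VEP actually manages its buffer is not described on this page.

```r
buffer_size <- 5000    # default buffer size stated above
n_variants  <- 12000   # made-up input size for the example

# Assign each variant index to a batch, then split the indices by batch.
batch_id <- ceiling(seq_len(n_variants) / buffer_size)
batches  <- split(seq_len(n_variants), batch_id)

# Number of variants held in memory for each batch.
batch_sizes <- lengths(batches)
print(batch_sizes)
```

A larger buffer means fewer batches at the cost of more variants held in memory at once.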
VEPParam objects store the runtime options for querying the Ensembl Variant Effect Predictor (VEP). This page describes only the most current runtime options and is a condensed version of what is listed on the Ensembl web site: http://uswest.ensembl.org/info/docs/tools/vep/script/vep_options.html
Runtime options for archived versions can be found on the corresponding archive page.
ensemblVEP function man page.
VEPParam class man page.
## See ?VEPParam for examples of constructing instances of a
## VEPParam object with different runtime options.