Learn R Programming

rfPred (version 1.10.0)

rfPred_scores: Assign functional prediction rfPred scores to human missense variants

Description

rfPred is a statistical method which combines 5 algorithms predictions in a random forest model: SIFT, Polyphen2, LRT, PhyloP and MutationTaster. These scores are available in the dbNFSP database for all the possible missense variants in hg19 version, and the package rfPred gives a composite score more reliable than each of the isolated algorithms.

Arguments

variant_list
A variants list in a data.frame containing 4 or 5 columns: chromosome number, hg19 genomic position on the chromosome, reference nucleotid, variant nucleotid and uniprot protein identifier (optional); or a character string of the path to a VCF (Variant Call Format) file; or a GRanges object with metadata containing textually reference, alteration and proteine (optional) columns names for reference and alteration
data
Path to the compressed TabixFile, either on the server (default) or on the user's computer
index
Path to the index of the TabixFile, either on the server (default) or on the user's computer
all.col
TRUE to return all available information, FALSE to return a more compact result (the most informative columns, see Value)
file.export
Optional, name of the CSV file in which export the results (default is NULL)
n.cores
number of cores to use when scaning the TabixFile, can be efficient for large request (default is 1)

Value

The variants list with the assigned rfPred scores, as well as the scores used to build rfPred meta-score: SIFT, phyloP, MutationTaster, LRT (transformed) and Polyphen2 (corresponding to Polyphen2_HVAR_score). The data frame returned contains these columns:
chromosome
chromosome number
position_hg19
physical position on the chromosome as to hg19 (1-based coordinate)
reference
reference nucleotide allele (as on the + strand)
alteration
alternative nucleotide allele (as on the + strand)
proteine
Uniprot accession number
aaref
reference amino acid
aaalt
alternative amino acid
aapos
amino acid position as to the protein
rfPred_score
rfPred score betwen 0 and 1 (higher it is, higher is the probability of pathogenicity)
SIFT_score
SIFT score between 0 and 1 (higher it is, higher is the probability of pathogenicity contrary to the original SIFT score) = 1-original SIFT score
Polyphen2_score
Polyphen2 (HVAR one) score between 0 and 1, used to calculate rfPred (higher it is, higher is the probability of pathogenicity)
MutationTaster_score
MutationTaster score between 0 and 1 (higher it is, higher is the probability of pathogenicity)
PhyloP_score
PhyloP score between 0 and 1 (higher it is, higher is the probability of pathogenicity): PhyloP_score=1-0.5x10^phyloP if phyloP>0 or PhyloP_score=0.5x10^-phyloP if phyloP<0< dd="">
LRT_score
LRT score between 0 and 1 (higher it is, higher is the probability of pathogenicity): LRT_score=1-LRToriginalx0.5 if LRT_Omega<1 or="" lrt_score="LRToriginalx0.5" if="" lrt_omega="">=1
The following columns are also returned if all.col is TRUE:
Uniprot_id
Uniprot ID number
genename
gene name
position_hg18
physical position on the chromosome as to hg18 (1-based coordinate)
Polyphen2_HDIV_score
Polyphen2 score based on HumDiv, i.e. hdiv_prob. The score ranges from 0 to 1: the corresponding prediction is "probably damaging" if it is in [0.957,1]; "possibly damaging" if it is in [0.453,0.956]; "benign" if it is in [0,0.452]. Score cut-off for binary classification is 0.5, i.e. the prediction is "neutral" if the score is lower than 0.5 and "deleterious" if the score is higher than 0.5. Multiple entries separated by ";"
Polyphen2_HDIV_pred
Polyphen2 prediction based on HumDiv: D (probably damaging), P (possibly damaging) and B (benign). Multiple entries separated by ";"
Polyphen2_HVAR_score
Polyphen2 score based on HumVar, i.e. hvar_prob. The score ranges from 0 to 1, and the corresponding prediction is "probably damaging" if it is in [0.909,1]; "possibly damaging" if it is in [0.447,0.908]; "benign" if it is in [0,0.446]. Score cut-off for binary classification is 0.5, i.e. the prediction is "neutral" if the score is lower than 0.5 and "deleterious" if the score is higher than 0.5. Multiple entries separated by ";"
Polyphen2_HVAR_pred
Polyphen2 prediction based on HumVar: D (probably damaging), P (possibly damaging) and B (benign). Multiple entries separated by ";"
MutationTaster_pred
MutationTaster prediction: A (disease_causing_automatic), D (disease_causing), N (polymorphism) or P (polymorphism_automatic)
phyloP
original phyloP score
LRT_Omega
estimated nonsynonymous-to-synonymous-rate ratio
LRT_pred
LRT prediction, D(eleterious), N(eutral) or U(nknown)

References

Jabot-Hanin F, Varet H, Tores F and Jais J-P. 2013. rfPred: a new meta-score for functional prediction of missense variants in human exome (submitted).

Examples

Run this code
# from a data.frame without uniprot protein identifier
data(variant_list_Y)
res=rfPred_scores(variant_list = variant_list_Y[,1:4],
        data = system.file("extdata", "chrY_rfPred.txtz", package="rfPred",mustWork=TRUE),
        index = system.file("extdata", "chrY_rfPred.txtz.tbi", package="rfPred",mustWork=TRUE))
# from a data.frame with uniprot protein identifier
res2=rfPred_scores(variant_list = variant_list_Y,
        data = system.file("extdata", "chrY_rfPred.txtz", package="rfPred",mustWork=TRUE),
        index = system.file("extdata", "chrY_rfPred.txtz.tbi", package="rfPred",mustWork=TRUE))
# from a VCF file
res3=rfPred_scores(variant_list = system.file("extdata", "example.vcf", package="rfPred",mustWork=TRUE),
        data = system.file("extdata", "chrY_rfPred.txtz", package="rfPred",mustWork=TRUE),
        index = system.file("extdata", "chrY_rfPred.txtz.tbi", package="rfPred",mustWork=TRUE))
# from a GRanges object
data(example_GRanges)
res4=rfPred_scores(variant_list = example_GRanges,
        data = system.file("extdata", "chrY_rfPred.txtz", package="rfPred",mustWork=TRUE),
        index = system.file("extdata", "chrY_rfPred.txtz.tbi", package="rfPred",mustWork=TRUE))

Run the code above in your browser using DataLab