
SemDist (version 1.2.0)

RUMIcurve: Information accretion based predictor assessment (across many thresholds)

Description

Reads in a tab-delimited file containing the true annotations for a set of sequences and one or more tab-delimited files containing the predicted annotations and corresponding scores for the same sequences. Calculates and outputs the average remaining uncertainty (RU), misinformation (MI), and semantic similarity (SS) at a series of thresholds spaced by a user-specified increment.
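
As a rough illustration of the two input formats (the column layout is described under Arguments), the sketch below writes a tiny pair of tab-delimited files and reads them back with base R. The sequence IDs, the choice of GO accessions, the scores, and the absence of a header row are all assumptions made for illustration; check the test files shipped with SemDist for the exact layout the package expects.

```r
# Hypothetical toy data: sequence IDs and scores are invented, and the
# GO accessions were picked arbitrarily for illustration.
true_df <- data.frame(
  seq = c("seqA", "seqA", "seqB"),
  go  = c("GO:0003674", "GO:0005488", "GO:0003824"),
  stringsAsFactors = FALSE
)
pred_df <- data.frame(
  seq   = c("seqA", "seqA", "seqB"),
  go    = c("GO:0003674", "GO:0003824", "GO:0003824"),
  score = c(0.95, 0.40, 0.72),
  stringsAsFactors = FALSE
)

# Write both tables as tab-delimited text.
# Assumption: no header row and no quoting.
truefile <- tempfile(fileext = ".txt")
predfile <- tempfile(fileext = ".txt")
write.table(true_df, truefile, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
write.table(pred_df, predfile, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read them back to confirm the column layout.
true_in <- read.delim(truefile, header = FALSE, stringsAsFactors = FALSE,
                      col.names = c("seq", "go"))
pred_in <- read.delim(predfile, header = FALSE, stringsAsFactors = FALSE,
                      col.names = c("seq", "go", "score"))
```

The true-annotation file has two columns and each prediction file has three, with the score in the last column.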

Usage

RUMIcurve(ont, organism, increment = 0.05, truefile, predfiles, IAccr = NULL, add.weighted = FALSE, add.prec.rec = FALSE)

Arguments

ont
Character string naming the ontology to use. One of "CC", "MF", or "BP", corresponding to Cellular Component, Molecular Function, and Biological Process, respectively.
organism
A character vector indicating which organism's (or organisms') annotation data to use.
increment
A numeric value between 0 and 1 giving the distance between consecutive thresholds at which values are calculated. Iteration starts from a threshold of 1, so an increment of 0.08 results in the thresholds 0.92, 0.84, 0.76, ... being used.
truefile
A character vector indicating the file from which to read the true annotations for the given sequences. Should be tab-delimited, with the first column containing the sequence ids and the second containing GO accessions.
predfiles
A character vector naming the files to read in as the predicted annotations. Each file should be tab-delimited, with the first column containing sequence ids, the second column containing GO accessions, and the third column containing the predictor's score (between 0 and 1) for that prediction.
IAccr
An optional named numeric vector of information accretion (IA) values for all the GO terms being used. If supplied, these values are used for the calculations instead of values obtained from R packages.
add.weighted
A boolean indicating whether to add information-content-weighted versions of RU, MI, and SS to the output.
add.prec.rec
A boolean indicating whether to calculate precision, recall, and specificity values for the prediction at each threshold and add them to the output.
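
The threshold sequence implied by increment can be sketched in base R as follows. sketch_thresholds is a hypothetical helper, not part of SemDist, and this page does not state whether the real function also evaluates a threshold of exactly 0, so treat the lower end of the sequence as an assumption.

```r
# Hypothetical helper (not a SemDist function): step down from 1 by `increment`.
sketch_thresholds <- function(increment) {
  stopifnot(is.numeric(increment), length(increment) == 1,
            increment > 0, increment < 1)
  # Assumption: the first threshold used is 1 - increment, and the
  # sequence continues downward while it stays at or above 0.
  seq(from = 1 - increment, to = 0, by = -increment)
}

# First three thresholds for the increment used in the argument description.
sketch_thresholds(0.08)[1:3]
```

With an increment of 0.08 the first three values match the 0.92, 0.84, 0.76 sequence given for the increment argument.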

Value

Returns a named list with the same number of elements as the input "predfiles". Each element is a data frame containing all of the user-requested values for the data at each threshold.
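
Because the return value is a list of data frames, results for several predictors can be handled with the usual list tools. The sketch below uses a hand-built stand-in for the return value: the element names, the threshold column, and all numbers are invented for illustration, and only the RU and MI column names are taken from the package example.

```r
# Stand-in for the value returned by RUMIcurve(); all values are invented.
mock_result <- list(
  predictorA = data.frame(threshold = c(0.95, 0.90),
                          RU = c(5.1, 4.2), MI = c(0.3, 0.9)),
  predictorB = data.frame(threshold = c(0.95, 0.90),
                          RU = c(4.8, 4.0), MI = c(0.5, 1.4))
)

# One summary per predictor: the smallest RU + MI over all thresholds.
# (This particular summary is only an example, not a SemDist output.)
min_total <- vapply(mock_result, function(df) min(df$RU + df$MI), numeric(1))
min_total
```

vapply keeps the list names, so each summary value stays labelled with the predictor it came from.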

See Also

computeIA, findRUMI

Examples

# Using test data sets from SemDist, plot a RUMI curve:
truefile <- system.file("extdata", "MFO_LABELS_TEST.txt", package="SemDist")
predfile <- system.file("extdata", "MFO_PREDS_TEST.txt", package="SemDist")
avgRUMIvals <- RUMIcurve("MF", "human", 0.05, truefile, predfile)
firstset <- avgRUMIvals[[1]]
plot(firstset$RU, firstset$MI)
