RUMIcurve: Information-accretion-based predictor assessment (across many thresholds)

Description

Reads a tab-delimited file containing the true annotations for a set of sequences and one or more tab-delimited files containing the predicted annotations, with corresponding scores, for the same sequences. Calculates and outputs the average remaining uncertainty (RU), misinformation (MI), and semantic similarity (SS) at a series of thresholds determined by the user-specified increment.
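To make the two central quantities concrete, here is a minimal sketch of RU and MI for a single sequence, using their usual definitions: RU is the total information accretion (IA) of true terms the prediction missed, and MI is the total IA of predicted terms that are not true. It assumes both term sets are already propagated to their ancestors; the IA values and helper names are hypothetical, and SemDist's own implementation may differ in detail.

```r
# Hypothetical IA values (bits) for a few molecular-function GO terms.
# In practice these would come from the IAccr argument or from SemDist itself.
ia <- c("GO:0003674" = 0, "GO:0005488" = 1.2, "GO:0005515" = 2.5,
        "GO:0003824" = 0.8, "GO:0016787" = 1.7)

# Remaining uncertainty: summed IA of true terms that were not predicted.
remaining_uncertainty <- function(true_terms, pred_terms, ia) {
  sum(ia[setdiff(true_terms, pred_terms)])
}

# Misinformation: summed IA of predicted terms that are not among the true terms.
misinformation <- function(true_terms, pred_terms, ia) {
  sum(ia[setdiff(pred_terms, true_terms)])
}

# Both sets are assumed to be closed under ancestors (root GO:0003674 included).
true_terms <- c("GO:0003674", "GO:0005488", "GO:0005515")
pred_terms <- c("GO:0003674", "GO:0005488", "GO:0003824")

ru <- remaining_uncertainty(true_terms, pred_terms, ia)  # 2.5
mi <- misinformation(true_terms, pred_terms, ia)         # 0.8
```

RUMIcurve reports averages of such per-sequence values over all sequences, recomputed at each threshold.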

Usage

RUMIcurve(ont, organism, increment = 0.05, truefile, predfiles, IAccr = NULL, add.weighted = FALSE, add.prec.rec = FALSE)

Arguments

ont

A character string specifying the ontology to use: one of "CC", "MF", or "BP", corresponding to Cellular Component, Molecular Function, and Biological Process.

organism

A character vector indicating the organism(s) whose annotation data should be used.

increment

A numeric value between 0 and 1 giving the spacing between successive thresholds. Note that the iteration starts from a threshold of 1, so an increment of 0.08 results in the thresholds 0.92, 0.84, 0.76, and so on.
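The threshold series can be previewed in plain R. This sketch assumes each threshold is 1 minus a whole multiple of `increment`, matching the 0.08 example above; whether SemDist also evaluates the threshold 1 itself, or exactly where it stops near 0, is not stated in this page.

```r
# Assumed threshold series: 1 - k * increment for k = 1, 2, ...
increment <- 0.08
thresholds <- 1 - increment * seq_len(floor(1 / increment))
round(thresholds, 2)
# [1] 0.92 0.84 0.76 0.68 0.60 0.52 0.44 0.36 0.28 0.20 0.12 0.04
```

Smaller increments give a denser RU-MI curve at the cost of more threshold evaluations.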

truefile

A character string giving the path of the file from which to read the true annotations for the given sequences. The file should be tab-delimited, with the first column containing the sequence IDs and the second containing GO accessions.
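As a sketch of the expected layout, the following writes a tiny true-annotation file with one sequence/GO-term pair per row. The sequence IDs and the `seq`/`go` column names are hypothetical, and writing no header row is an assumption; compare against the SemDist example file `MFO_LABELS_TEST.txt` before relying on it.

```r
# Hypothetical true annotations: sequence ID, then GO accession.
true_annots <- data.frame(
  seq = c("SEQ1", "SEQ1", "SEQ2"),
  go  = c("GO:0005515", "GO:0003824", "GO:0016787")
)

# Tab-delimited, no quoting, no row names; omitting the header is an assumption.
truefile <- tempfile(fileext = ".txt")
write.table(true_annots, truefile, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

readLines(truefile)
```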

predfiles

A character vector of paths to the files to read as the predicted annotations. Each file should be tab-delimited, with the first column containing sequence IDs, the second containing GO accessions, and the third containing the predictor's score (between 0 and 1) for that prediction.
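A minimal sketch of a matching predictions file, plus a quick read-back check of its three-column layout and score range. The IDs, scores, and column names are made up for illustration, and the absence of a header row is again an assumption.

```r
# Hypothetical predictions: sequence ID, GO accession, score in [0, 1].
preds <- data.frame(
  seq   = c("SEQ1", "SEQ1", "SEQ2"),
  go    = c("GO:0005515", "GO:0005488", "GO:0016787"),
  score = c(0.91, 0.47, 0.30)
)
predfile <- tempfile(fileext = ".txt")
write.table(preds, predfile, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read the file back and confirm three columns with scores inside [0, 1].
check <- read.delim(predfile, header = FALSE,
                    col.names = c("seq", "go", "score"))
stopifnot(ncol(check) == 3, all(check$score >= 0 & check$score <= 1))
```

Several such files can be passed together, e.g. `predfiles = c(predfile_a, predfile_b)` (hypothetical names), giving one list element per file in the result.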

IAccr

A named numeric vector of information accretion (IA) values for all the GO terms being used. If supplied, these values are used in the calculations instead of values derived from R packages. This argument is optional.
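A sketch of building such a vector by hand. It assumes the names are GO accessions, so that every term appearing in the true or predicted annotations needs an entry; the numeric values here are invented for illustration only.

```r
# Hypothetical IA values, named by GO accession (numbers are made up).
my_ia <- c("GO:0003674" = 0, "GO:0005488" = 1.2, "GO:0005515" = 2.5)

# Basic sanity checks: numeric, fully named, no missing names.
stopifnot(is.numeric(my_ia), !is.null(names(my_ia)), !anyNA(names(my_ia)))

# Assumed lookup by accession.
my_ia[["GO:0005515"]]  # 2.5
```

The vector would then be supplied as `IAccr = my_ia` in the call to RUMIcurve.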

add.weighted

A logical value indicating whether to also calculate information-content-weighted versions of RU, MI, and SS and include them in the output.

add.prec.rec

A logical value indicating whether to calculate precision, recall, and specificity for the predictions at each threshold and include them in the output.
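For reference, here is a sketch of the standard set-based precision, recall, and specificity for one sequence at one threshold. Specificity needs a universe of candidate terms to count true negatives. The helper `set_metrics`, the scores, and the universe are hypothetical; how SemDist counts terms (propagation, averaging across sequences) is not documented here and may differ.

```r
# Standard set-based metrics for one sequence at one threshold.
set_metrics <- function(true_terms, pred_terms, universe) {
  tp <- length(intersect(pred_terms, true_terms))
  fp <- length(setdiff(pred_terms, true_terms))
  fn <- length(setdiff(true_terms, pred_terms))
  tn <- length(setdiff(universe, union(true_terms, pred_terms)))
  c(precision   = if (tp + fp > 0) tp / (tp + fp) else NA_real_,
    recall      = if (tp + fn > 0) tp / (tp + fn) else NA_real_,
    specificity = if (tn + fp > 0) tn / (tn + fp) else NA_real_)
}

universe   <- c("GO:0003674", "GO:0005488", "GO:0005515",
                "GO:0003824", "GO:0016787")
true_terms <- c("GO:0003674", "GO:0005488", "GO:0005515")

# Hypothetical scores; keep predictions at or above the threshold 0.5.
scores <- c("GO:0003674" = 0.95, "GO:0005488" = 0.80,
            "GO:0003824" = 0.55, "GO:0016787" = 0.20)
pred_terms <- names(scores)[scores >= 0.5]

m <- set_metrics(true_terms, pred_terms, universe)
m  # precision 2/3, recall 2/3, specificity 0.5
```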

Value

Returns a named list with one element per file in "predfiles". Each element is a data frame containing all of the user-requested values at each threshold.
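Because the result is a named list of data frames, the per-predictor curves can be stacked for joint plotting or analysis. The sketch below uses a hand-built stand-in for a RUMIcurve() result: the RU and MI column names follow the example on this page, while the list names and all values are hypothetical.

```r
# Stand-in for a RUMIcurve() result; values are invented for illustration.
avgRUMIvals <- list(
  predA = data.frame(RU = c(5.1, 3.8, 2.9), MI = c(0.4, 1.6, 3.2)),
  predB = data.frame(RU = c(4.7, 3.5, 3.0), MI = c(0.6, 1.2, 2.5))
)

# Stack the per-predictor data frames, tagging each row with its list name.
curves <- do.call(rbind, Map(function(df, name) {
  data.frame(predictor = name, df)
}, avgRUMIvals, names(avgRUMIvals)))
rownames(curves) <- NULL

# Draw one RU-MI curve per predictor on shared axes.
preds <- unique(curves$predictor)
plot(curves$RU, curves$MI, type = "n",
     xlab = "Remaining uncertainty", ylab = "Misinformation")
for (i in seq_along(preds)) {
  sel <- curves$predictor == preds[i]
  lines(curves$RU[sel], curves$MI[sel], col = i)
}
```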

Examples


# Using test data sets from SemDist, plot a RUMI curve:
truefile <- system.file("extdata", "MFO_LABELS_TEST.txt", package="SemDist")
predfile <- system.file("extdata", "MFO_PREDS_TEST.txt", package="SemDist")
avgRUMIvals <- RUMIcurve("MF", "human", 0.05, truefile, predfile)
firstset <- avgRUMIvals[[1]]
plot(firstset$RU, firstset$MI, type = "b",
     xlab = "Remaining uncertainty", ylab = "Misinformation")
