expectedUtility: Expected utility of an ID mapping, ID filtering, or other bioinformatics data preparation method

Description

expectedUtility calculates mean expected utility and total expected utility across pairs of features from two bioinformatics platforms. It is used to evaluate an ID mapping, ID filtering, or other bioinformatics data preparation method.

Usage

expectedUtility(dataset, label = "", bootModelCorClusters, columnsToRemove = c("Utp", "Lfp", "deltaPlus", "pi1Hat"), Utp, Lfp, deltaPlus, guarantee = 1e-09)

Arguments

dataset

A data frame or list, typically the result of a call to fit2clusters, holding the posterior probabilities for each observation and their variance estimates. See Details.

label

A text string describing the method being studied, used to label the return value. This is useful when combining results for different methods with rbind.

bootModelCorClusters

Source for the mixture model estimates. If missing, it is extracted from the calling frame.

columnsToRemove

Names of columns to remove from return value.

Utp

Utility of a true positive.

Lfp

Loss of a false positive.

deltaPlus

Parameter defined as Pr("+" | "+" or "0").

guarantee

Minimum value for posterior probability.

Value

Utp: Utility of a true positive.
Lfp: Loss of a false positive.
deltaPlus: Parameter defined as Pr("+" | "+" or "0").
deltaZero: Parameter defined as Pr("0" | "0" or "x").
nPairs: Number of ID pairs selected by the method.
pi1Hat: The estimate of the probability of the high-correlation component; obtained from
PrPlus: Estimated probability that an ID pair is in the "+" group.
PrTrue: Estimated probability that an ID pair is in the "+" or "0" group: PrPlus/deltaPlus.
PrFalse: Estimated probability that an ID pair is in the "-" group.
Utrue: The component of expected utility from "true positives": PrTrue * Utp.
Lfalse: The (negative) component of expected utility from "false positives": PrFalse * Lfp.
Eutility1: The average expected utility per ID pair: Utrue-Lfalse.
Eutility: The total expected utility, summing over ID pairs: nrow(dataset)*Eutility1.
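
As a worked illustration of how the quantities above fit together, the following sketch reproduces the arithmetic in base R. The numeric inputs are hypothetical, nPairs stands in for nrow(dataset), and PrFalse is taken to be 1 - PrTrue, which is an assumption rather than something stated above. It is not the package's implementation.

```r
# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
Utp       <- 1      # utility of a true positive
Lfp       <- 2      # loss of a false positive
deltaPlus <- 0.5    # Pr("+" | "+" or "0")
PrPlus    <- 0.375  # estimated probability that a pair is in the "+" group
nPairs    <- 200    # number of ID pairs; stands in for nrow(dataset)

# Formulas as listed in the Value section.
PrTrue  <- PrPlus / deltaPlus
PrFalse <- 1 - PrTrue          # assumption: "-" is the complement of "+" or "0"
Utrue   <- PrTrue * Utp
Lfalse  <- PrFalse * Lfp

Eutility1 <- Utrue - Lfalse       # average expected utility per ID pair
Eutility  <- nPairs * Eutility1   # total expected utility over all pairs

print(c(PrTrue = PrTrue, PrFalse = PrFalse,
        Eutility1 = Eutility1, Eutility = Eutility))
```

With these inputs the per-pair utility is positive, so the total grows with the number of pairs; a larger Lfp or a smaller PrPlus would push both below zero.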

Details

The input dataset should be a data frame with one row per ID pair, and the following columns:

Utp: Utility of a true positive.
Lfp: Loss of a false positive.
postProb: The posterior probabilities for each observation.
postProbVar: The variances of the posterior probabilities, usually estimated from the bootstrap using Boot
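
A minimal sketch of an input with the columns listed above, built in base R. All values are hypothetical. The last step treats guarantee as a lower bound on postProb using pmax; that is one plausible reading of "minimum value for posterior probability" and an assumption about behavior, not a description of what expectedUtility does internally.

```r
# Toy input with one row per ID pair; every value here is made up.
toy <- data.frame(
  Utp         = rep(1, 4),
  Lfp         = rep(1, 4),
  postProb    = c(0.95, 0.40, 0, 0.70),
  postProbVar = c(0.001, 0.020, 0, 0.010)
)

# Confirm the documented columns are present before using the data.
required <- c("Utp", "Lfp", "postProb", "postProbVar")
stopifnot(all(required %in% names(toy)))

# Assumption: 'guarantee' acts as a floor on the posterior probabilities.
guarantee <- 1e-09
toy$postProb <- pmax(toy$postProb, guarantee)

print(toy)
```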