IdMappingAnalysis (version 1.16.0)

expectedUtility: Expected utility of an ID mapping, ID filtering, or other bioinformatics data preparation method

Description

expectedUtility calculates mean expected utility and total expected utility across pairs of features from two bioinformatics platforms. It is used to evaluate an ID mapping, ID filtering, or other bioinformatics data preparation method.

Usage

expectedUtility(dataset, label = "", bootModelCorClusters, columnsToRemove = c("Utp", "Lfp", "deltaPlus", "pi1Hat"), Utp, Lfp, deltaPlus, guarantee = 1e-09)

Arguments

dataset
A data frame or list from a call to fit2clusters, containing the posterior probabilities for each observation and their variance estimates. See Details.
label
A text string describing the method being studied, to label the return value. This is handy for using rbind to combine results for different methods.
bootModelCorClusters
Source for the mixture model estimates. If missing, it is extracted from the calling frame.
columnsToRemove
Names of columns to remove from return value.
Utp
Utility of a true positive.
Lfp
Loss of a false positive.
deltaPlus
Parameter defined as Pr("+" | "+" or "0")
guarantee
Minimum value for posterior probability.

Value

A data frame with just one row. The columns are:
Utp
Utility of a true positive.
Lfp
Loss of a false positive.
deltaPlus
Parameter defined as Pr("+" | "+" or "0")
deltaZero
Parameter defined as Pr("0" | "0" or "x")
nPairs
Number of ID pairs selected by the method.
pi1Hat
The estimate of the probability of the high-correlation component; obtained from
PrPlus
Estimated probability that an ID pair is in the "+" group.
PrTrue
Estimated probability that an ID pair is in the "+" or "0" group: PrPlus/deltaPlus
PrFalse
Estimated probability that an ID pair is in the "-" group.
Utrue
The component of expected utility from "true positives": PrTrue * Utp.
Lfalse
The (negative) component of expected utility from "false positives": PrFalse * Lfp.
Eutility1
The average expected utility per ID pair: Utrue-Lfalse.
Eutility
The total expected utility, summing over ID pairs: nrow(dataset)*Eutility1.
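
The relationships among the columns above can be checked by hand. The following is a minimal sketch of that arithmetic, not the package's implementation: the helper name sketchUtility and all input numbers are hypothetical, nPairs stands in for nrow(dataset), and PrFalse is taken as 1 - PrTrue purely as a simplifying assumption, since this page does not state how it is estimated.

```r
# Hypothetical helper that reproduces the column arithmetic described above.
# It is NOT the IdMappingAnalysis implementation.
sketchUtility <- function(label, PrPlus, nPairs,
                          Utp = 1, Lfp = 1, deltaPlus = 0.5) {
  PrTrue    <- PrPlus / deltaPlus  # probability of the "+" or "0" group
  PrFalse   <- 1 - PrTrue          # ASSUMPTION: the remaining probability mass
  Utrue     <- PrTrue * Utp        # utility contributed by true positives
  Lfalse    <- PrFalse * Lfp       # loss contributed by false positives
  Eutility1 <- Utrue - Lfalse      # average expected utility per ID pair
  data.frame(label = label, nPairs = nPairs,
             PrPlus = PrPlus, PrTrue = PrTrue, PrFalse = PrFalse,
             Utrue = Utrue, Lfalse = Lfalse,
             Eutility1 = Eutility1,
             Eutility = nPairs * Eutility1)  # total over all ID pairs
}

# Two made-up methods, each yielding a one-row data frame.
resultA <- sketchUtility("method A", PrPlus = 0.30, nPairs = 100)
resultB <- sketchUtility("method B", PrPlus = 0.40, nPairs = 80)

# The label column keeps the rows distinguishable after stacking.
comparison <- rbind(resultA, resultB)
print(comparison)
```

Because each result is a single row carrying its own label, rbind stacks them into a comparison table, which is the use the label argument is intended for.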

Details

The input dataset should be a data frame with one row per ID pair and the following columns:
  • Utp: Utility of a true positive.
  • Lfp: Loss of a false positive.
  • postProb: The posterior probabilities for each observation
  • postProbVar: The variances of the posterior probabilities, usually estimated from the bootstrap using Boot
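
As a minimal sketch of the expected input shape, the data frame below has one row per ID pair and the four columns listed above. All values are made up for illustration, and the call to expectedUtility is left commented out because it needs the IdMappingAnalysis package and a bootModelCorClusters object, neither of which is constructed here.

```r
# Made-up values for three ID pairs; only the column layout follows the list above.
dataset <- data.frame(
  Utp         = rep(1, 3),              # utility of a true positive
  Lfp         = rep(1, 3),              # loss of a false positive
  postProb    = c(0.92, 0.15, 0.60),    # posterior probability per ID pair
  postProbVar = c(0.004, 0.010, 0.020)  # variance of each posterior probability
)

# Check the layout before handing the data frame to expectedUtility.
requiredColumns <- c("Utp", "Lfp", "postProb", "postProbVar")
missingColumns  <- setdiff(requiredColumns, names(dataset))
if (length(missingColumns) > 0) {
  stop("dataset is missing columns: ", paste(missingColumns, collapse = ", "))
}

# With the package loaded and a mixture model fit available, the call would
# look roughly like this (remaining arguments as described under Arguments):
# library(IdMappingAnalysis)
# result <- expectedUtility(dataset, label = "my method",
#                           bootModelCorClusters = bootModelCorClusters)
```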