calculate.maxK: The main function for TopKInference

Description

Returns a complex object (named truncated.lists in the example) containing the Idata vector (see prepare.idata), the estimated truncation index \(j_0=k+1\) (see compute.stream) for each pair of input lists, the overall top-k estimate (see j0.multi), and further objects holding the information needed to plot the aggmap.

Usage

calculate.maxK(lists, L, d, v, threshold)

Value

A named list of the following content:

comparedLists: Contains information about the overlap of all pairwise compared lists (structure for the aggmap)
info: Contains information about the list names
grayshadedLists: Contains information on which objects in a list are consolidated (gray-shaded in the aggmap)
summarytable: Table of top-k list overlaps containing rank information, the rank sum, the order of objects as a function of the rank sum, the frequency of an object in the input lists and the frequency of an object in the truncated lists (for plotting in the aggmap)
vennlists: Contains the top-k objects for each of the input lists (for display in the Venn-diagram)
venntable: Contains the overlap information (for display in the Venn-table)
v: Selected pilot sample size (tuning parameter) \(\nu\)
Ntoplot: Number of columns to be plotted in the aggmap
Idata: Data frame of Idata vectors (see compute.stream) for each pair of input lists and the associated deltas
d: selected delta
threshold: selected threshold
L: number of lists
N: number of items in data frame (lists)
lists: data frame of lists that entered the analysis
maxK: maximal estimate of the top-k's (for all pairwise comparisons)
topkspace: the final integrated list of objects as result of the CEMC algorithm applied to the maxK truncated lists
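
The relation between the pairwise truncation indices and maxK can be sketched in base R. This is only an illustration of the arithmetic implied by \(j_0=k+1\) and by the definition of maxK as the maximal pairwise estimate; the j0 values and the pair names below are invented and are not produced by the package.

```r
# Hypothetical pairwise truncation indices j0 for three input lists.
# The values are made up for illustration; in practice they would come
# from the pairwise estimation step.
j0 <- c(list1_list2 = 12, list1_list3 = 9, list2_list3 = 15)

# Since j0 = k + 1, the top-k length for each pair is j0 - 1.
k <- j0 - 1

# maxK is the largest of the pairwise top-k estimates.
maxK <- max(k)

print(k)
print(maxK)
```

With these assumed values, the pair list2_list3 determines maxK.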

Arguments

lists: Data frame containing two or more columns that represent input lists of ordered objects subject to comparison
L: Number of input lists that are compared
d: The maximal distance delta between object ranks required for the estimation of \(j_0\)
v: The pilot sample size (tuning parameter) \(\nu\) required for the estimation of \(j_0\)
threshold: The percentage of occurrences of an object in the top-k selections across all comparisons that is required for the object to be gray-shaded in the aggmap as a consolidated object
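
The shape of the lists argument and the idea behind threshold can be illustrated with a small base R sketch. It does not call the package: the object names, the fixed k = 3, the per-list (rather than per-comparison) counting, and the inclusive cutoff (>=) are all assumptions made for the illustration.

```r
# Toy input: three ranked lists of equal length, one list per column.
# Object names are invented for illustration.
lists <- data.frame(
  list1 = c("a", "b", "c", "d", "e"),
  list2 = c("b", "a", "d", "c", "f"),
  list3 = c("a", "c", "b", "g", "d"),
  stringsAsFactors = FALSE
)
L <- ncol(lists)  # number of input lists
N <- nrow(lists)  # number of items per list

# Assume the lists have been truncated at k = 3 (not estimated here).
k <- 3
top <- lapply(lists, head, n = k)

# Percentage of truncated lists in which each object occurs
# (a simplification: counted per list, not per pairwise comparison).
objects <- unique(unlist(top))
pct <- sapply(objects, function(o) {
  100 * mean(sapply(top, function(x) o %in% x))
})

# Assumption: an object is treated as consolidated when pct >= threshold.
threshold <- 50
consolidated <- names(pct)[pct >= threshold]

print(round(pct, 1))
print(consolidated)
```

In this toy setting, raising threshold to 100 would keep only the objects that appear in every truncated list.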

Author

Eva Budinska <budinska@iba.muni.cz>, Michael G. Schimek <michael.schimek@medunigraz.at>

References

Hall, P. and Schimek, M. G. (2012). Moderate deviation-based inference for random degeneration in paired rank lists. J. Amer. Statist. Assoc., 107, 661-672.

Examples

set.seed(1234)
data(breast)
truncated.lists = calculate.maxK(breast, d=6, v=10, L=3, threshold=50)
if (FALSE) {
aggmap(truncated.lists)
}
