TopKLists (version 1.0.2)

calculate.maxK: The main function for TopKInference

Description

Returns a named list (called truncated.lists in the example below) containing the Idata vector (see prepare.idata), the estimated truncation index $j_0 = k + 1$ (see compute.stream) for each pair of input lists, the overall top-k estimate (see j0.multi), and the further objects needed to plot the aggmap.

Usage

calculate.maxK(lists, L, d, v, threshold)

Arguments

lists
Data frame containing two or more columns that represent input lists of ordered objects subject to comparison
L
Number of input lists that are compared
d
The maximal distance $\delta$ between object ranks required for the estimation of $j_0$
v
The pilot sample size (tuning parameter) $\nu$ required for the estimation of $j_0$
threshold
The percentage of occurrences of an object in the top-k selection, among all comparisons, that is required for the object to be gray-shaded in the aggmap as a consolidated object
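
To make the role of threshold more concrete, here is a minimal base-R sketch that computes, for a few made-up objects, the percentage of comparisons in which each object appears in the top-k selection and keeps those reaching the threshold. The data, the helper name consolidated_objects, and the use of >= for the comparison are assumptions made for illustration; this is not the package's internal code.

```r
# Hypothetical top-k selections from three pairwise comparisons (made-up data).
topk_selections <- list(
  c("geneA", "geneB", "geneC"),
  c("geneA", "geneC", "geneD"),
  c("geneA", "geneB", "geneE")
)

# Hypothetical helper: percentage of comparisons in which each object appears,
# returning the objects that reach the threshold (comparison with >= is assumed).
consolidated_objects <- function(selections, threshold) {
  objects <- sort(unique(unlist(selections)))
  pct <- sapply(objects, function(obj) {
    100 * mean(sapply(selections, function(sel) obj %in% sel))
  })
  names(pct)[pct >= threshold]
}

consolidated <- consolidated_objects(topk_selections, threshold = 50)
print(consolidated)
```

With threshold = 50, an object has to appear in at least half of the comparisons to count as consolidated in this sketch.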

Value

A named list with the following components:

  • comparedLists: Information about the overlap of all pairwise compared lists (structure for the aggmap)
  • info: Information about the list names
  • grayshadedLists: Information about which objects in a list are consolidated (gray-shaded in the aggmap)
  • summarytable: Table of top-k list overlaps containing rank information, the rank sum, the order of objects as a function of the rank sum, the frequency of an object in the input lists, and the frequency of an object in the truncated lists (for plotting in the aggmap)
  • vennlists: The top-k objects for each of the input lists (for display in the Venn diagram)
  • venntable: The overlap information (for display in the Venn table)
  • v: Selected pilot sample size (tuning parameter) $\nu$
  • Ntoplot: Number of columns to be plotted in the aggmap
  • Idata: Data frame of Idata vectors (see compute.stream) for each pair of input lists and the associated deltas
  • d: Selected delta
  • threshold: Selected threshold
  • L: Number of lists
  • N: Number of items in the data frame (lists)
  • lists: Data frame of the lists that entered the analysis
  • maxK: Maximal top-k estimate over all pairwise comparisons
  • topkspace: The final integrated list of objects, obtained by applying the CEMC algorithm to the maxK truncated lists
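
The summary table described above reports rank information, the rank sum, and the ordering of objects by rank sum. As a rough illustration of those quantities, the base-R sketch below computes them for three made-up, fully ranked lists. The data and all variable names are hypothetical, and how the package treats objects missing from a list is not covered here.

```r
# Made-up input: three fully ranked lists over the same five objects.
lists_demo <- data.frame(
  list1 = c("c", "a", "b", "d", "e"),
  list2 = c("a", "c", "b", "e", "d"),
  list3 = c("c", "b", "a", "d", "e"),
  stringsAsFactors = FALSE
)

objects <- sort(unique(unlist(lists_demo)))

# Rank of each object in each list (its row position within that column).
ranks <- sapply(lists_demo, function(col) match(objects, col))
rownames(ranks) <- objects

# Rank sum per object and the ordering it induces (smaller sum ranks first).
rank_sum <- rowSums(ranks)
ordered_objects <- names(sort(rank_sum))

print(cbind(ranks, rank_sum))
print(ordered_objects)
```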

References

Hall, P. and Schimek, M. G. (2012). Moderate deviation-based inference for random degeneration in paired rank lists. J. Amer. Statist. Assoc., 107, 661-672.

See Also

CEMC, prepare.idata

Examples

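library(TopKLists)
set.seed(1234)
# Load the sample data; the calls below rely on it providing the object `lists`
data(TopKGUISampleInput)
# Estimate the pairwise truncation indices and the overall top-k for three lists
truncated.lists <- calculate.maxK(lists, L = 3, d = 10, v = 10, threshold = 50)
# Draw the aggmap from the result
aggmap(truncated.lists, N = 80, L = 3, d = 10, v = 10, lists = lists, threshold = 50)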
