psel: Preference selection

Description

Evaluate a preference on a given dataset, i.e. return the maximal elements of a dataset for a given preference order.
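To illustrate what "maximal elements" means, the following base-R sketch computes the Pareto-optimal rows when every listed column is to be minimised. This is the intended meaning of a preference such as low(mpg) * low(hp). The helper name pareto_min_indices is hypothetical and not part of rPref. The all-pairs comparison is chosen for clarity only, and rPref's own algorithm may differ.

```r
# Hypothetical helper (not part of rPref): row indices of the Pareto-optimal
# rows of `df` when every column in `cols` is minimised, i.e. the maxima of
# a preference such as low(col1) * low(col2).
pareto_min_indices <- function(df, cols) {
  x <- as.matrix(df[, cols, drop = FALSE])
  k <- ncol(x)
  keep <- vapply(seq_len(nrow(x)), function(i) {
    # t(x) holds one column per row of df; x[i, ] is recycled down each
    # column, so colSums() yields one count per row of df
    no_worse <- colSums(t(x) <= x[i, ]) == k  # <= in every column
    better   <- colSums(t(x) <  x[i, ]) > 0   # <  in at least one column
    # row i is maximal iff no row dominates it (a row never dominates itself)
    !any(no_worse & better)
  }, logical(1))
  which(keep)
}

# Skyline of mtcars for "low mpg and low hp"
sky <- mtcars[pareto_min_indices(mtcars, c("mpg", "hp")), c("mpg", "hp")]
print(sky)
```

Equivalent tuples do not dominate each other, so all of them are kept among the maxima.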

Usage

psel(df, pref, top = NULL)

psel.indices(df, pref, top = NULL)

Arguments

df

A dataframe or, for grouped preference selection, a grouped dataframe. See below for details.

pref

The preference order, constructed via complex_pref and base_pref. All variables occurring in pref must be columns of the dataframe.
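Conceptually, a Pareto preference built from base preferences such as low(mpg) and high(hp) compares tuples column by column, where high(x) behaves like low(-x). The base-R sketch below encodes that idea with a named vector of directions in place of rPref's preference objects. It also enforces the requirement that every variable in the preference is a column of the dataframe. The function pareto_indices and its dirs argument are illustrative assumptions, not rPref's API.

```r
# Hypothetical helper (not rPref's API): maxima of a Pareto preference given
# as a named vector of directions, e.g. c(mpg = "low", hp = "high") standing
# in for low(mpg) * high(hp).
pareto_indices <- function(df, dirs) {
  stopifnot(!is.null(names(dirs)), all(dirs %in% c("low", "high")))
  # every variable occurring in the preference must be a column of df
  missing_cols <- setdiff(names(dirs), names(df))
  if (length(missing_cols) > 0)
    stop("not a column of df: ", paste(missing_cols, collapse = ", "))
  # make every column "smaller is better": high(x) acts like low(-x)
  flip <- ifelse(dirs == "high", -1, 1)
  x <- sweep(as.matrix(df[, names(dirs), drop = FALSE]), 2, flip, `*`)
  k <- ncol(x)
  keep <- vapply(seq_len(nrow(x)), function(i) {
    no_worse <- colSums(t(x) <= x[i, ]) == k  # no worse in every column
    better   <- colSums(t(x) <  x[i, ]) > 0   # strictly better somewhere
    !any(no_worse & better)                   # maximal iff undominated
  }, logical(1))
  which(keep)
}

# Counterpart of the high(mpg) * high(hp) skyline from the examples
sky <- mtcars[pareto_indices(mtcars, c(mpg = "high", hp = "high")), c("mpg", "hp")]
print(sky)
```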

top

By default NULL, which means that all maxima are returned. For top = k, the k best elements according to the preference are returned.

Grouped preference selection

With psel it is also possible to perform a preference selection where the maxima are calculated separately for every group. The groups have to be created with group_by from the dplyr package. The preference selection preserves the grouping, i.e., the summarise function from dplyr then operates on the set of maxima of each group. This can be used, e.g., to calculate the number of maxima in each group; see the examples below. A given top value k in connection with a grouped preference selection returns the k best tuples of each group. Hence, if df contains three groups, each with at least two elements, and top = 2, then six tuples are returned.
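The grouped semantics described above can be sketched in base R without dplyr: compute the maxima separately within each group, combine the per-group results, and count the maxima per group. The helpers grouped_maxima and pareto_min_indices are hypothetical. Restricting the sketch to minimised columns and returning rows in their original order are simplifications. rPref's grouped result keeps the dplyr grouping and may order rows differently.

```r
# Hypothetical helper: Pareto maxima when every column in `cols` is minimised
pareto_min_indices <- function(df, cols) {
  x <- as.matrix(df[, cols, drop = FALSE])
  k <- ncol(x)
  keep <- vapply(seq_len(nrow(x)), function(i) {
    !any(colSums(t(x) <= x[i, ]) == k & colSums(t(x) < x[i, ]) > 0)
  }, logical(1))
  which(keep)
}

# Hypothetical grouped selection: maxima computed separately per value of `by`
grouped_maxima <- function(df, by, cols) {
  groups <- split(seq_len(nrow(df)), df[[by]])  # row numbers of each group
  idx <- unlist(lapply(groups, function(rows) {
    # map positions within the group back to row numbers of df
    rows[pareto_min_indices(df[rows, , drop = FALSE], cols)]
  }), use.names = FALSE)
  df[sort(idx), , drop = FALSE]                 # keep original row order
}

# Analogue of psel(group_by(mtcars, cyl), low(mpg))
sky_g <- grouped_maxima(mtcars, "cyl", "mpg")
print(table(sky_g$cyl))  # number of maxima per group, like summarise(..., n())
```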

Details

The difference between the two variants of the preference selection is:

The psel function returns the subset of the dataset consisting of the maxima according to the given preference.
The function psel.indices returns just the row indices of the maxima. Hence psel(df, pref, top) is equivalent to df[psel.indices(df, pref, top),] for non-grouped dataframes. For grouped dataframes, the groups are restored after the preference selection.

For a given top value k, the k best elements are returned. In general this result is not uniquely determined: a top-1 query on two equivalent tuples (according to pref) may return either of them. In the rPref implementation, the tuple occurring first in the dataset is picked in this case. If the top value is greater than the number of elements in df, i.e., top > nrow(df), then all elements of df are returned without a warning.
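The following base-R sketch shows one plausible reading of top = k for minimised columns. Tuples are ranked by level (level 1 holds the maxima of the whole dataset, level 2 the maxima of the remaining tuples, and so on). Ties within a level are broken by position in the dataset, so the first occurring tuple wins. At most nrow(df) rows are returned. The level-based ranking and the helper names are assumptions for illustration; the text above fixes only the tie-breaking rule and the top > nrow(df) behaviour, not rPref's actual algorithm.

```r
# Hypothetical helper: Pareto maxima when every column in `cols` is minimised
pareto_min_indices <- function(df, cols) {
  x <- as.matrix(df[, cols, drop = FALSE])
  k <- ncol(x)
  keep <- vapply(seq_len(nrow(x)), function(i) {
    !any(colSums(t(x) <= x[i, ]) == k & colSums(t(x) < x[i, ]) > 0)
  }, logical(1))
  which(keep)
}

# Hypothetical top-k: rank rows by level, break ties by row order, keep k
top_k_indices <- function(df, cols, k) {
  remaining <- seq_len(nrow(df))
  level <- integer(nrow(df))
  lev <- 1L
  while (length(remaining) > 0) {
    # maxima among the rows that have no level yet (never empty)
    best <- remaining[pareto_min_indices(df[remaining, , drop = FALSE], cols)]
    level[best] <- lev
    remaining <- setdiff(remaining, best)
    lev <- lev + 1L
  }
  # sort by level, then by original position (first occurrence wins ties)
  ranked <- order(level, seq_len(nrow(df)))
  ranked[seq_len(min(k, nrow(df)))]  # top > nrow(df) yields all rows
}

# Sketch analogue of psel(mtcars, low(mpg) * low(hp), top = 5)
print(mtcars[top_k_indices(mtcars, c("mpg", "hp"), 5), c("mpg", "hp")])
```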

Examples


# Load rPref, which provides psel, low and high
library(rPref)

# Skyline and top-k skyline
psel(mtcars, low(mpg) * low(hp))
psel(mtcars, low(mpg) * low(hp), top = 5)

# Visualize the skyline in a plot
sky1 <- psel(mtcars, high(mpg) * high(hp))
plot(mtcars$mpg, mtcars$hp)
points(sky1$mpg, sky1$hp, lwd=3)

# Grouped preference with dplyr
library(dplyr)
psel(group_by(mtcars, cyl), low(mpg))

# Number of maxima in each group
summarise(psel(group_by(mtcars, cyl), low(mpg)), n())