psel(df, pref, ...)
psel.indices(df, pref, ...)

pref: The preference, constructed via complex_pref and base_pref.
All variables occurring in the definition of pref must be either columns of the data frame df
or variables/functions of the environment where pref was defined.

top: A top value of k means that the k-best tuples of the data set are returned.
This may be non-deterministic, see below for details.

at_least: An at_least value of k returns the top-k tuples and additionally all tuples which are
not dominated by the worst tuple (i.e. the minima) of the top-k set.
The number of tuples returned is greater than or equal to
at_least. In contrast to top-k, this is deterministic.

top_level: A top_level value of k returns all tuples from the k-best levels. See below for the definition of a level.

and_connected: Logical value, only relevant if more than one of the {top, at_least, top_level}
values is given, otherwise it will be ignored.
Then and_connected = TRUE (which is the default) means that all top-conditions
must hold for the returned tuples:
Let cond1 and cond2 be top-conditions like top=2 or top_level=3, then
psel([...], cond1, cond2) is equivalent to the intersection of psel([...], cond1) and psel([...], cond2). If we have
and_connected = FALSE, these conditions are or-connected.
This corresponds to the union of psel([...], cond1) and psel([...], cond2).

show_level: Logical value. If TRUE, a column .level
is added to the returned data frame, containing all level values.
If at least one of the {top, at_least, top_level} values is given,
then show_level is TRUE by default for the psel function.
Otherwise, and for psel.indices in all cases, this option is FALSE by default.

Top-k preference selection: for a given top value of k, the k best elements and their level values are returned. The level values are determined as follows: the maxima of the data set w.r.t. the preference have level 1, the maxima of the remainder (the data set without the level-1 tuples) have level 2, and so on.

By default, psel.indices does not return the level values. By setting show_level = TRUE this function
returns a data frame with the columns '.indices' and '.level'.
Note that, if none of the top-k values {top, at_least, top_level} is set,
then all level values are equal to 1.

By definition, a top-k preference selection is non-deterministic.
A top-1 query of two equivalent tuples (equivalence according to pref)
can return either of these tuples.
For example, a top=1 preference selection on the tuples (a=1, b=1), (a=1, b=2)
w.r.t. low(a) preference can return either the 'b=1' or the 'b=2' tuple.

In contrast, a preference selection using at_least is deterministic, because it adds all tuples having the same level as the worst level
of the corresponding top-k query. This means that the result is filled with all tuples that are not worse than the top-k result.
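
The level values and the difference between top and at_least can be sketched in base R without calling rPref at all. The snippet below is only an illustration under stated assumptions: the data frame, the dominates and compute_levels helpers, and the choice of the preference low(a) * low(b) are all hypothetical, and the at_least rule follows the "same level as the worst level" description given above. It is not rPref's actual implementation.

```r
# Hypothetical toy data; the preference is low(a) * low(b) (smaller is better in both).
df <- data.frame(a = c(1, 1, 2, 3, 2), b = c(1, 2, 1, 3, 2))
m <- as.matrix(df)

# TRUE if row i is strictly better than row j w.r.t. low(a) * low(b)
dominates <- function(i, j) {
  all(m[i, ] <= m[j, ]) && any(m[i, ] < m[j, ])
}

# Hypothetical helper: repeatedly take the maxima of the remainder.
compute_levels <- function(n) {
  level <- integer(n)
  remaining <- seq_len(n)
  current <- 1L
  while (length(remaining) > 0) {
    # a row is a maximum if no other remaining row dominates it
    is_max <- vapply(remaining, function(j) {
      !any(vapply(remaining, function(i) dominates(i, j), logical(1)))
    }, logical(1))
    level[remaining[is_max]] <- current
    remaining <- remaining[!is_max]
    current <- current + 1L
  }
  level
}

lev <- compute_levels(nrow(df))        # 1 2 2 4 3

# at_least = 2: fill up with every tuple on the level of the 2nd-best tuple
k <- 2
worst_level <- sort(lev)[k]
at_least_idx <- which(lev <= worst_level)   # rows 1, 2, 3
```

Rows 2 and 3 are both on level 2, so a top = 2 selection would have to pick one of them arbitrarily, whereas the at_least = 2 result contains both.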
A preference selection with top-level-k returns all tuples having level k or better.

If the top or at_least value is greater than the number of elements in df
(i.e., nrow(df)), or top_level is greater than the highest level in df,
then all elements of df will be returned without further warning.

Using psel it is also possible to perform a preference selection where the maxima are calculated for every group separately.
The groups have to be created with group_by from the dplyr package. The preference selection preserves the grouping, i.e.,
the groups are restored after the preference selection. For example, if the summarize function from dplyr is applied to
psel(group_by(...), pref), the summarizing is done for the set of maxima of each group.
This can be used, e.g., to calculate the number of maxima in each group, see the examples below.

A {top, at_least, top_level} preference selection
is applied to each group separately.
A top=k selection returns the k best tuples for each group.
Hence if there are 3 groups in df, each containing at least 2 elements,
and we have top = 2, then 6 tuples will be returned.

Parallel computation is activated by setting options(rPref.parallel = TRUE). If this option is not set, rPref will use single-threaded computation by default.

The psel function returns the subset of the data set consisting of the maxima according to the given preference.
psel.indices returns just the row indices of the maxima
(except top-k queries with show_level = TRUE, see top-k preference selection).
Hence psel(df, pref) is equivalent to df[psel.indices(df, pref),] for non-grouped data frames.
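
The relation between the two variants can be checked directly. The following sketch assumes that rPref is installed and uses only the calls and arguments described on this page. The variable names are illustrative, and the comparison is expected to hold for non-grouped data frames as stated above.

```r
library(rPref)

p <- low(mpg) * low(hp)

idx <- psel.indices(mtcars, p)   # row indices of the maxima
sel <- psel(mtcars, p)           # the maxima as a data frame

# compare the direct selection with manual subsetting by index
all.equal(sel, mtcars[idx, ])

# with show_level = TRUE, psel.indices returns a data frame
# instead of a plain index vector
lev <- psel.indices(mtcars, p, top_level = 2, show_level = TRUE)
names(lev)
```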

See complex_pref on how to construct a Skyline preference.
See plot_front on how to plot the Pareto front of a Skyline.

library(rPref)

# Skyline and top-k/at-least skyline
psel(mtcars, low(mpg) * low(hp))
psel(mtcars, low(mpg) * low(hp), top = 5)
psel(mtcars, low(mpg) * low(hp), at_least = 5)
# visualize the skyline in a plot
sky1 <- psel(mtcars, high(mpg) * high(hp))
plot(mtcars$mpg, mtcars$hp)
points(sky1$mpg, sky1$hp, lwd=3)
# grouped preference with dplyr
library(dplyr)
psel(group_by(mtcars, cyl), low(mpg))
# return the number of maxima in each group
summarise(psel(group_by(mtcars, cyl), low(mpg)), n())
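
As a further sketch, the and_connected argument described above can be tried with two deterministic conditions (at_least and top_level), so that the results are reproducible. This assumes rPref is installed. The variable names are illustrative, and the set comparisons at the end merely restate the intersection/union description from the argument documentation rather than a guaranteed outcome.

```r
library(rPref)

p <- low(mpg) * low(hp)

# both conditions must hold (and_connected = TRUE is the default)
both <- psel.indices(mtcars, p, at_least = 3, top_level = 2)

# either condition suffices
either <- psel.indices(mtcars, p, at_least = 3, top_level = 2,
                       and_connected = FALSE)

# single-condition results for comparison
a <- psel.indices(mtcars, p, at_least = 3)
b <- psel.indices(mtcars, p, top_level = 2)

setequal(both, intersect(a, b))
setequal(either, union(a, b))
```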