base_pref: Base Preferences

Description

Base preferences describe the individual goals of a preference query (the dimensions, in the case of a Skyline query).

Usage

low(expr, df = NULL)
low_(expr, df = NULL)
high(expr, df = NULL)
high_(expr, df = NULL)
true(expr, df = NULL)
true_(expr, df = NULL)
is.base_pref(x)

Arguments

expr: A numerical/logical expression which is the term to evaluate for the current preference. The objective is to search for minimal/maximal values of this expression (for low/high) or for logical TRUE values (for true). For low_, high_ and true_, the argument must be an expression, a call or a string.
df: (optional) A data frame with the same structure (i.e., the same columns) as the data frame on which this preference is evaluated later on. If given, the preference is partially evaluated and associated with this data frame. See below for details.
x: An object to be tested if it is a base preference.

Using Expressions in Preferences

The low_, high_ and true_ preferences have the same functionality as low, high and true but expect an expression, a call or a string as argument. For example, low(a) is equivalent to low_(expression(a)) or low_("a"). Lazy expressions (see the lazyeval package) are also possible.
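The equivalence can be checked on a small data set. The following is a minimal sketch, assuming the functions documented here are loaded from the rPref package; the data frame df is made up for illustration.

```r
library(rPref)  # assumption: low, low_ and psel are provided by rPref

# hypothetical toy data, not part of the original documentation
df <- data.frame(a = c(3, 1, 2), b = c("x", "y", "z"))

r1 <- psel(df, low(a))               # unquoted expression
r2 <- psel(df, low_(expression(a)))  # expression object
r3 <- psel(df, low_("a"))            # string

# all three selections should contain the same tuple
identical(r1, r2) && identical(r1, r3)
```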

This is helpful for developing your own base preferences. Assume you want to define a base preference false as the dual of true. A definition like false <- function(x) -true(x) is the wrong approach, as psel(data.frame(a = c(1,2)), false(a == 1)) will result in the error "object 'a' not found". This is because a is treated as a variable to be looked up immediately and not as an (abstract) symbol to be evaluated later. By defining

false <- function(x, ...) -true_(substitute(x), ...)

one gets a preference which behaves like a "built-in" preference. Additional optional parameters (like df) are passed through to true_. The object false(a == 1) will output [Preference] -true(a == 1) on the console, and psel(data.frame(a = c(1,2)), false(a == 1)) correctly returns the second tuple with a == 2.

There is a special symbol df__ which can be used in preference expressions to access the data set df on which psel is called. For example, on a data set where the first column has the name A, the preference low(df__[[1]]) is equivalent to low(A).
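As a small illustration of df__, the sketch below compares both forms on a toy data set. It assumes the rPref package is installed; the data frame and its values are hypothetical.

```r
library(rPref)  # assumption: low and psel are provided by rPref

# hypothetical toy data whose first column is named A
df <- data.frame(A = c(3, 1, 2), B = c("x", "y", "z"))

by_position <- psel(df, low(df__[[1]]))  # first column, accessed via df__
by_name     <- psel(df, low(A))          # same column, accessed by name

# both selections should agree
identical(by_position, by_name)
```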

Partial Evaluation and Associated Data Frames

If the optional parameter df is given, then the expression is evaluated at the time of definition as far as possible. All variables occurring as columns in df remain untouched. For example, consider

f <- function(x) 2*x
p <- true(cyl == f(1), mtcars)

Then p is equivalent to the preference true(cyl == 2), as the variable cyl is a column in mtcars and is therefore left unevaluated. Additionally, the data set mtcars is associated with the preference p, so the preference selection can be done with peval. See assoc.df for details on associated data sets.

The preference selection, i.e., psel(mtcars, p), can also be invoked without the partial evaluation. However, this results in an error if the function f has been removed from the current environment in the meantime. Hence it is safer to partially evaluate preferences early whenever they contain user-defined functions.
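The removal scenario can be sketched as follows. This example assumes the rPref package is installed; it uses f(2) instead of f(1) so that the condition cyl == 4 actually matches rows of mtcars, and the names p_late and p_early are chosen only for illustration.

```r
library(rPref)  # assumption: true, psel and peval are provided by rPref

f <- function(x) 2 * x

p_late  <- true(cyl == f(2))               # f is only looked up when psel runs
p_early <- true(cyl == f(2), df = mtcars)  # f(2) is evaluated at definition time

rm(f)  # simulate that f is no longer available

# the partially evaluated preference no longer depends on f
res_early <- peval(p_early)

# according to the text above, the late variant is expected to fail here;
# tryCatch keeps the script running and stores the condition object
late <- tryCatch(psel(mtcars, p_late), error = function(e) e)
```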

The partial evaluation can be done manually by partial.eval.pref.

Details

Mathematically, all base preferences are strict weak orders (irreflexive, transitive and negatively transitive).

The three fundamental base preferences are:

low(a), high(a): Search for minimal/maximal values of a, i.e., the induced order is the "smaller than" or "greater than" order on the values of a. The values of a must be numeric values.
true(a): Search for true values in logical expressions, i.e., TRUE is considered to be better than FALSE. The values of a must be logical values. For a tuplewise evaluation of a complex logical expression one has to use the & and | operators for logical AND/OR (and not the && and || operators).
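A tuplewise logical condition with the vectorized & operator can be sketched as below. The example assumes the rPref package is installed and uses the built-in mtcars data set.

```r
library(rPref)  # assumption: true and psel are provided by rPref

# & is evaluated element by element, giving one logical value per tuple;
# && is not meant for vectors and must not be used here
p <- true(cyl == 4 & am == 1)

# tuples satisfying the condition are preferred over all others
res <- psel(mtcars, p)
```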

The term expr may be just a single attribute or an arbitrary expression depending on more than one attribute, e.g., low(a+2*b+f(c)). Here, a, b and c are columns of the addressed data set and f has to be a previously defined function.

Functions contained in expr are evaluated over the entire data set, i.e., it is possible to use aggregate functions (min, mean, etc.). Note that all functions (and also all variables which are not columns of the data set on which expr will be evaluated) must be defined in the same environment (e.g., the environment of a function or the global environment) in which the base preference is defined.
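The use of an aggregate function inside a preference can be sketched as follows, assuming the rPref package is installed. The choice of mpg and the distance to the mean are only an illustration.

```r
library(rPref)  # assumption: low and psel are provided by rPref

# mean(mpg) is computed over all rows of mtcars, so the preference
# favours the cars whose mpg is closest to the overall average
p <- low(abs(mpg - mean(mpg)))
res <- psel(mtcars, p)
```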

The function is.base_pref returns TRUE if x is a base preference object and FALSE otherwise.

Examples

# define a preference with a score value combining mpg and hp
p1 <- high(4 * mpg + hp)
# perform the preference selection
psel(mtcars, p1)

# define a preference with a given function
f <- function(x, y) (abs(x - mean(x))/max(x) + abs(y - mean(y))/max(y))
p2 <- low(f(mpg, hp))
psel(mtcars, p2)

# use partial evaluation for weighted scoring
p3 <- high(mpg/sum(mtcars$mpg) + hp/sum(mtcars$hp), df = mtcars)
p3
# select Pareto optima
peval(p3)
