proxy (version 0.1)

pr_DB: Registry of proximities

Description

Registry containing similarities and distances.

Usage

pr_DB
pr_DB$get_field(name)
pr_DB$get_fields()
pr_DB$get_field_names()
pr_DB$set_field(name, default = NA, type = NA, is_mandatory = FALSE,
                is_modifiable = TRUE, validity_FUN = NULL)

pr_DB$entry_exists(name)
pr_DB$get_entry(name)
pr_DB$get_entries(name = NULL, pattern = NULL)
pr_DB$get_entry_names(name)
pr_DB$set_entry(...)
pr_DB$modify_entry(...)
pr_DB$delete_entry(name)

## S3 method for class 'pr_DB':
summary(object, verbosity = c("short", "long"), ...)

Arguments

Details

pr_DB represents the registry of all available proximity measures. For each measure, it holds meta-information that can be queried and extended, and new measures can be added. This is done using the following accessor functions of the pr_DB object:

get_field_names() returns a character vector with all field names. get_field() returns the information for a specific field as a list with components named as described above. get_fields() returns a list with all field entries. set_field() is used to create new fields in the registry (the default value will be set in all entries).

get_entry_names() returns a character vector with (the first alias of) all entries. entry_exists() is a predicate checking whether an entry with the specified alias exists in the registry. get_entry() returns the specified entry if it exists (and, by default, gives an error if it does not). get_entries() is used to query more than one entry: either those matching name exactly, or those where the regular expression in pattern matches any character field in an entry. By default (both arguments NULL), all entries are returned. delete_entry() removes an existing entry from the registry (note that only user-provided entries can be deleted).

set_entry() and modify_entry() require a named list of arguments used as field entries. At least the names index field is required. set_entry() will check for all other mandatory fields. If specified in the field meta data, each field entry and the entry as a whole are checked for validity. Note that only user-specified fields and/or entries can be modified; the data shipped with the package are read-only.

The registry fields currently available can be listed with pr_DB$get_field_names(). The fields referred to elsewhere on this page are FUN, PREFUN, POSTFUN, convert, abcd, and description, in addition to the names index field.

A function specified as FUN parameter has mandatory arguments x and y (if abcd is FALSE), and a, b, c, d, n otherwise. Additionally, it gets all optional parameters specified by the user in the ... argument of the dist and simil functions, possibly changed and/or complemented by the corresponding (optional) PREFUN function. It must return the (dis)similarity value computed from the arguments. x and y are two vectors from the data matrix (matrices) supplied. If abcd is TRUE, it is assumed that binary measures will be used, and the counts of concordant and discordant pairs (x_k, y_k) are precomputed and supplied instead of x and y. a, b, c, and d are the counts of all (TRUE, TRUE), (TRUE, FALSE), (FALSE, TRUE), and (FALSE, FALSE) pairs, respectively, and n is the total number of pairs.
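As a rough sketch of the two calling conventions described above, the following base R code defines one function of each kind and calls them directly, without going through the registry. The names my_euclid, my_simple_matching, and count_abcd are hypothetical and not part of the package; count_abcd only imitates the precomputation of the pair counts for illustration.

```r
## Hypothetical vector-based measure (abcd = FALSE): receives two data vectors
my_euclid <- function(x, y) sqrt(sum((x - y)^2))

## Hypothetical binary measure (abcd = TRUE): receives the pair counts instead
my_simple_matching <- function(a, b, c, d, n) (a + d) / n

## Illustration only: compute the counts for two logical vectors by hand
count_abcd <- function(x, y) {
  list(a = sum(x & y),    # (TRUE, TRUE) pairs
       b = sum(x & !y),   # (TRUE, FALSE) pairs
       c = sum(!x & y),   # (FALSE, TRUE) pairs
       d = sum(!x & !y),  # (FALSE, FALSE) pairs
       n = length(x))     # total number of pairs
}

x <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
y <- c(TRUE, FALSE, FALSE, TRUE, TRUE)

counts <- count_abcd(x, y)
sm <- do.call(my_simple_matching, counts)  # similarity from the counts
eu <- my_euclid(c(0, 0), c(3, 4))          # distance from the raw vectors
```

When such a function is registered, the abcd field of the entry would presumably have to be set to match the signature of FUN.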

A function specified as PREFUN parameter has mandatory arguments x, y, p, and reg_entry, with y and p possibly being NULL depending on the task at hand. x and y are the data objects, p is a (possibly empty) list with all specified proximity parameters, and reg_entry is the registry entry (a named list containing all information specified in reg_add). The preprocessing function is allowed to change any of this information and, if it does so, is required to return *all* arguments as a named list in the same order.
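A minimal sketch of a preprocessing function with this signature might look as follows. The function name my_prefun and the parameter power are hypothetical and chosen only for illustration; the function is called directly here rather than through the registry.

```r
## Hypothetical PREFUN: supplies a default for an optional parameter
my_prefun <- function(x, y, p, reg_entry) {
  ## fill in a default exponent if the user gave none (illustrative parameter)
  if (is.null(p$power)) p$power <- 2
  ## return *all* arguments as a named list, in the original order
  list(x = x, y = y, p = p, reg_entry = reg_entry)
}

## Direct call with y = NULL and an empty parameter list
out <- my_prefun(x = matrix(1:4, 2), y = NULL, p = list(),
                 reg_entry = list(names = "test"))
```

Building the return value with list() keeps y in the result even when it is NULL, so the four components always come back in the same order.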

A function specified as POSTFUN parameter has two mandatory arguments: result and p. result will contain the computed raw data, i.e. a vector of length $n * (n - 1) / 2$ for auto-distances (see dist for details on dist objects), or a matrix for cross-distances. p contains the specified proximity parameters. Post-processing functions need to return the result object (even if unmodified).
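The contract for post-processing can be sketched in base R as below. The function name my_postfun and the parameter sqrt are hypothetical, and the raw vector is a hand-made stand-in for the n * (n - 1) / 2 values of an auto-distance computation, not actual package output.

```r
## Hypothetical POSTFUN: optionally takes the square root of the raw values
my_postfun <- function(result, p) {
  if (isTRUE(p$sqrt)) result <- sqrt(result)
  result  # always return the result object, modified or not
}

n <- 4
raw <- c(1, 4, 9, 16, 25, 36)  # stand-in for n * (n - 1) / 2 = 6 raw values

res_sqrt <- my_postfun(raw, list(sqrt = TRUE))  # transformed values
res_same <- my_postfun(raw, list())             # returned unmodified
```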

A function specified as convert parameter should preserve the type of its argument.
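One way to read this requirement, assuming that "type" refers to the kind of object passed in (for example, a vector or a matrix), is that an elementwise transformation is a safe choice. The function my_convert below is a hypothetical illustration and not a conversion shipped with the package.

```r
## Hypothetical conversion: elementwise, so a vector stays a vector
## and a matrix stays a matrix with the same dimensions
my_convert <- function(x) 1 - x

v <- my_convert(c(0.2, 0.5))
m <- my_convert(matrix(c(1, 0.3, 0.3, 1), 2))
```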

See Also

dist

Examples

## create a new distance measure
mydist <- function(x,y) x * y

## create a new entry in the registry with two aliases
pr_DB$set_entry(FUN = mydist, names = c("test", "mydist"))

## look it up
pr_DB$get_entry("test")

## modify the content of the description field in the new entry
pr_DB$modify_entry(names = "test", description = "foo function")

## create a new field
pr_DB$set_field("New")

## look up the test entry again (two ways)
pr_DB$get_entry("test")
pr_DB[["test"]]

## show total number of entries
length(pr_DB)

## show all entries where "foo" matches any character field
pr_DB$get_entries(pattern = "foo")

## show more details
summary(pr_DB, "long")

## get all entries in a list (and extract first two ones)
pr_DB$get_entries()[1:2]

## get all entries as a data frame (select first 3 fields)
as.data.frame(pr_DB)[,1:3]

## delete test entry
pr_DB$delete_entry("test")

## check if it is really gone
pr_DB$entry_exists("test")
