rgp (version 0.4-1)

commonSubexpressions: Similarity and Distance Measures for R Functions and Expressions

Description

These functions implement several similarity and distance measures for R functions (i.e. their body expressions).
TODO: check and document the measure-theoretic properties of each measure defined here.
TODO: these distance measures are metrics; some of them are norm-induced metrics.
commonSubexpressions returns the set of common subexpressions of expr1 and expr2. This is not a metric by itself, but can be used to implement several subtree-based similarity metrics.
numberOfCommonSubexpressions returns the number of common subexpressions of expr1 and expr2.
sizeWeightedNumberOfCommonSubexpressions returns the number of common subexpressions of expr1 and expr2, weighted by the size of each common subexpression. Note that for every expression e, sizeWeightedNumberOfCommonSubexpressions(e, e) == exprVisitationLength(e).
normalizedNumberOfCommonSubexpressions returns the ratio of the number of common subexpressions of expr1 and expr2 to the number of subexpressions in the larger of the two expressions.
normalizedSizeWeightedNumberOfCommonSubexpressions returns the ratio of the size-weighted number of common subexpressions of expr1 and expr2 to the visitation length of the larger of the two expressions.
NCSdist and SNCSdist are distance metrics derived from normalizedNumberOfCommonSubexpressions and normalizedSizeWeightedNumberOfCommonSubexpressions, respectively.
differingSubexpressions and numberOfDifferingSubexpressions are duals of the functions described above, based on counting the number of differing subexpressions of expr1 and expr2. The possible functions "normalizedNumberOfDifferingSubexpressions" and "normalizedSizeWeightedNumberOfDifferingSubexpressions" were omitted because they are always equal to NCSdist and SNCSdist by definition.
trivialMetric is the "trivial" metric M(a, b), which is 0 iff a == b and 1 otherwise.
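
The subexpression-based measures described above can be illustrated with a minimal base-R sketch. It does not use or reproduce the rgp implementation: the helper names (subexprStrings, commonSketch, normalizedSketch, distSketch) are hypothetical, subexpressions are compared by their deparsed text, only distinct subexpressions are counted, and the distance is assumed to be one minus the normalized similarity.

```r
# Hypothetical helper: all subexpressions of `expr` as deparsed strings,
# including `expr` itself, function names, and leaf symbols/constants.
subexprStrings <- function(expr) {
  self <- paste(deparse(expr), collapse = " ")
  if (is.call(expr)) {
    c(self, unlist(lapply(as.list(expr), subexprStrings)))
  } else {
    self
  }
}

# Set of subexpressions shared by both expressions.
commonSketch <- function(expr1, expr2) {
  intersect(subexprStrings(expr1), subexprStrings(expr2))
}

# Ratio of shared subexpressions to the number of distinct
# subexpressions of the larger expression (assumption: distinct counts).
normalizedSketch <- function(expr1, expr2) {
  n1 <- length(unique(subexprStrings(expr1)))
  n2 <- length(unique(subexprStrings(expr2)))
  length(commonSketch(expr1, expr2)) / max(n1, n2)
}

# Assumed derivation of the distance: one minus the similarity ratio.
distSketch <- function(expr1, expr2) 1 - normalizedSketch(expr1, expr2)

e1 <- quote(sin(x) + y)
e2 <- quote(sin(x) * y)
commonSketch(e1, e2)  # shared pieces: sin(x), sin, x and y
distSketch(e1, e1)    # 0 for identical expressions
```

Under these assumptions, two expressions that differ only in their root operator share every subexpression except the root call and the operator name, so their distance is small but nonzero.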
normInducedTreeDistance uses a norm on expression trees and a metric on tree node labels to induce a metric M on expression trees A and B:
If both A and B are empty (represented as NULL), M(A, B) := 0.
If exactly one of A or B is empty, M(A, B) := the norm applied to the non-empty tree.
If neither A nor B is empty, M(A, B) is the difference of their root node labels (as measured by labelDistance) added to the sum of the differences of their children. The children lists are padded with empty trees to equalize their sizes. The summation operator can be changed via distanceFoldOperator.
normInducedFunctionDistance is a wrapper that applies normInducedTreeDistance to the bodies of the given functions.
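
The recursive definition above can be sketched in base R as follows. This is an illustration under stated assumptions, not the rgp source: the names treeDistanceSketch, functionDistanceSketch, trivialMetricSketch, and nodeCountNorm are hypothetical, the constructors are assumed to return a distance function of two arguments, and the example norm simply counts tree nodes.

```r
# Trivial label metric: 0 if the labels are identical, 1 otherwise.
trivialMetricSketch <- function(a, b) if (identical(a, b)) 0 else 1

# Example norm (an assumption): the number of nodes in the expression tree.
nodeCountNorm <- function(tree) {
  if (is.null(tree)) {
    0
  } else if (is.call(tree)) {
    1 + sum(vapply(as.list(tree)[-1], nodeCountNorm, numeric(1)))
  } else {
    1
  }
}

treeDistanceSketch <- function(norm, labelDistance = trivialMetricSketch,
                               fold = `+`) {
  # Root label: the function name of a call, or the leaf itself.
  label <- function(tree) if (is.call(tree)) tree[[1]] else tree
  # Children: the arguments of a call; leaves have none.
  children <- function(tree) if (is.call(tree)) as.list(tree)[-1] else list()
  dist <- function(a, b) {
    if (is.null(a) && is.null(b)) return(0)  # both trees empty
    if (is.null(a)) return(norm(b))          # exactly one tree empty
    if (is.null(b)) return(norm(a))
    ca <- children(a)
    cb <- children(b)
    n <- max(length(ca), length(cb))
    # Pad the shorter child list with NULL, i.e. empty trees.
    ca <- c(ca, vector("list", n - length(ca)))
    cb <- c(cb, vector("list", n - length(cb)))
    childDists <- vapply(seq_len(n),
                         function(i) dist(ca[[i]], cb[[i]]),
                         numeric(1))
    # Combine the label distance with the child distances.
    Reduce(fold, childDists, labelDistance(label(a), label(b)))
  }
  dist
}

# Wrapper that compares the bodies of two functions.
functionDistanceSketch <- function(norm, ...) {
  d <- treeDistanceSketch(norm, ...)
  function(f, g) d(body(f), body(g))
}

d <- treeDistanceSketch(nodeCountNorm)
d(quote(x + 1), quote(x + sin(y)))  # label mismatch (1 vs sin) plus padded child y
```

Note that this sketch cannot distinguish a literal NULL argument inside an expression from an empty tree; a fuller version would need a dedicated marker for padding.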

Usage

commonSubexpressions(expr1, expr2)
numberOfCommonSubexpressions(expr1, expr2)
normalizedNumberOfCommonSubexpressions(expr1, expr2)
NCSdist(expr1, expr2)
sizeWeightedNumberOfCommonSubexpressions(expr1, expr2)
normalizedSizeWeightedNumberOfCommonSubexpressions(expr1, expr2)
SNCSdist(expr1, expr2)
differingSubexpressions(expr1, expr2)
numberOfDifferingSubexpressions(expr1, expr2)
sizeWeightedNumberOfDifferingSubexpressions(expr1, expr2)
trivialMetric(a, b)
normInducedTreeDistance(norm, labelDistance = trivialMetric, distanceFoldOperator = NULL)
normInducedFunctionDistance(norm, labelDistance = trivialMetric, distanceFoldOperator = NULL)

Arguments

expr1
An R expression.
expr2
An R expression.
a
An R object.
b
An R object.
norm
A norm to derive a tree distance metric from.
labelDistance
A metric for measuring distances of tree node labels, i.e. function names or constants.
distanceFoldOperator
The operator used by normInducedTreeDistance to combine the measured subtree distances; defaults to `+`.