similarityweight: Calculate the similarity weight for a set of observations

Description

Calculate the similarity weight for a set of observations, based on their distance from arbitrary points in data space. Observations which are very similar to the point under consideration are given weight 1, while observations which are dissimilar to the point are given weight zero.

Usage

similarityweight(x, data, threshold = NULL, distance = NULL,
  lambda = NULL)

Arguments

x

A dataframe describing arbitrary points in the space of the data (i.e., with the same colnames as data).

data

A dataframe representing observed data.

threshold

Threshold distance outside which observations will be assigned similarity weight zero. This is numeric and should be > 0. If left NULL, defaults to 1.

distance

The type of distance measure to be used, currently just two types of Minkowski distance: "euclidean" (default), and "maxnorm".

lambda

A constant to multiply by the number of categorical mismatches, before adding to the Minkowski distance, to give a general dissimilarity measure. If left NULL, behaves as though lambda is set larger than threshold, meaning that one factor mismatch guarantees zero weight.

Value

A numeric vector or matrix with values from 0 to 1, giving the similarity weights for the observations in data, with one row for each row in x.

Details

Similarity weight is assigned to observations based on their distance from a given point. The distance is calculated as Minkowski distance between the numeric elements for the observations whose categorical elements match, with the option to use a more general dissimilarity measure comprising Minkowski distance and a mismatch count.
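
The general dissimilarity measure described above can be illustrated with a minimal sketch. The helper dissimilarity_sketch below is hypothetical and is not part of condvis; it assumes unscaled numeric columns, treats every non-numeric column as categorical, and reads the lambda = NULL case as "any mismatch puts the observation beyond every finite threshold". It stops at the dissimilarity and does not attempt to reproduce how the package turns that value into a weight between 0 and 1.

```r
# Hypothetical helper, not part of condvis: dissimilarity between one point
# `x` (a one-row data frame) and every row of `data`.
dissimilarity_sketch <- function(x, data, distance = "euclidean",
                                 lambda = NULL) {
  is_num <- vapply(data, is.numeric, logical(1))
  num_names <- names(data)[is_num]
  cat_names <- names(data)[!is_num]

  # Differences on the numeric columns, one row per observation.
  diffs <- sweep(as.matrix(data[, num_names, drop = FALSE]), 2,
                 unlist(x[1, num_names, drop = FALSE]))
  d <- switch(distance,
    euclidean = sqrt(rowSums(diffs^2)),
    maxnorm = apply(abs(diffs), 1, max),
    stop("unknown distance: ", distance)
  )

  # Count categorical mismatches per observation.
  mismatches <- rep(0, nrow(data))
  for (nm in cat_names) {
    mismatches <- mismatches +
      (as.character(data[[nm]]) != as.character(x[[nm]]))
  }

  if (is.null(lambda)) {
    # Assumed reading of lambda = NULL: one mismatch rules the observation out.
    d[mismatches > 0] <- Inf
    d
  } else {
    # Minkowski distance plus lambda times the mismatch count.
    d + lambda * mismatches
  }
}

# Toy data (made up for illustration): two numeric columns and one factor.
obs <- data.frame(
  a = c(0, 3, 1),
  b = c(0, 4, 1),
  g = factor(c("u", "u", "v"))
)
pt <- obs[1, ]

dissimilarity_sketch(pt, obs)
dissimilarity_sketch(pt, obs, distance = "maxnorm")
dissimilarity_sketch(pt, obs, lambda = 2)
```

In this sketch, an observation whose dissimilarity exceeds threshold would receive weight zero. With lambda = NULL the third toy observation is excluded for any finite threshold because its factor level differs, whereas with lambda = 2 it could still receive some weight under a sufficiently large threshold.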

References

O'Connell M, Hurley CB and Domijan K (2017). "Conditional Visualization for Statistical Models: An Introduction to the condvis Package in R." Journal of Statistical Software, 81(5), pp. 1-20. <URL:http://dx.doi.org/10.18637/jss.v081.i05>.

Examples

# NOT RUN {
## Say we want to find observations similar to the first observation.
## The first observation is identical to itself, so it gets weight 1. The
## second observation is similar, so it gets some weight. The rest are more
## different, and so get zero weight.

data(mtcars)
similarityweight(x = mtcars[1, ], data = mtcars)

## By increasing the threshold, we can find observations which are more
## approximately similar to the first row. Note that the second observation
## now has weight 1, so we lose some ability to discern how similar
## observations are by increasing the threshold.

similarityweight(x = mtcars[1, ], data = mtcars, threshold = 5)

## We can provide several points to 'x'. Here we see that the Mazda RX4 Wag
## is more similar to the Merc 280 than the Mazda RX4 is.

similarityweight(mtcars[1:2, ], mtcars, threshold = 3)

# }