Semblance (version 1.1.0)

A Data-Driven Similarity Kernel on Probability Spaces

Description

We present a rank-based Mercer kernel that computes a pairwise similarity metric corresponding to an informative representation of the data. We tailor the kernel to encode prior knowledge about the data distribution over a probability space. The idea behind the construction is that objects whose feature values fall at the extremes of that feature's probability mass distribution are more similar to each other than objects whose feature values lie closer to the mean. Semblance therefore emphasizes features whose values lie far from the mean of their probability distribution. The kernel relies on properties determined empirically from the data and does not assume an underlying distribution. The use of feature ranks on a probability space makes Semblance computationally efficient, robust to outliers, and statistically stable, and thus a widely applicable algorithm for pattern analysis. The output of the kernel is a square, symmetric matrix that gives proximity values between pairs of observations.
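The robustness to outliers claimed above follows from working with ranks rather than raw values. The base-R sketch below illustrates only that property; it is not the Semblance kernel itself, and the vectors `x` and `x_outlier` are made-up illustration data.

```r
# Illustration data (hypothetical): one feature measured on five objects.
x <- c(2.1, 3.4, 1.7, 5.0, 4.2)

# Same feature, but the largest value is replaced by an extreme outlier.
x_outlier <- replace(x, 4, 5000)

# Ranks depend only on the ordering of the values, not their magnitude,
# so the outlier leaves them unchanged.
ranks_equal <- identical(rank(x), rank(x_outlier))
print(ranks_equal)

# Dividing ranks by the sample size places each value on an empirical
# probability scale in (0, 1], with no distributional assumption.
p <- rank(x) / length(x)
print(p)
```

Any quantity built from such ranks inherits this insensitivity to how far an extreme value lies from the rest of the data.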

Install

install.packages('Semblance')
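
Once the package is installed, a call might look like the sketch below. It assumes that `ranksem` accepts a numeric matrix as its first argument, with rows as observations and columns as features; the argument layout is an assumption, so check `?ranksem` before relying on it. The input matrix `X` is simulated illustration data.

```r
# Simulated illustration data: 10 observations (rows), 5 features (columns).
set.seed(1)
X <- matrix(rnorm(50), nrow = 10, ncol = 5)

# Only call the package if it is installed, so the script runs either way.
if (requireNamespace("Semblance", quietly = TRUE)) {
  # Assumption: the data matrix is passed as the first argument.
  K <- Semblance::ranksem(X)

  # The description states that the output is a square, symmetric matrix.
  print(dim(K))
  print(isSymmetric(K))
}
```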

Monthly Downloads

152

Version

1.1.0

License

GPL-2

Maintainer

Divyansh Agarwal

Last Published

January 25th, 2019

Functions in Semblance (1.1.0)

repRow

Make a matrix by repeating vector v into n rows.

repCol

Make a matrix by repeating vector v into n columns.

makeUpperLower

Make the upper triangular part the same as the lower triangular part.

ranksem

Compute Semblance for a given input matrix or data frame.

ranksem_Gini

Compute Gini-weighted Semblance.

computeSemblanceOneFeature

Compute semblance when there is only one feature, given as a vector x.

computeSemblanceOneFeature_Gini

Compute semblance when there is only one feature, given as a vector x, but weight the feature by its Gini coefficient. Use for data with strictly positive values.
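
The three helper functions are described only by one-line summaries. The base-R sketch below shows one way such helpers could behave. The `_sketch` names are hypothetical re-implementations written from those summaries, not the package's source code, and the argument order `(v, n)` is an assumption.

```r
# Hypothetical sketch: repeat vector v into n identical rows.
rep_row_sketch <- function(v, n) {
  matrix(v, nrow = n, ncol = length(v), byrow = TRUE)
}

# Hypothetical sketch: repeat vector v into n identical columns.
rep_col_sketch <- function(v, n) {
  matrix(v, nrow = length(v), ncol = n)
}

# Hypothetical sketch: copy the lower triangle of a square matrix
# into its upper triangle, which makes the matrix symmetric.
make_upper_lower_sketch <- function(m) {
  m[upper.tri(m)] <- t(m)[upper.tri(m)]
  m
}

print(rep_row_sketch(1:3, 2))
print(rep_col_sketch(1:3, 2))
print(make_upper_lower_sketch(matrix(1:9, nrow = 3)))
```

Filling only one triangle of a pairwise matrix and then mirroring it is a common way to halve the work when the result is known to be symmetric, as the kernel output is stated to be.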