neighborhood: Given Bayes-features, returns the samples from a dataset that have a similarity greater than zero to the sample those features describe (i.e., the neighborhood).

Description

The neighborhood \(N_i\) is defined as the set of samples that have a similarity greater than zero to the given sample \(s_i\). Segmentation is done using equality (==) for discrete features and less than or equal (<=) for continuous features. Note that feature values NA and NaN are also supported using is.na() and is.nan().
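To make the segmentation rules above concrete, here is a minimal base-R sketch of filtering a data.frame by a single feature. It is a hypothetical illustration, not the mmb implementation: the function name `segment_by_feature` and its `is_discrete` flag are assumptions introduced here. Discrete features are matched with `==`, continuous ones with `<=`, and NA/NaN values are matched via `is.na()` and `is.nan()`.

```r
# Hypothetical sketch of single-feature segmentation; NOT mmb's actual code.
segment_by_feature <- function(df, name, value, is_discrete) {
  col <- df[[name]]
  keep <- if (is.nan(value)) {
    is.nan(col)                    # a NaN value matches only NaN entries
  } else if (is.na(value)) {
    is.na(col) & !is.nan(col)      # an NA value matches NA (but not NaN) entries
  } else if (is_discrete) {
    !is.na(col) & col == value     # discrete feature: equality
  } else {
    !is.na(col) & col <= value     # continuous feature: less than or equal
  }
  df[keep, , drop = FALSE]         # logical row indexing preserves row names
}

# Usage: samples whose Sepal.Width is at most the mean Sepal.Width
res <- segment_by_feature(iris, "Sepal.Width", mean(iris$Sepal.Width), FALSE)
print(nrow(res))
```

Rows with missing values in the feature column are excluded from `==`/`<=` matches, because comparisons against NA would otherwise yield NA rather than FALSE.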

Usage

neighborhood(df, features, selectedFeatureNames = c(), retainMinValues = 0)

Arguments

df

data.frame to select the neighborhood from.

features

data.frame of Bayes-features, used to segment/select the rows that should make up the neighborhood.

selectedFeatureNames

vector of names of features to use to demarcate the neighborhood. If empty, uses all features' names.

retainMinValues

DEFAULT 0. The minimum number of samples to retain during segmentation. For separating a neighborhood, this value should typically be 0, so that no samples outside the neighborhood are included. However, for very sparse data or a large number of variables, it may still make sense to retain some samples.
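The effect of a minimum-retention threshold can be sketched as applying the feature filters one after another and stopping before the remaining rows would fall below the threshold. This is an assumption about how retainMinValues behaves, written in base R for illustration only; `segment_with_min` and the list-of-lists feature format are hypothetical and not part of mmb.

```r
# Hypothetical sketch (not mmb's code). ASSUMPTION: filtering stops as soon as
# the next feature would leave fewer than `retain_min` rows.
segment_with_min <- function(df, feats, retain_min = 0) {
  for (f in feats) {
    col <- df[[f$name]]
    # discrete features use ==, continuous ones use <=; NA rows are dropped
    keep <- !is.na(col) & (if (f$discrete) col == f$value else col <= f$value)
    candidate <- df[keep, , drop = FALSE]
    if (nrow(candidate) < retain_min) break  # keep the last, larger subset
    df <- candidate
  }
  df
}

feats <- list(
  list(name = "Species", value = "setosa", discrete = TRUE),
  list(name = "Sepal.Width", value = 2.5, discrete = FALSE)
)
print(nrow(segment_with_min(iris, feats, retain_min = 0)))
print(nrow(segment_with_min(iris, feats, retain_min = 5)))
```

With `retain_min = 0` every filter is applied, which matches the recommended setting for separating a neighborhood; a larger value trades precision for keeping enough samples in sparse data.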

Value

data.frame containing the rows selected as the neighborhood. Row names are guaranteed to be preserved.

Examples

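# Build a single Bayes-feature from the mean Sepal.Width and select the
# samples of iris that fall into its neighborhood.
nbh <- mmb::neighborhood(df = iris, features = mmb::createFeatureForBayes(
  name = "Sepal.Width", value = mean(iris$Sepal.Width)))

print(nrow(nbh))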

Run the code above in your browser using DataLab