ingredients (version 0.3.1)

select_neighbours: Select Subset of Rows Closest to a Specified Observation

Description

This function selects a subset of rows from a data set, namely the rows closest to a specified observation. This is useful when the data set is large and only a sample is needed to calculate profiles.

Usage

select_neighbours(data, observation, variables = NULL,
  distance = gower::gower_dist, n = 20, frac = NULL)

Arguments

data

set of observations

observation

single observation

variables

names of the variables used to calculate the distance. By default, these are all variables present in both `data` and `observation`.

distance

the distance function; by default, `gower::gower_dist`.

n

number of neighbours to select

frac

if `n` is not specified (NULL), the number of neighbours is calculated as `frac` * number of rows in `data`. Either `n` or `frac` needs to be specified.
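
The interplay of `variables`, `distance`, `n`, and `frac` can be illustrated with a minimal base R sketch. This is not the package source: `sketch_select_neighbours` and `euclid_dist` are hypothetical names introduced here, rounding `frac` * number of rows up with `ceiling()` is an assumption, and the distance function is assumed to take the observation and the data and return one distance per row of the data.

```r
# Hypothetical stand-in for select_neighbours(); not the package source.
sketch_select_neighbours <- function(data, observation, variables = NULL,
                                     distance, n = 20, frac = NULL) {
  # Default: use the variables present in both `data` and `observation`
  if (is.null(variables)) {
    variables <- intersect(colnames(observation), colnames(data))
  }
  # If `n` is NULL, derive it from `frac` (rounding up is an assumption)
  if (is.null(n)) {
    stopifnot(!is.null(frac))
    n <- ceiling(frac * nrow(data))
  }
  # Distance from the observation to every row of `data`
  d <- distance(observation[, variables, drop = FALSE],
                data[, variables, drop = FALSE])
  # Keep the n rows with the smallest distances
  data[head(order(d), n), , drop = FALSE]
}

# Hypothetical distance for numeric columns: Euclidean distance to each row of y
euclid_dist <- function(x, y) {
  diffs <- sweep(as.matrix(y), 2, as.numeric(unlist(x[1, ])))
  sqrt(rowSums(diffs^2))
}

toy <- data.frame(a = c(0, 1, 5, 10), b = c(0, 1, 5, 10))
obs <- data.frame(a = 0.9, b = 1.2)

# Two closest rows, requested directly through `n`
sketch_select_neighbours(toy, obs, distance = euclid_dist, n = 2)
# Same request expressed as a fraction of the rows (0.5 * 4 rows = 2)
sketch_select_neighbours(toy, obs, distance = euclid_dist, n = NULL, frac = 0.5)
```

In this sketch the selected rows are returned in order of increasing distance; whether the package guarantees that ordering is not stated in this page.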

Value

a data frame with selected rows

Details

Note that `select_neighbours` is an S3 generic. To work with non-standard data sources (such as H2O ddf or external databases), you should overload it.

Examples

# NOT RUN {
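library("DALEX")        # provides the apartments and apartmentsTest data sets
library("ingredients")  # provides select_neighbours()

# take the first apartment, keeping columns 2 to 6 (the first column is dropped)
new_apartment <- apartments[1, 2:6]
# select the 10 rows of apartmentsTest closest to new_apartment
small_apartments <- select_neighbours(apartmentsTest, new_apartment, n = 10)
new_apartment
small_apartments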
# }
