ceterisParibus (version 0.4.2)

select_neighbours: Select Subset of Rows Closest to a Specified Observation

Description

This function selects the subset of rows from a data set that are closest to a given observation. This is useful when the data set is large and only a sample is needed to calculate profiles.

Usage

select_neighbours(
  data,
  observation,
  variables = NULL,
  distance = gower::gower_dist,
  n = 20,
  frac = NULL
)

Value

a data frame with selected rows

Arguments

data

set of observations

observation

single observation

variables

variables used to calculate the distance. By default, these are all variables present in both `data` and `observation`.

distance

distance function; by default, `gower_dist` from the gower package.

n

number of neighbours to select

frac

if `n` is not specified (NULL), the number of neighbours is calculated as `frac` * number of rows in `data`. Either `n` or `frac` needs to be specified.
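
The interplay between `n`, `frac`, `variables` and `distance` can be sketched in base R as follows. This is an illustration only, not the package source: `nearest_rows()` and `euclid()` are hypothetical helper names, plain Euclidean distance is swapped in for the Gower distance to avoid extra dependencies, and rounding `frac * nrow(data)` up is an assumption.

```r
# Illustrative sketch only; not the ceterisParibus implementation.
# euclid() and nearest_rows() are hypothetical names.

# Euclidean distance from a one-row data frame to every row of `data`
# (numeric columns only; stands in for gower::gower_dist).
euclid <- function(observation, data) {
  diffs <- sweep(as.matrix(data), 2, unlist(observation))
  sqrt(rowSums(diffs^2))
}

nearest_rows <- function(data, observation, variables = NULL,
                         distance = euclid, n = 20, frac = NULL) {
  # Default: every variable shared by `data` and `observation`.
  if (is.null(variables)) {
    variables <- intersect(colnames(data), colnames(observation))
  }
  # Fall back to `frac` only when `n` is NULL.
  if (is.null(n)) {
    if (is.null(frac)) stop("Either `n` or `frac` must be specified.")
    n <- ceiling(nrow(data) * frac)  # rounding up is an assumption
  }
  d <- distance(observation[, variables, drop = FALSE],
                data[, variables, drop = FALSE])
  # Keep the n rows with the smallest distance.
  data[head(order(d), n), , drop = FALSE]
}

toy <- data.frame(x = c(0, 1, 2, 10), y = c(0, 1, 2, 10))
obs <- data.frame(x = 0.4, y = 0.4)

# n = NULL, so the count comes from frac: half of the four rows.
picked <- nearest_rows(toy, obs, n = NULL, frac = 0.5)
picked
```

Note that `n` defaults to 20, so `frac` only takes effect when `n = NULL` is passed explicitly.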

Details

Note that `select_neighbours` is an S3 generic. To work with non-standard data sources (such as an H2O ddf or an external database), provide your own method for it.
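
A minimal sketch of such a method is shown below. The `lazy_table` class, its `fetch` field, and the `make_lazy_table()` constructor are hypothetical names invented for this illustration. The method signature mirrors the Usage section, and the sketch assumes that a plain data frame is handled by the package's existing method.

```r
library("ceterisParibus")

# Hypothetical wrapper around a data source that is not a plain data frame.
# `fetch` is a function that returns the rows when called.
make_lazy_table <- function(fetch) {
  structure(list(fetch = fetch), class = "lazy_table")
}

# S3 method for the hypothetical class: materialise the rows locally,
# then delegate to select_neighbours() on the resulting data frame.
select_neighbours.lazy_table <- function(data, observation, variables = NULL,
                                         distance = gower::gower_dist,
                                         n = 20, frac = NULL) {
  local_data <- as.data.frame(data$fetch())
  select_neighbours(local_data, observation, variables = variables,
                    distance = distance, n = n, frac = frac)
}

# Usage: iris stands in for the remote table.
src <- make_lazy_table(function() iris)
nearest <- select_neighbours(src, iris[1, ], n = 5)
nearest
```

Fetching every row defeats the purpose for a truly large source; a realistic method would more likely compute distances on the backend and pull back only the selected rows.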

Examples

library("DALEX")          # provides the apartments data sets
library("ceterisParibus") # provides select_neighbours()

new_apartment <- apartments[1, 2:6]
small_apartments <- select_neighbours(apartmentsTest, new_apartment, n = 10)
new_apartment
small_apartments