performance: Performance

Description

Applies one or more metrics to a data.frame containing columns with actual and predicted values as well as an optional column with case weights. The results are returned as a data.frame and can be used in a dplyr chain.

Usage

performance(
  data,
  actual,
  predicted,
  w = NULL,
  metrics = rmse,
  key = "metric",
  value = "value",
  ...
)

Arguments

data

A data.frame containing actual, predicted and possibly w.

actual

The column name in data referring to actual values.

predicted

The column name in data referring to predicted values.

w

The optional column name in data referring to case weights.

metrics

Either a function or a named list of functions. Each function represents a metric and has four arguments: observed, predicted, case weights and .... If a single function is passed instead of a named list, its name is guessed via deparse(substitute(...)), which does not return the actual function name when performance is called within lapply or similar. In such cases, pass a named list with one element, e.g. list(rmse = rmse).
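The name-guessing caveat can be reproduced in base R without the package. The sketch below is only an illustration: guess_name() is a hypothetical helper that mimics the deparse(substitute(...)) idiom, and rmse is a local stand-in defined here, not the package's function.

```r
# Hypothetical stand-in metric with the four-argument shape described above.
rmse <- function(actual, predicted, w = NULL, ...) {
  if (is.null(w)) w <- rep(1, length(actual))
  sqrt(stats::weighted.mean((actual - predicted)^2, w))
}

# Hypothetical helper mimicking the deparse(substitute(...)) idiom.
guess_name <- function(metrics) deparse(substitute(metrics))

# Called directly, the symbol written by the caller is recovered.
direct <- guess_name(rmse)

# Called through lapply, the argument expression is an internal
# placeholder, so the function's real name is lost.
via_lapply <- lapply(list(rmse), guess_name)[[1]]

# A named list carries the name explicitly and avoids the guess.
named <- names(list(rmse = rmse))

print(c(direct = direct, via_lapply = via_lapply, named = named))
```

Only the direct call and the named list yield "rmse", which is why a named list is the safer choice inside lapply and similar wrappers.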

key

Name of the resulting column containing the name of the metric. Defaults to "metric".

value

Name of the resulting column with the value of the metric. Defaults to "value".

...

Further arguments passed to the metric functions. For example, if the metric is r_squared, you can pass the relevant deviance function as an additional argument (see examples).

Value

Data frame with one row per metric and two columns: key and value.
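To make the input and output shapes concrete, the following base-R sketch mimics the behaviour described on this page. It is not the package's implementation: performance_sketch() and rmse_sketch() are hypothetical names, the sketch accepts only a named list of metrics, and the toy data are made up for illustration.

```r
# Hypothetical metric following the four-argument convention:
# actual values, predicted values, case weights, and '...'.
rmse_sketch <- function(actual, predicted, w = NULL, ...) {
  if (is.null(w)) w <- rep(1, length(actual))
  sqrt(stats::weighted.mean((actual - predicted)^2, w))
}

# Hypothetical, simplified stand-in for performance().
performance_sketch <- function(data, actual, predicted, w = NULL,
                               metrics = list(rmse = rmse_sketch),
                               key = "metric", value = "value", ...) {
  stopifnot(is.data.frame(data), is.list(metrics), !is.null(names(metrics)))
  # Look up the weight column only if a column name was supplied.
  weights <- if (is.null(w)) NULL else data[[w]]
  # Apply every metric to the same actual/predicted/weight columns.
  values <- vapply(
    metrics,
    function(f) f(data[[actual]], data[[predicted]], w = weights, ...),
    numeric(1)
  )
  # One row per metric, two columns named by 'key' and 'value'.
  out <- data.frame(names(metrics), unname(values), stringsAsFactors = FALSE)
  names(out) <- c(key, value)
  out
}

toy <- data.frame(y = c(1, 2, 3), pred = c(1, 2, 5), wt = c(1, 1, 2))
res <- performance_sketch(toy, "y", "pred", w = "wt")
print(res)
```

The result has one row per metric and columns named after key and value, matching the shape described above.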

Examples

# NOT RUN {
ir <- iris
fit_num <- lm(Sepal.Length ~ ., data = ir)
ir$fitted <- fit_num$fitted
performance(ir, "Sepal.Length", "fitted")
performance(ir, "Sepal.Length", "fitted", metrics = r_squared)
performance(ir, "Sepal.Length", "fitted", metrics = c(`R-squared` = r_squared, rmse = rmse))
performance(ir, "Sepal.Length", "fitted", metrics = r_squared,
            deviance_function = deviance_gamma)
performance(ir, "Sepal.Length", "fitted", metrics = r_squared,
            deviance_function = deviance_tweedie)
performance(ir, "Sepal.Length", "fitted", metrics = r_squared,
            deviance_function = deviance_tweedie, tweedie_p = 2)
performance(ir, "Sepal.Length", "fitted", metrics = r_squared,
            deviance_function = deviance_tweedie, tweedie_p = 0)
# }
# NOT RUN {
library(dplyr)

iris %>%
  mutate(pred = predict(fit_num, newdata = .)) %>%
  performance("Sepal.Length", "pred")

# Same
iris %>%
  mutate(pred = predict(fit_num, newdata = .)) %>%
  performance("Sepal.Length", "pred", metrics = rmse)

# Grouped by Species
iris %>%
  mutate(pred = predict(fit_num, newdata = .)) %>%
  group_by(Species) %>%
  do(performance(., "Sepal.Length", "pred"))

# Multiple measures
iris %>%
  mutate(pred = predict(fit_num, newdata = .)) %>%
  performance("Sepal.Length", "pred",
              metrics = list(rmse = rmse, mae = mae, `R-squared` = r_squared))

# Multiple measures, grouped by Species
iris %>%
  mutate(pred = predict(fit_num, newdata = .)) %>%
  group_by(Species) %>%
  do(performance(., "Sepal.Length", "pred",
                 metrics = list(rmse = rmse, mae = mae, `R-squared` = r_squared)))
# }
