rules (version 0.1.1)

tidy.cubist: Turn regression rule models into tidy tibbles

Description

Turn regression rule models into tidy tibbles

Usage

# S3 method for cubist
tidy(x, ...)

# S3 method for xrf
tidy(x, penalty = NULL, unit = c("rules", "columns"), ...)

Arguments

x

A Cubist or xrf object.

...

Not currently used.

penalty

A single numeric value for the lambda penalty. Used only by the xrf method.

unit

What data should be returned? For unit = 'rules', each row corresponds to a rule. For unit = 'columns', each row is a predictor column. The latter can be helpful when determining variable importance.

Value

The Cubist method returns the columns committee, rule_num, rule, estimate, and statistic. The latter two are nested tibbles: estimate contains the parameter estimates for each term in the regression model, and statistic holds summary statistics about the data selected by each rule and about the model fit.
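Because estimate is a nested tibble, it usually needs to be unnested before the coefficients can be compared across rules. The following is a minimal sketch of that step using tidyr::unnest(). The toy_res object is a hand-built stand-in for tidy() output, its values are invented, and the inner column names term and estimate are an assumption about the nested layout rather than something stated on this page.

```r
library(tibble)
library(tidyr)

# Hand-built stand-in for the result of tidy() on a Cubist fit.
# All values are invented; the inner column names (term, estimate)
# are assumed for illustration.
toy_res <- tibble(
  committee = c(1L, 1L),
  rule_num  = c(1L, 2L),
  rule      = c("( Gr_Liv_Area <= 3.1 )", "( Gr_Liv_Area > 3.1 )"),
  estimate  = list(
    tibble(term = c("(Intercept)", "Gr_Liv_Area"), estimate = c(3.2, 0.6)),
    tibble(term = c("(Intercept)", "Gr_Liv_Area"), estimate = c(2.9, 0.7))
  )
)

# Expand the list-column so that each row is one rule/term combination
flat <- unnest(toy_res, cols = c(estimate))
flat
```

With a real fit, the same call would be applied to the object returned by tidy() instead of toy_res.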

The xrf method returns the columns rule_id, rule, and estimate. The rule_id column holds the rule identifier (e.g., "r0_21"), or the feature column name when the column is added directly to the model. For multiclass models, a class column is also included.

In each case, the rule column has a character string with the rule conditions. These can be converted to an R expression using rlang::parse_expr().
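As a rough illustration of that conversion, the sketch below parses a rule string and evaluates it against a data frame to find the rows the rule selects. The rule text and the toy data are made up for this example. Real rule strings come from the rule column of the tidy() output and may be formatted differently.

```r
library(rlang)

# Hypothetical rule string, written in the general style of the rule column
rule_chr <- "( Gr_Liv_Area > 3.1 ) & ( Latitude <= 42.03 )"

# Toy data with invented values (not the real Ames data)
toy <- data.frame(
  Gr_Liv_Area = c(3.0, 3.2, 3.3),
  Latitude    = c(42.00, 42.01, 42.05)
)

# Convert the character string into an unevaluated R expression
rule_expr <- parse_expr(rule_chr)

# Evaluate the expression with the data frame columns in scope;
# the result is a logical vector with one element per row
in_rule <- eval_tidy(rule_expr, data = toy)

# Keep only the rows that satisfy the rule
toy[in_rule, ]
```

The parsed expression can also be spliced into dplyr::filter() with the !! operator, which may be more convenient inside a pipeline.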

Examples

# NOT RUN {
library(dplyr)
library(rules)  # provides the tidy() methods shown here; also attaches parsnip

data(ames, package = "modeldata")

ames <-
  ames %>%
  mutate(Sale_Price = log10(Sale_Price),
         Gr_Liv_Area = log10(Gr_Liv_Area))

# ------------------------------------------------------------------------------

cb_fit <-
  cubist_rules(committees = 10) %>%
  set_engine("Cubist") %>%
  fit(Sale_Price ~ Neighborhood + Longitude + Latitude + Gr_Liv_Area + Central_Air,
      data = ames)

cb_res <- tidy(cb_fit)
cb_res

cb_res$estimate[[1]]
cb_res$statistic[[1]]
# ------------------------------------------------------------------------------

library(recipes)

xrf_reg_mod <-
  rule_fit(trees = 10, penalty = .001) %>%
  set_engine("xrf") %>%
  set_mode("regression")

# Create dummy variables up front, since xgboost will not do this itself
ames_rec <-
  recipe(Sale_Price ~ Neighborhood + Longitude + Latitude +
         Gr_Liv_Area + Central_Air,
         data = ames) %>%
  step_dummy(Neighborhood, Central_Air) %>%
  step_zv(all_predictors())

ames_processed <- prep(ames_rec) %>% bake(new_data = NULL)

set.seed(1)
xrf_reg_fit <-
  xrf_reg_mod %>%
  fit(Sale_Price ~ ., data = ames_processed)

xrf_rule_res <- tidy(xrf_reg_fit)
xrf_rule_res$rule[nrow(xrf_rule_res)] %>% rlang::parse_expr()

xrf_col_res <- tidy(xrf_reg_fit, unit = "columns")
xrf_col_res
# }