HorseRuleFit: Horseshoe RuleFit

Description

Fits the Horseshoe RuleFit model described in the HorseRule paper (https://arxiv.org/abs/1702.05008).

Usage

HorseRuleFit(X = NULL, y = NULL, Xtest = NULL, ytest = NULL,
  niter = 1000, burnin = 100, thin = 1, restricted = 0.001,
  shrink.prior = "HS", beta = 2, alpha = 1, linp = 1, ensemble = "RF",
  L = 4, S = 6, ntree = 250, minsup = 0.025, mix = 0.5,
  linterms = NULL, intercept = F, ytransform = "linear")

Arguments

X

A matrix or data frame containing the predictor variables to be used.

y

A vector containing the response variable. If y is numeric, regression is performed; otherwise classification is performed.

Xtest

optional matrix or dataframe containing predictor variables of test set.

ytest

optional vector containing the response values of the test set.

niter

number of iterations for the horseshoe sampling.

burnin

number of initial samples to be discarded as burn-in.

thin

thinning parameter: only every thin-th posterior sample is kept.
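The interplay of niter, burnin and thin can be sketched as follows. This is a minimal illustration of the usual MCMC convention (drop the first burnin draws, then keep every thin-th one); the exact indexing used inside the package is an assumption here, and kept and kept_thin5 are hypothetical names.

```r
# Hypothetical illustration of burn-in and thinning (standard MCMC convention;
# the package's exact indexing is an assumption).
niter  <- 1000
burnin <- 100
thin   <- 1

# Indices of the draws that survive burn-in and thinning.
kept <- seq(from = burnin + 1, to = niter, by = thin)
length(kept)

# The same chain with thin = 5 keeps far fewer draws.
kept_thin5 <- seq(from = burnin + 1, to = niter, by = 5)
length(kept_thin5)
```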

restricted

Threshold for restricted Gibbs sampling. In each iteration only coefficients with scale > restricted are updated. Set restricted = 0 for unrestricted Gibbs sampling.

shrink.prior

Specifies the shrinkage prior to be used for regularization. Currently the options "HS" (Horseshoe) and "HS+" (Horseshoe+) are supported.

beta

Hyperparameter to control the extra shrinkage on the rule complexity, measured as the rule length.

alpha

Hyperparameter to control the extra shrinkage on the rules that cover only few observations. Set alpha = beta = 0 for the standard horseshoe without rule structure prior.
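As a rough illustration of how alpha and beta act together, the sketch below computes a prior scale per rule from its support and length. The functional form, (2 * min(1 - s, s))^alpha / length^beta, is assumed from the linked paper and is not confirmed against the package source; rule_prior, support and len are hypothetical names.

```r
# Sketch of a rule structure prior scale, ASSUMING the form
#   A = (2 * min(1 - s, s))^alpha / len^beta
# where s is the rule support and len is the rule length.
rule_prior <- function(support, len, alpha = 1, beta = 2) {
  (2 * pmin(1 - support, support))^alpha / len^beta
}

support <- c(0.50, 0.05, 0.50)  # share of observations covered by each rule
len     <- c(1, 1, 3)           # number of conditions in each rule

# Low-support and long rules receive a smaller scale (more shrinkage).
rule_prior(support, len)

# alpha = beta = 0 removes the structure: every rule gets scale 1.
rule_prior(support, len, alpha = 0, beta = 0)
```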

linp

Hyperparameter to control prior shrinkage of linear terms. Set linp > 1 if strong linear effects are assumed.

ensemble

Which ensemble method should be used to generate the rules? Options are "RF", "GBM" or "both".

L

Parameter controlling the complexity of the generated rules. Higher values lead to more complex rules.

S

Parameter controlling the minimum number of observations in the tree growing process.

ntree

Number of trees in the ensemble step from which the rules are extracted.

minsup

Rules with support < minsup are removed. Can be used to prevent overfitting.
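The support filter can be pictured with a small 0/1 rule matrix. This is only a sketch under the assumption that support is the share of training observations a rule covers; rule_matrix and kept_rules are hypothetical names, not objects exposed by the package.

```r
# Hypothetical 0/1 rule matrix: rows are observations, columns are rules.
rule_matrix <- cbind(
  r1 = c(1, 1, 0, 0, 1, 0, 1, 0, 1, 1),
  r2 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
  r3 = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
)

minsup <- 0.15

# Support = fraction of observations for which the rule fires.
support <- colMeans(rule_matrix)

# Rules with support < minsup are dropped.
kept_rules <- rule_matrix[, support >= minsup, drop = FALSE]
colnames(kept_rules)
```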

mix

If ensemble = "both", mix*ntree trees are generated via random forest and (1-mix)*ntree trees via gradient boosting.
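For the default values, the split works out as in the short sketch below. How the package rounds non-integer tree counts is not stated here, so the use of round() is an assumption, and n_rf and n_gbm are hypothetical names.

```r
ntree <- 250
mix   <- 0.5

# Number of trees per ensemble method (rounding rule is an assumption).
n_rf  <- round(mix * ntree)
n_gbm <- ntree - n_rf

c(RF = n_rf, GBM = n_gbm)
```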

linterms

Specifies the columns in X that should be included as linear terms in the Horseshoe RuleFit model. The specified columns need to be numeric. Categorical variables have to be transformed (e.g. to dummy variables) before being included as linear effects.
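One way to prepare a categorical predictor for use as a linear term is to dummy-code it with base R's model.matrix(). The data frame below is made up for illustration, and the resulting linterms vector is simply one plausible choice, not a requirement of the package.

```r
# Hypothetical data with one numeric and one categorical predictor.
dat <- data.frame(
  age    = c(23, 45, 31, 52),
  region = factor(c("north", "south", "west", "south"))
)

# Dummy-code the factor; drop the intercept column that model.matrix() adds.
dummies <- model.matrix(~ region, data = dat)[, -1, drop = FALSE]

# Numeric design with the dummies in place of the factor.
X <- cbind(dat["age"], dummies)

# All columns are now numeric, so all of them could be passed as linear terms.
linterms <- seq_len(ncol(X))
str(X)
```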

intercept

If TRUE, an intercept is included. Note that y is by default shifted to have mean 0, so an intercept is not necessary for regression. For classification it is highly recommended.

ytransform

Choose "log" for logarithmic transform of y.

Value

An object of class HorseRuleFit, which is a list of the following components:

bhat

Posterior mean of the regression coefficients.

postdraws

List containing the posterior samples of the regression coefficients, error variance sigma and shrinkage tau.

rules

Vector containing the decision rules.

Matrix containing original training data and the decision rule covariates (normalized).

Response values of the training data.

prior

Vector containing the rule structure prior for the individual rules.

modelstuff

List containing the parameters used and the values used for the normalization (means and sds).

pred

If test data was supplied, contains the predicted values.

err

If ytest was also supplied, additionally contains a test error score (RMSE for regression, misclassification rate for classification).
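The two error scores can be written down in a few lines of base R. This is a sketch of the standard definitions, assumed rather than taken from the package source; rmse and misclass_rate are hypothetical helper names.

```r
# Root mean squared error for regression.
rmse <- function(pred, y) sqrt(mean((pred - y)^2))

# Share of wrongly predicted labels for classification.
misclass_rate <- function(pred_class, y) mean(pred_class != y)

# Toy values for illustration only.
rmse(c(2, 3, 5), c(1, 3, 7))
misclass_rate(c("a", "b", "a", "a"), c("a", "b", "b", "a"))
```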

Examples

# NOT RUN {
library(MASS)
library(horserule)
data(Boston)
# Split into training and test data
N = nrow(Boston)
train = sample(1:N, 400)
Xtrain = Boston[train,-14]
ytrain = Boston[train, 14]
Xtest = Boston[-train, -14]
ytest = Boston[-train, 14]

# Run HorseRuleFit with 100 trees.
# Increase the number of trees and the number of posterior samples for a better model fit.
hrres = HorseRuleFit(X = Xtrain, y=ytrain,
                    thin=1, niter=100, burnin=10,
                    L=5, S=6, ensemble = "both", mix=0.3, ntree=100,
                    intercept=FALSE, linterms=1:13, ytransform = "log",
                    alpha=1, beta=2, linp = 1, restricted = 0)

# Calculate the error
pred = predict(hrres, Xtest, burnin=100, postmean=TRUE)
sqrt(mean((pred-ytest)^2))

# Look at the most important rules/linear effects.
importance_hs(hrres)

# Look at the input variable importance.
Variable_importance(hrres, var_names=colnames(Xtrain))
# }