
htree (version 1.0.0)

varimp_hrf: Variable importance

Description

Z-score variable importance for hrf and htb

Usage

varimp_hrf(object, nperm = 20)
varimp_htb(object, nperm = 20, ntrees)

Arguments

object

The fitted model, i.e. the list returned by hrf or htb.

nperm

Number of permutations.

ntrees

Number of trees to use. Applies only to varimp_htb.

Value

A data.frame with one row per predictor and the following columns:

Relative change: relative change in prediction error (out-of-bag error for hrf, cross-validation error for htb) due to permuting the predictor.
Mean change: mean change in prediction error due to permuting the predictor.
SE: standard error of Mean change.
Z-value: Mean change/SE.
P-value: approximate p-value of the Z-value.
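
For instance (an illustrative snippet, not taken from the package documentation, and assuming the z-score column is named Z as in the example code further below), predictors can be ranked by their importance z-scores:

vi <- varimp_hrf(h)                    # 'h' is a fitted hrf model, as in the Examples below
vi[order(vi$Z, decreasing = TRUE), ]   # most to least important predictor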

Details

To measure the importance of a predictor, varimp_hrf and varimp_htb compare the prediction error of the estimated model with the prediction error obtained after integrating the predictor out of the model. If \(F\) denotes the estimated model, the model obtained by integrating out predictor \(k\) is \(F_k(x)=\int F(x) dP(x_k)\), where \(P(x_k)\) is the marginal distribution of \(x_k\). In practice, the integration is approximated by averaging multiple predictions from \(F\), each obtained after randomly permuting the observed values of \(x_k\); the number of permutations is set by nperm.

Letting \(L(y,\hat{y})\) be the loss of predicting \(y\) with \(\hat{y}\), one obtains the vector \(w_i=L(y_i,F_k(x_i))-L(y_i,F(x_i))\) for \(i=1,\ldots,n\). The corresponding z-score is \(z=mean(w_i)/se(w_i)\), which gives an approximate paired test for the equality of the two prediction errors. Large positive z-scores indicate that the prediction error increases when \(x_k\) is marginalized out, and hence that \(x_k\) is useful; large negative z-scores indicate that the integrated model is more accurate. For longitudinal data the \(w_i\) are computed by averaging across all observations from the i-th subject. For htb the prediction error is calculated from the cross-validation model estimates; for hrf, out-of-bag predictions are used.
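
As a rough illustration of the computation just described (not the internal implementation of htree), the sketch below computes a permutation z-score for a single predictor of a generic fitted model with a predict() method, using squared-error loss. The function name perm_zscore and its arguments are made up for this example, and plain in-sample predictions are used for brevity, whereas varimp_hrf and varimp_htb use out-of-bag and cross-validation predictions respectively.

perm_zscore <- function(fit, x, y, k, nperm = 20) {
  loss <- function(y, yhat) (y - yhat)^2     # squared-error loss L(y, yhat)
  base <- loss(y, predict(fit, x))           # L(y_i, F(x_i))
  # approximate F_k by averaging predictions over nperm permutations of column k
  perm_pred <- rowMeans(sapply(seq_len(nperm), function(j) {
    xp <- x
    xp[[k]] <- sample(xp[[k]])
    predict(fit, xp)
  }))
  w <- loss(y, perm_pred) - base             # w_i = L(y_i, F_k(x_i)) - L(y_i, F(x_i))
  se <- sd(w) / sqrt(length(w))
  z <- mean(w) / se                          # z = mean(w_i) / se(w_i)
  c(mean_change = mean(w), se = se, z = z, p = 1 - pnorm(z))
}

# e.g., with a linear model: perm_zscore(lm(medv ~ ., dat), dat, dat$medv, "crim")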

References

L. Breiman (2001). “Random Forests,” Machine Learning 45(1):5-32.

See Also

hrf, htb

Examples

# --------------------------------------------------------------------------------------------- ##
# Boston Housing data 
#	Comparison of Z-score variable importance with coefficient Z-scores from linear model
# --------------------------------------------------------------------------------------------- ##

# Boston Housing data 
library(mlbench)
data(BostonHousing)
dat=as.data.frame(na.omit(BostonHousing))
dat$chas=as.numeric(dat$chas)

# -- random forest 
h=hrf(x=dat,yindx="medv",ntrees=500)


# -- tree boosting
hb=htb(x=dat,yindx="medv",ntrees=500,cv.fold=10)


# -- Comparison of variable importance Z-scores and Z-scores from linear model 
vi=varimp_hrf(h)
vb=varimp_htb(hb)
dvi=data.frame(var=rownames(vi),Z_hrf=vi$Z)
dvb=data.frame(var=rownames(vb),Z_htb=vb$Z)

dlm=summary(lm(medv~.,dat))$coefficients
dlm=data.frame(var=rownames(dlm),Z_lm=round(abs(dlm[,3]),3))
dlm=merge(dlm[-1,],dvi,by="var",all.x=TRUE)

# -- Z-scores of hrf, htb and lm for predictor variables 
merge(dlm,dvb,by="var",all.x=TRUE)
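
# -- (optional extension, not part of the original example) visual comparison of
# -- hrf importance Z-scores against absolute Z-scores from the linear model
dcmp=merge(dlm,dvb,by="var",all.x=TRUE)
plot(dcmp$Z_lm,dcmp$Z_hrf,xlab="abs(lm Z-score)",ylab="hrf importance Z-score")
text(dcmp$Z_lm,dcmp$Z_hrf,labels=dcmp$var,pos=3,cex=0.7)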


