
htree (version 0.1.1)

varimp_hrf: Variable importance

Description

Z-score variable importance for hrf and htb

Usage

varimp_hrf(object,nperm=10)
varimp_htb(object,nperm=10,ntrees)

Arguments

object

List returned by hrf or htb.

nperm

Number of random permutations used to integrate each predictor out of the model.

ntrees

Number of trees. Only for varimp_htb.

Value

z-scores for the predictors.

Details

To measure the importance of a predictor, varimp_hrf and varimp_htb compare the prediction errors of the estimated model with the prediction errors obtained after integrating the predictor out of the model. If \(F\) denotes the estimated model, the model obtained by integrating out predictor \(k\) is \(F_k(x)=\int F(x)\,dP(x_k)\), where \(P(x_k)\) is the marginal distribution of \(x_k\). In practice, the integration is done by averaging over multiple predictions from \(F\), each obtained using a random permutation of the observed values of \(x_k\). The number of permutations is determined by nperm.

Letting \(L(y,\hat{y})\) be the loss of predicting \(y\) with \(\hat{y}\), one obtains the vector \(w_i=L(y_i,F_k(x_i))-L(y_i,F(x_i))\) for \(i=1,\ldots,n\). The corresponding z-score is \(z=\mathrm{mean}(w_i)/\mathrm{se}(w_i)\), which is an approximate paired test for the equality of the prediction errors. Large positive z-scores indicate that the prediction error increases if \(x_k\) is marginalized out, and thus that \(x_k\) is useful. Large negative z-scores indicate that the integrated model is more accurate.

For longitudinal data, the \(w_i\) are computed by averaging across all observations from the \(i\)-th subject. For htb, the prediction error is calculated from the cross-validation model estimates; for hrf, out-of-bag predictions are used.
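The calculation described above can be sketched in base R. This is an illustration only, not the htree implementation: it assumes simulated data, uses lm() as a stand-in for the fitted model \(F\), squared-error loss for \(L\), and in-sample predictions rather than out-of-bag or cross-validated ones. The helper name zscore_importance is hypothetical.

```r
# Illustrative sketch (not htree code): permutation-based z-score importance
# with lm() standing in for the fitted model F and squared-error loss.
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 2 * dat$x1 + rnorm(n)   # x1 is informative, x2 is pure noise

fit <- lm(y ~ x1 + x2, data = dat)

# Hypothetical helper: z-score for predictor k (a column name of dat)
zscore_importance <- function(fit, dat, k, nperm = 10) {
  pred_full <- predict(fit, newdata = dat)

  # Approximate F_k by averaging predictions over random permutations of x_k
  perm_preds <- sapply(seq_len(nperm), function(b) {
    d <- dat
    d[[k]] <- sample(d[[k]])
    predict(fit, newdata = d)
  })
  pred_marg <- rowMeans(perm_preds)

  # w_i = L(y_i, F_k(x_i)) - L(y_i, F(x_i))
  w <- (dat$y - pred_marg)^2 - (dat$y - pred_full)^2

  # z = mean(w) / se(w), with se the standard error of the mean
  mean(w) / (sd(w) / sqrt(length(w)))
}

z <- sapply(c("x1", "x2"), function(k) zscore_importance(fit, dat, k, nperm = 20))
print(z)
```

In this setup the z-score for x1 should be much larger than the one for x2, since only x1 carries signal. With longitudinal data, the \(w_i\) would first be averaged within each subject before forming the z-score.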

References

L. Breiman (2001). “Random Forests,” Machine Learning 45(1):5-32.

See Also

hrf, htb

Examples

library(htree)

data(mscm)
mscm=na.omit(mscm)

# -- random forest model (predicting illness, with stress and illness as historical predictors)
ff=hrf(x=as.matrix(mscm),id=mscm$id,time=mscm$day,yindx=4,vh=c(3,4),vc=c(1,2,5:14))

# -- importance z-scores, using 20 permutations per predictor
vi=varimp_hrf(ff,nperm=20)

# -- keep predictors with positive z-scores and plot them in decreasing order
vi=vi[vi>0]
vi=vi[order(vi,decreasing=TRUE)]
barplot(vi,main="Importance z-scores")
