
forestFloor (version 1.5)

convolute_ff: Convolve a specific set of feature contributions by their corresponding features with the kknn package

Description

convolute_ff performs batches of first-order convolutions rather than convolutions of any order. It uses kNN with a Gaussian kernel and leave-one-out (LOO) cross-validation.

Usage

convolute_ff(ff,
             these.vars=NULL,
             k.fun=function() round(sqrt(n.obs)/2),
             userArgs.kknn = alist(kernel="gaussian"))

Arguments

ff
forestFloor object (class="forestFloor") consisting of at least ff$X and ff$FCmatrix, two matrices of equal size
these.vars
vector of column indices into ff$X. Convolution can be limited to these.vars.
k.fun
function that defines the number of neighbors k to consider. n.obs is a constant equal to the number of observations in ff$X, so k is defined as a function k.fun of n.obs. To set k to a constant, use e.g. k.fun = function() 10. k can also be overridden with userArgs.kknn.
userArgs.kknn
argument list passed to the train.kknn function for each convolution. See kknn.args. Arguments that conflict with this list, e.g. k.fun, are overridden.
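
As a rough illustration of the k.fun argument, the sketch below evaluates the default rule from the Usage section for 2500 observations, the sample size used in the Examples. Note that n.obs is set by hand here as an assumption; inside convolute_ff it is defined by the function itself, and the names k.default and k.constant are only illustrative.

```r
# Assumption: n.obs is set manually here; convolute_ff defines it internally
# as the number of observations in ff$X.
n.obs <- 2500

# Default rule from the Usage section: half the square root of n.obs, rounded
k.default <- function() round(sqrt(n.obs) / 2)

# A constant alternative, as suggested in the k.fun description
k.constant <- function() 10

k.default()   # 25
k.constant()  # 10
```

With the default rule, k grows slowly with the sample size, so larger data sets are smoothed over more neighbors without k becoming a fixed fraction of the data.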

Value

  • ff$FCfit, a matrix of convoluted featureContributions

Details

convolute_ff uses train.kknn from the kknn package to convolute feature contributions by their corresponding variables. The output in ff$FCfit will resemble ff$FCmatrix for any column/variable that is well explained by its main effect. Gaussian weighting of the nearest neighbors lowers the bias of the fit.
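
To give a feel for what a first-order convolution does, here is a minimal base-R sketch of a leave-one-out, Gaussian-weighted kNN smoother for a single feature. It is a simplified stand-in and not the train.kknn implementation: the helper loo_knn_gauss, the bandwidth choice, and the simulated data are all assumptions made for illustration.

```r
# Hypothetical helper (not part of forestFloor or kknn): leave-one-out,
# Gaussian-weighted k-nearest-neighbour smoothing of one feature contribution
# column `fc` against its own feature `x`.
loo_knn_gauss <- function(x, fc, k) {
  n <- length(x)
  fit <- numeric(n)
  for (i in seq_len(n)) {
    d <- abs(x[-i] - x[i])        # distances to every other observation (LOO)
    nn <- order(d)[seq_len(k)]    # positions of the k nearest neighbours
    h <- max(d[nn])               # local bandwidth: distance to the k-th neighbour
    if (h == 0) h <- 1            # guard against ties at distance zero
    w <- dnorm(d[nn] / h)         # Gaussian weights, larger for closer neighbours
    fit[i] <- sum(w * fc[-i][nn]) / sum(w)
  }
  fit
}

set.seed(1)
n.obs <- 200
x  <- rnorm(n.obs)
fc <- x^2 + rnorm(n.obs, sd = 0.3)   # simulated contribution with a clear main effect

k <- round(sqrt(n.obs) / 2)          # same rule as the default k.fun
fc.fit <- loo_knn_gauss(x, fc, k)

# Share of the contribution explained by the smoothed main effect
cor(fc, fc.fit)^2
```

Because each observation is left out of its own fit, a high squared correlation indicates that the contribution is genuinely explained by the feature's main effect rather than reproduced from the point itself.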

Examples

#load the packages used below
library(randomForest)
library(forestFloor)

#simulate data
obs=2500
vars = 6 
X = data.frame(replicate(vars,rnorm(obs)))
Y = with(X, X1^2 + 2*sin(X2*pi) + 8 * X3 * X4)
Yerror = 5 * rnorm(obs)
cor(Y,Y+Yerror)^2
Y= Y+Yerror

#grow a forest, remember to include inbag
rfo=randomForest(X,Y,keep.inbag=TRUE,ntree=1000,sampsize=800)

ff = forestFloor(rfo,X)

ff = convolute_ff(ff)

#squared correlation between each convolution and its feature contribution
for(i in 1:6) print(cor(ff$FCmatrix[,i],ff$FCfit[,i])^2)

#plot the feature contributions (grey) with the convolutions overlaid (red)
pars=par(no.readonly=TRUE) #save graphical parameters
par(mfrow=c(3,2),mar=c(2,2,2,2))
for(i in 1:6) {
  plot(ff$X[,i],ff$FCmatrix[,i],col="#00000030",ylim=range(ff$FCmatrix))
  points(ff$X[,i],ff$FCfit[,i],col="red",cex=0.2)
}
par(pars) #restore graphical parameters
