uavRst (version 0.5-2)

ffs_train: Forward feature selection based on rf model

Description

ffs_train is a wrapper function that simplifies the use of the forward feature selection approach for training random forest classification models. This validation strategy is particularly suitable for leave-location-out cross-validation, where variable selection MUST be based on the performance of the model on the held-out location. See Meyer et al. (2018) for further details. This is typically the case when spatio-temporally variable vegetation patterns are used for classification. For UAV-based RGB/NIR imagery, ffs_train provides an optimized preconfiguration for such classification tasks.
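
The leave-location-out idea behind this wrapper can be sketched with the caret and CAST packages (Meyer et al. 2018). The code below is illustrative only: it uses synthetic data, and the fold construction via CAST::CreateSpacetimeFolds is an assumption about how such a cross-validation can be set up, not a description of the internals of ffs_train.

# NOT RUN {
require(caret)
require(CAST)

##- synthetic training table: ID = class, R/G/B = predictors, FN = image/location
set.seed(100)
df <- data.frame(ID = factor(sample(c("tree", "no_tree"), 100, replace = TRUE)),
                 R  = runif(100), G = runif(100), B = runif(100),
                 FN = rep(paste0("img_", 1:4), 25))

##- leave-location-out folds: every fold holds back all pixels of one image
folds <- CreateSpacetimeFolds(df, spacevar = "FN", k = 4)

##- cross-validation control that evaluates the model on the held-out locations
ctrl <- trainControl(method = "cv", index = folds$index,
                     classProbs = TRUE, summaryFunction = twoClassSummary)

##- forward feature selection with a random forest model
ffsModel <- ffs(predictors = df[, c("R", "G", "B")],
                response   = df$ID,
                method     = "rf",
                metric     = "ROC",
                trControl  = ctrl)
# }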

Usage

ffs_train(trainingDF = NULL, predictors = c("R", "G", "B"),
  response = "ID", spaceVar = "FN", names = c("ID", "R", "G", "B",
  "A", "FN"), noLoc = NULL, sumFunction = "twoClassSummary",
  pVal = 0.5, prefin = "final_", preffs = "ffs_",
  modelSaveName = "model.RData", runtest = FALSE, seed = 100,
  withinSE = TRUE, mtry = 2, noClu = 1)

Arguments

trainingDF

dataframe. data frame containing the training data (see the sketch following this argument list for the expected layout)

predictors

character. vector of predictor names as given by the header of the training data table

response

character. name of response variable as given by the header of the training data table

spaceVar

character. name of the space-time splitting variable as given by the header of the training data table

names

character. all column names of the training data frame

noLoc

numeric. number of locations to leave out; usually the number of discrete training locations/images

sumFunction

character. summary function used for model evaluation; default is "twoClassSummary"

pVal

numeric. fraction of the training data to be used; default is 0.5

prefin

character. name prefix used for saving the final model; default is "final_"

preffs

character. name prefix used for saving the ffs model; default is "ffs_"

modelSaveName

character. name pattern used for saving the model; default is "model.RData"

runtest

logical. default is FALSE; if TRUE, an external validation is performed

seed

numeric. seed value for reproducibility; default is 100

withinSE

logical. compares the performance to models that use fewer variables (e.g. if a model using 5 variables is better than a model using 4 variables but still within the standard error of the 4-variable model, then the 4-variable model is rated as the better model).

mtry

numeric. number of variables randomly sampled as candidates at each split

noClu

numeric. number of clusters (cores) to be used
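
A minimal sketch of a training table in the layout described above, together with a corresponding call of ffs_train. The data are synthetic and purely illustrative; the column names follow the defaults of the names argument, and the parameter values (e.g. noLoc = 5) are only an assumption matching the five fake images.

# NOT RUN {
require(uavRst)

##- synthetic training table matching the default column layout
set.seed(100)
trainingDF <- data.frame(ID = factor(sample(c("tree", "no_tree"), 100, replace = TRUE)),
                         R  = runif(100), G = runif(100),
                         B  = runif(100), A = runif(100),
                         FN = rep(paste0("image_", 1:5), 20))

##- train a random forest model with forward feature selection,
##- holding back one of the five images in each cross-validation run
model <- ffs_train(trainingDF = trainingDF,
                   predictors = c("R", "G", "B", "A"),
                   response   = "ID",
                   spaceVar   = "FN",
                   names      = names(trainingDF),
                   noLoc      = 5,
                   pVal       = 0.5,
                   mtry       = 2,
                   noClu      = 1)
# }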

Examples

# NOT RUN {
require(uavRst)

##- project folder
projRootDir<-tempdir()

# create subfolders; please mind that the paths are exported as global variables
paths<-link2GI::initProj(projRootDir = projRootDir,
                         projFolders = c("data/","data/ref/","output/","run/","las/"),
                         global = TRUE,
                         path_prefix = "path_")
setwd(path_run)
unlink(paste0(path_run,"*"), force = TRUE)

##- get the rgb image, chm and training data
utils::download.file("https://github.com/gisma/gismaData/raw/master/uavRst/data/ffs.zip",
                      paste0(path_run,"ffs.zip"))
unzip(zipfile = paste0(path_run,"ffs.zip"),exdir = ".")

##- get geometrical training data assuming that you have previously used the calc_ext function
trainDF<-readRDS(paste0(path_run,"tutorial_trainDF.rds"))
load(paste0(path_run,"tutorialbandNames.RData"))

##- define the classes
 idNumber=c(1,2,3)
 idNames= c("green tree","yellow tree","no tree")
##- add classes names
 for (i in 1:length(idNumber)){
   trainDF$ID[trainDF$ID==i]<-idNames[i]
 }
##- convert to factor (required by rf)
 trainDF$ID <- as.factor(trainDF$ID)
##- now prepare the predictor and response variable names
##- get actual name list from the DF
 name<-names(trainDF)
##- cut off the leading ID and trailing FN columns
 predictNames <- name[2:(length(name) - 1)]

##- call Training
 model <-  ffs_train(trainingDF= trainDF,
                     predictors= predictNames,
                     response  = "ID",
                     spaceVar  = "FN",
                     names     = name,
                     pVal      = 0.1,
                     noClu     = 1)

##- for classification/prediction go ahead with the predict_RGB function
##+
# }
