RStoolbox (version 0.1.1)

superClass: Supervised Classification

Description

Supervised classification in either classification or regression mode, based on vector training data (points or polygons).

Usage

superClass(img, trainData, valData = NULL, responseCol = NULL,
  nSamples = 1000, areaWeightedSampling = TRUE, polygonBasedCV = FALSE,
  trainPartition = NULL, model = "rf", tuneLength = 3, kfold = 5,
  minDist = 2, mode = "classification", filename = NULL, verbose,
  overwrite = TRUE, ...)

Arguments

img
Raster* object. Typically remote sensing imagery, which is to be classified.
trainData
SpatialPolygonsDataFrame or SpatialPointsDataFrame containing the training locations.
valData
SpatialPolygonsDataFrame or SpatialPointsDataFrame containing the validation locations (optional).
responseCol
Character or integer giving the column in trainData that contains the response variable. Can be omitted when trainData has only one column.
nSamples
Integer. Number of samples per land cover class.
areaWeightedSampling
Logical. If TRUE, scales the sample size per polygon by polygon area: the bigger the polygon, the more samples are taken.
polygonBasedCV
Logical. If TRUE, model tuning during cross-validation is conducted on a per-polygon basis. Use this to deal with overfitting.
trainPartition
Numeric between zero and one. Proportion of trainData (polygon based) that goes into the training data set. Ignored if valData is provided.
model
Character. Which model to use. See train for options. Defaults to randomForest ('rf').
tuneLength
Integer. Number of levels for each tuning parameter (see train for details).
kfold
Integer. Number of cross-validation resamples during model tuning.
minDist
Numeric. Minimum distance factor between training and validation data, e.g. minDist=1 will clip validation polygons to ensure a minimal distance of one pixel to the next training polygon. Applies only if trainData and valData overlap.
mode
Character. Model type: 'regression' or 'classification'.
filename
Path to output file (optional). If NULL, standard raster handling will apply, i.e. storage either in memory or in the raster temp directory.
verbose
Logical. Prints progress and statistics during execution.
overwrite
Logical. Overwrite spatial prediction raster if it already exists.
...
Further arguments to be passed to train.
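
To make the interplay of nSamples and areaWeightedSampling more concrete, here is a minimal base R sketch of how a per-class sample budget could be split across polygons. It is an illustration only: the helper allocate_samples and the toy polygon table are hypothetical and not part of RStoolbox, and the package's own sampling may round or distribute samples differently.

```r
# Hypothetical helper (not an RStoolbox function): split a per-class sample
# budget across polygons, either in proportion to area or equally.
allocate_samples <- function(area, class, nSamples, areaWeighted = TRUE) {
  stopifnot(length(area) == length(class), all(area > 0))
  n <- numeric(length(area))
  for (cl in unique(class)) {
    idx <- which(class == cl)
    # Weights sum to 1 within each class
    w <- if (areaWeighted) {
      area[idx] / sum(area[idx])
    } else {
      rep(1 / length(idx), length(idx))
    }
    n[idx] <- round(nSamples * w)
  }
  n
}

# Toy polygon table (made-up areas, for illustration only)
polys <- data.frame(
  id    = 1:5,
  class = c("forest", "forest", "water", "water", "water"),
  area  = c(300, 100, 50, 25, 25),
  stringsAsFactors = FALSE
)

polys$n_weighted <- allocate_samples(polys$area, polys$class, nSamples = 1000)
polys$n_equal    <- allocate_samples(polys$area, polys$class, nSamples = 1000,
                                     areaWeighted = FALSE)
print(polys)
```

With area weighting, the large forest polygon receives three times as many samples as the small one. Without it, both forest polygons receive the same number regardless of size.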

Value

  • A list containing [[1]] the model, [[2]] the predicted raster and [[3]] the class mapping

Details

superClass performs the following steps:

  1. Ensure non-overlap between training and validation data. This is necessary to avoid biased performance estimates. A minimum distance (minDist) in pixels can be provided to enforce a given distance between training and validation data.
  2. Sample training coordinates. If trainData (and valData if present) are SpatialPolygonsDataFrames, superClass will calculate the area per polygon and sample nSamples locations per class within these polygons. The number of samples per individual polygon scales with the polygon area, i.e. the bigger the polygon, the more samples. Setting areaWeightedSampling = FALSE will sample each polygon equally, independent of its size.
  3. Split training/validation. If valData was provided (recommended), the samples from these polygons will be held out and used only for validation, not for model fitting. If trainPartition is provided, the training polygons will be divided into training polygons and validation polygons.
  4. Extract raster data. The predictor values on the sample pixels are extracted from img.
  5. Fit the model. Using caret::train on the sampled training data, the model will be fit, including parameter tuning (tuneLength) in kfold cross-validation. polygonBasedCV = TRUE will define cross-validation folds based on polygons (recommended); otherwise cross-validation will be performed on a per-pixel basis.
  6. Predict the classes of all pixels in img based on the final model.
  7. Validate the model with the independent validation data.
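
The difference between polygon-based and per-pixel cross-validation in step 5 can be sketched in base R. The snippet below is a rough illustration, not the package's implementation: the helper polygon_folds and the toy sample table are hypothetical. The idea is that whole polygons are assigned to folds, so pixels from one polygon never appear in both the fitting and the held-out part of a resample.

```r
set.seed(42)

# Toy sample table: 12 hypothetical polygons with 10 sampled pixels each
samples <- data.frame(polygon = rep(1:12, each = 10))
kfold   <- 4

# Hypothetical helper (not an RStoolbox function): assign whole polygons,
# rather than individual pixels, to cross-validation folds.
polygon_folds <- function(polygon, k) {
  ids <- unique(polygon)
  stopifnot(k <= length(ids))
  # Shuffle a balanced vector of fold labels, one label per polygon
  fold_of_id <- sample(rep(seq_len(k), length.out = length(ids)))
  # Propagate each polygon's fold label to all of its pixels
  fold_of_id[match(polygon, ids)]
}

samples$fold_polygon <- polygon_folds(samples$polygon, kfold)
# Per-pixel folds for comparison: pixels are shuffled individually
samples$fold_pixel   <- sample(rep(seq_len(kfold), length.out = nrow(samples)))

# Number of distinct folds each polygon's pixels end up in
folds_per_polygon <- function(fold, polygon) {
  tapply(fold, polygon, function(f) length(unique(f)))
}

folds_per_polygon(samples$fold_polygon, samples$polygon)  # exactly 1 per polygon
folds_per_polygon(samples$fold_pixel, samples$polygon)    # typically more than 1
```

Because neighbouring pixels within a polygon tend to be similar, per-pixel folds let near-duplicates of the held-out pixels into the fitting data, which is one reason the polygon-based variant is the recommended way to counter overfitting.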

See Also

train

Examples

library(caret)
library(randomForest)
library(e1071)
library(raster)
input <- brick(system.file("external/rlogo.grd", package="raster"))
train <- readRDS(system.file("external/training.rds", package="RStoolbox"))

## Plot training data
olpar <- par(no.readonly = TRUE) # back-up par
par(mfrow=c(1,2))
colors <- c("yellow", "green", "deeppink")
plotRGB(input)
plot(train, add = TRUE, col =  colors[train$class], pch = 19)

## Fit classifier (splitting training into 70% training data, 30% validation data)
SC <- superClass(input, trainData = train, responseCol = "class",
                 model = "rf", tuneLength = 1, trainPartition = 0.7)
SC

## Plots
plot(SC$map, col = colors, legend = FALSE, axes = FALSE, box = FALSE)
legend(1,1, legend = levels(train$class), fill = colors , title = "Classes",
horiz = TRUE,  bty = "n")
par(olpar) # reset par