bnspatial (version 0.9)

dataDiscretize: Discretize data

Description

These functions discretize continuous input data into classes. Classes can be defined by the user or, if the user provides only the expected number of classes, calculated from quantiles (the default) or by equal intervals. dataDiscretize processes a single variable at a time, provided as a vector. bulkDiscretize discretizes multiple input rasters, optionally using parallel processing.

Usage

dataDiscretize(data, classBoundaries = NULL, classStates = NULL, method = "quantile")
bulkDiscretize(formattedLst, xy, inparallel = FALSE)

Arguments

data
numeric vector. The continuous data to be discretized.
classBoundaries
numeric vector or single integer. Interval boundaries to be used for data discretization. Outer values (minimum and maximum) required. -Inf or Inf are allowed, in which case data minimum and maximum will be used to evaluate the mid values of outer classes. Alternatively, a single integer to indicate the number of classes, to split by quantiles (default) or equal intervals.
classStates
vector. The state labels to be assigned to the discretized data.
method
character. What splitting method should be used? This argument is ignored if a vector of values is passed to classBoundaries.
  • quantile splits data into quantiles (default).
  • equal splits data into equally sized intervals based on data minimum and maximum.
formattedLst
A formatted list as returned by linkNode or linkMultiple.
xy
matrix. A matrix of spatial coordinates; first column is x (longitude), second column is y (latitude) of locations (in rows).
inparallel
logical or integer. Should the function use parallel processing facilities? Default is FALSE: a single process will be launched. If TRUE, all cores/processors but one will be used. Alternatively, an integer can be provided to dictate the number of cores/processors to be used.
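
The two splitting methods described for the method argument can be illustrated in base R. This is a minimal sketch of the idea only, not the package's internal code; the helper name sketchBreaks is hypothetical and not part of bnspatial.

```r
# Hypothetical helper (not part of bnspatial): compute the boundaries
# of n classes, either by quantiles or by equal intervals.
sketchBreaks <- function(x, n, method = c("quantile", "equal")) {
  method <- match.arg(method)
  if (method == "quantile") {
    # n classes holding roughly the same number of observations
    unname(quantile(x, probs = seq(0, 1, length.out = n + 1), na.rm = TRUE))
  } else {
    # n classes of the same width between data minimum and maximum
    seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), length.out = n + 1)
  }
}

set.seed(1)
x <- runif(30)
qBreaks <- sketchBreaks(x, 3, "quantile")
eBreaks <- sketchBreaks(x, 3, "equal")

# Count observations per class under each method
qCounts <- table(cut(x, qBreaks, include.lowest = TRUE))
eCounts <- table(cut(x, eBreaks, include.lowest = TRUE))
qCounts
eCounts
```

With quantile breaks the classes hold similar numbers of observations but differ in width; with equal breaks the widths match and the counts vary.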

Value

dataDiscretize returns a named list of 4 vectors:
  • $discreteData: the discretized data; labels are applied if the classStates argument is provided
  • $classBoundaries: the class boundaries, i.e. the values splitting the classes
  • $midValues: the mid point of each class (the mean of its lower and upper boundaries)
  • $classStates: the labels assigned to each class
bulkDiscretize returns a matrix: each column is a node associated with input spatial data, and each row holds the discretized values at the coordinates specified by argument xy.
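
The description of $midValues, together with the handling of -Inf and Inf in classBoundaries, can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's own code; sketchMidValues is a hypothetical helper name.

```r
# Hypothetical helper (not part of bnspatial): mid point of each class,
# i.e. the mean of its lower and upper boundaries. Infinite outer
# boundaries are replaced by the data minimum and maximum first.
sketchMidValues <- function(x, breaks) {
  finite <- breaks
  finite[breaks == -Inf] <- min(x, na.rm = TRUE)
  finite[breaks == Inf] <- max(x, na.rm = TRUE)
  # Pair each lower boundary with the next upper boundary
  (head(finite, -1) + tail(finite, -1)) / 2
}

x <- c(0.1, 0.3, 0.6, 0.9)
midFixed <- sketchMidValues(x, c(0, 0.5, 1))       # 0.25 0.75
midOpen <- sketchMidValues(x, c(-Inf, 0.5, Inf))   # 0.3 0.7
midFixed
midOpen
```

With fixed outer boundaries the mid values depend only on the boundaries; with infinite outer boundaries they shift toward the observed data range.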

Details

dataDiscretize

Examples

s <- runif(30)

# Split by user-defined values. Values outside the boundaries are set to NA:
dataDiscretize(s, classBoundaries = c(0.2, 0.5, 0.8)) 

# Split by quantiles (default):
dataDiscretize(s, classStates = c('a', 'b', 'c'))

# Split by equal intervals:
dataDiscretize(s, classStates = c('a', 'b', 'c'), method = "equal")

# When -Inf and Inf are provided as external boundaries, $midValues of outer classes
# are calculated on the minimum and maximum values:
dataDiscretize(s, classBoundaries=c(0, 0.5, 1), classStates=c("first", "second"))[c(2,3)]
dataDiscretize(s, classBoundaries=c(-Inf, 0.5, Inf), classStates=c("first", "second"))[c(2,3)]

## Discretize multiple spatial data by location
data(ConwyData)
network <- LandUseChange
spatialData <- c(currentLU, slope, status)

# Link multiple spatial data to the network nodes and discretize
spDataLst <- linkMultiple(spatialData, network, LUclasses, verbose = FALSE)
coord <- aoi(currentLU, xy=TRUE)
head( bulkDiscretize(spDataLst, coord) )