
Creates an efficient kernel estimator for functional data classification. Currently supported distance measures are all metrics implemented in dist and all semimetrics suggested in Fuchs et al. (2015). Additionally, all (semi-)metrics can be used on a derivative of arbitrary order of the functional observations. For kernel functions, all kernels implemented in fda.usc are available, as well as custom kernel functions.
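The idea of a kernel estimator for classification can be sketched in a few lines of base R. The following is only an illustration under simple assumptions (curves stored as matrix rows, a pluggable distance and kernel, class scores obtained by summing kernel weights per class); it is not the classiFunc implementation, and kernel_classify and its arguments are hypothetical names.

```r
# Minimal sketch of a kernel classifier for curves stored as matrix rows.
# NOT the classiFunc implementation; all names here are hypothetical.
kernel_classify <- function(train, classes, newdata, h = 1,
                            metric = function(x, y) sqrt(sum((x - y)^2)),
                            ker = dnorm) {
  classes <- as.factor(classes)
  preds <- apply(newdata, 1, function(x) {
    # distance from the new curve x to every training curve
    d <- apply(train, 1, metric, y = x)
    # kernel weights with bandwidth h, i.e. K(d) = ker(d / h)
    w <- ker(d / h)
    # sum the weights per class and pick the class with the largest total
    scores <- tapply(w, classes, sum)
    names(scores)[which.max(scores)]
  })
  factor(preds, levels = levels(classes))
}

# Toy data: two groups of noisy curves on a grid of 20 points
set.seed(1)
grid <- seq(0, 1, length.out = 20)
train <- rbind(
  t(replicate(10, sin(2 * pi * grid) + rnorm(20, sd = 0.1))),
  t(replicate(10, cos(2 * pi * grid) + rnorm(20, sd = 0.1)))
)
classes <- factor(rep(c("sin", "cos"), each = 10))
newdata <- rbind(sin(2 * pi * grid), cos(2 * pi * grid))

pred <- kernel_classify(train, classes, newdata, h = 1)
pred
```

Swapping the metric or ker argument of this sketch mirrors, in spirit, the choice between built-in and custom (semi-)metrics and kernels described above.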
classiKernel(classes, fdata, grid = 1:ncol(fdata), h = 1, metric = "L2",
ker = "Ker.norm", nderiv = 0L, derived = FALSE,
deriv.method = "base.diff", custom.metric = function(x, y, ...) {
return(sqrt(sum((x - y)^2))) }, custom.ker = function(u) {
return(dnorm(u)) }, ...)
classes
[factor(nrow(fdata))]
factor of length nrow(fdata) containing the classes of the observations.
fdata
[matrix]
matrix containing the functional observations as rows.
grid
[numeric(ncol(fdata))]
numeric vector of length ncol(fdata) containing the grid on which the functional observations were evaluated.
h
[numeric(1)]
controls the bandwidth of the kernel function. All kernel functions ker should be implemented to have bandwidth = 1. The bandwidth is controlled via h by using K(x) = ker(x/h) as the kernel function.
metric
[character(1)]
character string specifying the (semi-)metric to be used. For an overview of what is available, see the method argument in computeDistMat. For a full list, execute metricChoices().
ker
[character(1)]
character string describing the kernel function to use. Available are, amongst others, all kernel functions from Kernel. For the full list, execute kerChoices(). A customized kernel function is selected with ker = "custom.ker"; the function itself is specified in custom.ker.
nderiv
[integer(1)]
order of derivation on which the metric shall be computed. The default is 0L.
deriv.method
[character(1)]
character string indicating which method should be used for derivation. Currently implemented are "base.diff", the default, and "fda.deriv.fd". "base.diff" uses the method base::diff for equidistant measures without missing values, which is faster than transforming the data into the class fd and deriving this using fda::deriv.fd. The second variant implies smoothing, which can be preferable for calculating high order derivatives.
custom.metric
[function(x, y, ...)]
only used if metric = "custom.metric". A function of functional observations x and y returning their distance. The default is the L2 distance. See how to implement your distance function in dist.
custom.ker
[function(u)]
customized kernel function. This has to be a function with exactly one parameter u, returning the numeric value of the kernel function ker(u). This function is only used if ker == "custom.ker". The bandwidth should be constantly equal to 1 and is controlled via h.
...
further arguments to and from other methods. Hand over additional arguments to computeDistMat, usually additional arguments for the specified (semi-)metric. Also, if deriv.method == "fda.deriv.fd" or fdata is not observed on a regular grid, additional arguments to fdataTransform can be specified, which will be passed on to Data2fd.
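As an illustration of the custom.metric, custom.ker and h arguments, the following base R sketch defines user-supplied functions with the documented signatures function(x, y, ...) and function(u), and shows how a bandwidth enters through K(x) = ker(x/h). The names supDist, epaKernel and scaledKernel are hypothetical and chosen only for this example; the rescaling helper is an assumption about how the formula can be read, not the package's internal code.

```r
# Hypothetical user-supplied functions matching the documented signatures.

# custom metric: function(x, y, ...) returning a single distance;
# here the largest pointwise gap between two curves on the same grid
supDist <- function(x, y, ...) {
  max(abs(x - y))
}

# custom kernel: function(u) with bandwidth fixed at 1;
# here the Epanechnikov kernel, which is zero outside (-1, 1)
epaKernel <- function(u) {
  0.75 * (1 - u^2) * (abs(u) < 1)
}

# the bandwidth enters only through the rescaling K(x) = ker(x / h)
scaledKernel <- function(x, h) epaKernel(x / h)

d <- supDist(c(0, 1, 2, 3), c(0, 1.5, 2, 1))  # distance between two toy curves
scaledKernel(d, h = 1)   # d lies outside the unit support, so the weight is 0
scaledKernel(d, h = 4)   # with h = 4 the same distance gets a positive weight
```

Keeping the kernel itself at unit bandwidth and tuning only h means the same kernel function can be reused across bandwidths without modification.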
classiKernel returns an object of class 'classiKernel'. This is a list containing at least the following components:
classes
a factor of length nrow(fdata) coding the response of the training data set.
fdata
the raw functional data as a matrix with the individual observations as rows.
proc.fdata
the preprocessed data (missing values interpolated, derived and evenly spaced). This data is this.fdataTransform(fdata). See this.fdataTransform for more details.
grid
numeric vector containing the grid on which fdata is observed.
h
numeric value giving the bandwidth to be used in the kernel function.
ker
character encoding the kernel function to use.
metric
character string coding the distance metric to be used in computeDistMat.
nderiv
integer giving the order of derivation that is applied to fdata before computing the distances between the observations.
this.fdataTransform
preprocessing function taking new data as a matrix. It is used to transform fdata into proc.fdata and is required to preprocess new data in order to predict it. This function ensures that preprocessing (derivation, respacing and interpolation of missing values) is done in exactly the same way for the original training data set and future (test) data sets.
call
the original function call.
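The role of this.fdataTransform, a stored function that applies identical preprocessing to training and new data, can be illustrated with a closure. The sketch below is a simplified assumption of that pattern: it only handles derivation by finite differences (in the spirit of "base.diff") on an equidistant grid without missing values, and make_transform and prep are hypothetical names, not package functions.

```r
# Sketch of the closure pattern: a factory fixes the preprocessing settings
# once and returns a function that applies them to any data set.
# make_transform is a hypothetical helper, not part of the package.
make_transform <- function(grid, nderiv = 0L) {
  step <- diff(grid)[1]            # grid is assumed equidistant
  function(fdata) {
    if (nderiv == 0L) return(fdata)
    # base::diff on a matrix works down the columns, so difference each
    # curve (row) separately and transpose back to curves-as-rows
    t(apply(fdata, 1, diff, differences = nderiv)) / step^nderiv
  }
}

grid <- seq(0, 1, by = 0.25)
prep <- make_transform(grid, nderiv = 1L)

train <- rbind(grid^2, 2 * grid, grid)    # 3 training curves
test  <- rbind(3 * grid)                  # 1 new curve

proc_train <- prep(train)                 # settings captured at creation time
proc_test  <- prep(test)                  # same settings reused for new data
```

Because the settings live inside the returned function, new data cannot accidentally be preprocessed with a different derivative order than the training data.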
Fuchs, K., J. Gertheiss, and G. Tutz (2015): Nearest neighbor ensembles for functional data with interpretable feature selection. Chemometrics and Intelligent Laboratory Systems 146, 186 - 197.
predict.classiKernel
# NOT RUN {
# How to implement your own kernel function
library(classiFunc)
data("ArrowHead")
classes = ArrowHead[,"target"]
set.seed(123)
train_inds = sample(1:nrow(ArrowHead), size = 0.8 * nrow(ArrowHead), replace = FALSE)
test_inds = (1:nrow(ArrowHead))[!(1:nrow(ArrowHead)) %in% train_inds]
ArrowHead = ArrowHead[,!colnames(ArrowHead) == "target"]
# custom kernel
myTriangularKernel = function(u) {
return((1 - abs(u)) * (abs(u) < 1))
}
# create the model
mod1 = classiKernel(classes = classes[train_inds], fdata = ArrowHead[train_inds,],
ker = "custom.ker", h = 2, custom.ker = myTriangularKernel)
# calculate the model predictions
pred1 = predict(mod1, newdata = ArrowHead[test_inds,], predict.type = "response")
# prediction accuracy
mean(pred1 == classes[test_inds])
# create another model using an existing kernel function
mod2 = classiKernel(classes = classes[train_inds], fdata = ArrowHead[train_inds,],
ker = "Ker.tri", h = 2)
# calculate the model predictions
pred2 = predict(mod2, newdata = ArrowHead[test_inds,], predict.type = "response")
# prediction accuracy
mean(pred2 == classes[test_inds])
# }
# NOT RUN {
# Parallelize across 2 CPUs
library(parallelMap)
parallelStartSocket(2L) # parallelStartMulticore for Linux
predict(mod1, newdata = ArrowHead[test_inds,], predict.type = "prob", parallel = TRUE, batches = 2L)
parallelStop()
# }