funcit: Functional Cluster Analysis

Description

Main function for clustering functional data with one or more of seven algorithms.

Usage

funcit(data, k, methods=c("fitfclust","distclust", "iterSubspace",
       "funclust", "funHDDC", "fscm", "waveclust"), seed=NULL, regTime=NULL,
       clusters=NULL, funcyCtrl=NULL, fpcCtrl=NULL, parallel=FALSE,
       save.data=TRUE, ...)

Arguments

data

Data in format "Format1" or format "Format2" (see formatFuncy).
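The two layouts can be illustrated with a small base-R sketch. It assumes that "Format2" is a matrix with one curve per row and one time point per column, and that "Format1" is a long table with the columns curve ID, value and time in that order. formatFuncy is the authoritative converter. The helper wideToLong() is hypothetical and not part of funcy. Its default time grid mirrors the regTime=NULL behaviour described below.

```r
# Hypothetical helper (not part of funcy): convert a regular matrix with
# one curve per row and one time point per column into a long table with
# one row per measurement. The column order (id, value, time) is an
# assumption about "Format1"; check formatFuncy before relying on it.
wideToLong <- function(x, regTime = seq_len(ncol(x))) {
  stopifnot(is.matrix(x), length(regTime) == ncol(x))
  data.frame(
    id    = rep(seq_len(nrow(x)), each = ncol(x)),
    value = as.vector(t(x)),            # row by row: all values of curve 1 first
    time  = rep(regTime, times = nrow(x))
  )
}

# Three curves observed on the same four time points
x <- matrix(c(1, 2, 3, 4,
              2, 3, 4, 5,
              0, 0, 1, 1), nrow = 3, byrow = TRUE)
long <- wideToLong(x)
head(long, 5)
```

Irregular data fit only the long layout, because each curve can have its own time points.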

k

Number of clusters.

methods

"fitfclust":: Model-based clustering algorithm based on a functional mixed mixture model. Allows irregular measurements; an eigenbasis can be used.
"distclust":: Clustering algorithm based on a distance measure. Allows irregular measurements; an eigenbasis can be used.
"iterSubspace":: Model-based clustering algorithm based on subspace projection. Allows irregular measurements; an eigenbasis can be used; the dimension can vary between clusters.
"funclust":: Model-based clustering algorithm based on a functional mixed mixture model.
"funHDDC":: Model-based clustering algorithm based on a functional mixed mixture model. The dimension can vary between clusters.
"fscm":: Model-based clustering algorithm based on a functional mixed mixture model. Curves can depend on their location; a location matrix is then an optional input parameter (see the "fscm" parameters under the … argument).
"waveclust":: Model-based clustering algorithm based on a functional mixed mixture model. Only a wavelet basis can be used.

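For a detailed description of the methods, see the References. Methods can also be selected by their position in the list above, as in methods=c(1,2,3) in the Examples.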
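Because methods can be passed by name or by position, it helps to see how positions map to names. The sketch below assumes that a numeric index refers to the order of the default methods vector in Usage, which is how the Examples use it. funcyMethods is a hypothetical lookup vector, not a funcy object.

```r
# Hypothetical lookup vector (not a funcy object): method names in the
# order of funcit's default `methods` argument. Assumption: a numeric
# index in `methods` refers to this order.
funcyMethods <- c("fitfclust", "distclust", "iterSubspace",
                  "funclust", "funHDDC", "fscm", "waveclust")

funcyMethods[c(1, 2, 3)]      # methods used in the first example
funcyMethods[(1:7)[-4]]       # every method except funclust

# Methods that accept irregular data (see Details)
irregularOK <- c("fitfclust", "distclust", "iterSubspace")
which(funcyMethods %in% irregularOK)
```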

seed

Seed for initial clustering. See funcyCtrl.

regTime

If data is in "Format2", an optional vector of time points (see formatFuncy). If regTime=NULL and data is in "Format2", equidistant time points from 1 to the number of time points are used.

clusters

Optional vector of true cluster labels.
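Cluster labels are arbitrary, so comparing estimated clusters with the true labels needs a measure that ignores label permutations. As an illustration, here is an adjusted Rand index in base R. The function adjustedRand() is hypothetical, and this sketch makes no assumption about which agreement measure, if any, summary() reports for a funcyOutList.

```r
# Hypothetical helper: adjusted Rand index between two label vectors.
# It is invariant to renaming the clusters.
adjustedRand <- function(a, b) {
  tab <- table(a, b)
  comb2 <- function(n) n * (n - 1) / 2          # number of pairs
  sumCells <- sum(comb2(tab))
  sumRows  <- sum(comb2(rowSums(tab)))
  sumCols  <- sum(comb2(colSums(tab)))
  expected <- sumRows * sumCols / comb2(sum(tab))
  maxIndex <- (sumRows + sumCols) / 2
  (sumCells - expected) / (maxIndex - expected)
}

truth <- c(1, 1, 1, 2, 2, 2)
found <- c(2, 2, 2, 1, 1, 1)   # same partition, labels swapped
adjustedRand(truth, found)     # perfect agreement despite different labels
```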

funcyCtrl

A control object of class funcyCtrl. If a model-based clustering algorithm is used, further parameters can be specified with the extended class funcyCtrlMbc.

fpcCtrl

A control object of class fpcCtrl. Only used for eigenbasis calculation (baseType="eigenbasis" in funcyCtrl).
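For intuition about what an eigenbasis is, the sketch below computes one for regular curves as the leading eigenvectors of the sample covariance matrix (plain principal component analysis). This is a simplification: the estimation controlled by fpcCtrl, which also has to handle irregular grids, is not reproduced here, and nEig is a hypothetical name for the number of eigenfunctions kept.

```r
# Sketch: empirical eigenbasis of regular curves (one curve per row),
# computed as plain PCA on the sample covariance matrix.
set.seed(1)
timeGrid <- seq(0, 1, length.out = 20)
nCurves  <- 30

# Curves built from two known functions with random scores
scores <- cbind(rnorm(nCurves, sd = 2), rnorm(nCurves, sd = 0.5))
curves <- scores %*% rbind(sin(2 * pi * timeGrid), cos(2 * pi * timeGrid))

meanCurve <- colMeans(curves)
eig   <- eigen(cov(curves), symmetric = TRUE)
nEig  <- 2                                   # hypothetical: eigenfunctions kept
basis <- eig$vectors[, seq_len(nEig)]        # 20 x 2, orthonormal columns

# Coefficients of each centered curve in the eigenbasis
centered <- sweep(curves, 2, meanCurve)
coefs <- centered %*% basis

# Share of total variance captured by the first nEig eigenfunctions
sum(eig$values[seq_len(nEig)]) / sum(eig$values)
```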

parallel

If TRUE, package parallel is used for parallel computing.

save.data

Whether to save a copy of the data in the returned object. Must be TRUE to use the plot method plot.

…

Additional model-specific parameters. These are only used if exactly one method is given in methods. The parameters are:

"fitfclust"


p:: Rank of the covariance matrix $\Gamma$; must be at least dimBase.
pert:: Adds a ridge term to the least-squares fit; helps if only a few observations per curve were recorded.
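The effect of a ridge term can be seen on a single sparse curve. In this sketch, which is an illustration only, spline coefficients come from the normal equations $(B^\top B + \text{pert}\, I)\, c = B^\top y$. With three observations and six basis functions $B^\top B$ is singular, and adding pert times the identity makes it invertible. How fitfclust uses pert internally may differ; fitCoefs() is a hypothetical helper.

```r
library(splines)

# Hypothetical helper: spline coefficients of one curve from the normal
# equations (B'B + pert * I) c = B'y.
fitCoefs <- function(time, y, basis, pert = 0) {
  B <- predict(basis, time)                  # basis evaluated at the observed times
  A <- crossprod(B) + pert * diag(ncol(B))   # B'B + pert * I
  drop(solve(A, crossprod(B, y)))
}

basis <- bs(seq(0, 1, length.out = 100), df = 6)  # common 6-dimensional B-spline basis
time  <- c(0.1, 0.4, 0.8)                          # only three observations
y     <- c(1.0, 0.2, -0.5)

B <- predict(basis, time)
qr(crossprod(B))$rank                      # 3 < 6: B'B alone is singular
qr(crossprod(B) + 0.01 * diag(6))$rank     # 6: the ridge term restores full rank
fitCoefs(time, y, basis, pert = 0.01)
```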

"distclust"


method:: One of "hclust" or "pam", specifying how the distance matrix is clustered.
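The two steps behind this option, building a distance matrix and then clustering it, can be sketched in base R for regular curves. Plain Euclidean distance on a common grid is only a stand-in: distclust estimates distances from sparse observations (see the Peng and Mueller reference), which this sketch does not do. The "pam" variant is shown as a comment because it needs the recommended package cluster.

```r
# Sketch: cluster regular curves (one per row) from a distance matrix.
set.seed(42)
timeGrid <- seq(0, 1, length.out = 25)
groupA <- t(replicate(10, sin(2 * pi * timeGrid) + rnorm(25, sd = 0.1)))
groupB <- t(replicate(10, cos(2 * pi * timeGrid) + rnorm(25, sd = 0.1)))
curves <- rbind(groupA, groupB)

d <- dist(curves)                        # pairwise Euclidean distances
tree <- hclust(d, method = "average")    # hierarchical clustering of d
labelsH <- cutree(tree, k = 2)           # cut the tree into k = 2 clusters

# Partitioning around medoids instead, if package 'cluster' is installed:
# labelsP <- cluster::pam(d, k = 2)$clustering
table(labelsH, rep(1:2, each = 10))
```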

"iterSubspace"


simplif:: If FALSE, the cluster membership of each curve is re-tested by projecting the curve onto the current subspace estimated without that curve (leave-one-out curve estimation).
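The leave-one-out idea can be sketched for regular curves. Each cluster is summarised by its mean curve and its first q eigenvectors, and a curve is assigned to the cluster whose subspace reconstructs it with the smallest squared error. With leaveOneOut=TRUE the curve under test is excluded when its current cluster's subspace is estimated, so it cannot pull that subspace towards itself. This is a simplified illustration with a hypothetical function reclassify(), not the iterSubspace implementation.

```r
# Hypothetical function: one subspace-projection reclassification step.
# x: regular curves, one per row; labels: current cluster labels.
# All curves are scored against subspaces built from the current labels.
reclassify <- function(x, labels, q = 1, leaveOneOut = TRUE) {
  ks <- sort(unique(labels))
  newLabels <- labels
  for (i in seq_len(nrow(x))) {
    err <- sapply(ks, function(k) {
      members <- which(labels == k)
      if (leaveOneOut) members <- setdiff(members, i)  # exclude curve i itself
      xk <- x[members, , drop = FALSE]
      mu <- colMeans(xk)                                 # cluster mean curve
      v  <- eigen(cov(xk), symmetric = TRUE)$vectors[, seq_len(q), drop = FALSE]
      r  <- x[i, ] - mu
      resid <- r - v %*% crossprod(v, r)                 # part of r outside the subspace
      sum(resid^2)
    })
    newLabels[i] <- ks[which.min(err)]                   # best-reconstructing cluster
  }
  newLabels
}

set.seed(3)
timeGrid <- seq(0, 1, length.out = 25)
makeGroup <- function(f) t(replicate(10, rnorm(1, 1, 0.2) * f(2 * pi * timeGrid) +
                                        rnorm(25, sd = 0.1)))
curves <- rbind(makeGroup(sin), makeGroup(cos))
truth  <- rep(1:2, each = 10)

# With correct labels and well-separated shapes, one step changes nothing
identical(reclassify(curves, truth), truth)
```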

"funclust"


nbInit:: The number of small-EM runs used to initialize the main EM-like algorithm.
nbIterInit:: The maximum number of iterations for each small-EM run.

"funHDDC"


model:: The model to use, one of "AkjBkQkDk", "AkjBQkDk", "AkBkQkDk", "AkBQkDk", "ABkQkDk", "ABQkDk". See (Bouveyron & Jacques, 2011) for details.

"fscm"


location:: A two-dimensional matrix of the curve locations (coordinates).
knn:: Number of neighbors each curve depends on.
useCode:: "R" or "C". The C implementation, if available, is much faster than the R one.
verbose:: If TRUE, the iteration number and the current values of sigma, theta and f are printed.
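To make location and knn concrete, the sketch below finds, for each curve, the indices of its knn nearest neighbours from a two-column location matrix using Euclidean distance. That fscm builds its neighbourhoods exactly this way is an assumption; nearestNeighbours() is a hypothetical helper.

```r
# Hypothetical helper: indices of the knn nearest neighbours of each curve,
# given a two-column matrix of curve locations (coordinates).
nearestNeighbours <- function(location, knn) {
  d <- as.matrix(dist(location))   # Euclidean distances between locations
  diag(d) <- Inf                   # a curve is not its own neighbour
  idx <- apply(d, 1, function(row) order(row)[seq_len(knn)])
  matrix(idx, nrow = nrow(d), ncol = knn, byrow = TRUE)  # one row per curve
}

location <- cbind(x = c(0, 1, 0, 10, 11),
                  y = c(0, 0, 1, 10, 10))
nearestNeighbours(location, knn = 2)
```

Ties are broken by curve index, because order() keeps tied elements in their original order.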

"waveclust"


gamma:: One of "group", "scale.location", "group.scale.location" or "constant".
init:: One of "rEM" or "SEM" for random or stochastic EM.
plotLoglik:: If TRUE, the log-likelihood is plotted.

Value

Returns an object of class funcyOutList.

Details

funcit is the core function for running one or more methods to cluster functional data. Functional data can be measured on a regular or an irregular grid: in regular datasets all curves are measured at the same time points, while in irregular datasets the number and/or location of the time points can differ between curves (see formatFuncy for the different formats). Only the algorithms "fitfclust", "distclust" and "iterSubspace" can be applied to irregular datasets. All methods project the curves onto a basis defined in funcyCtrl and build mixed-effects models of the basis coefficients.
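The basis projection step can be sketched for an irregular dataset: each curve has its own number and location of time points, but all curves are projected onto the same basis, so every curve ends up with a coefficient vector of the same length. The cubic B-spline basis with five functions and the ordinary least-squares fit are illustrative assumptions, not funcyCtrl defaults or funcy's internal code.

```r
library(splines)

# Sketch: project irregularly sampled curves onto one common basis by
# ordinary least squares.
set.seed(7)
basis <- bs(seq(0, 1, length.out = 101), df = 5, intercept = TRUE)

# Four curves, each with its own number and location of time points
curves <- lapply(1:4, function(id) {
  n <- sample(10:15, 1)
  time <- sort(runif(n))
  data.frame(id = id, time = time,
             value = sin(2 * pi * time) + rnorm(n, sd = 0.1))
})

# One coefficient vector per curve, all of the same length
coefs <- t(sapply(curves, function(cv) {
  B <- predict(basis, cv$time)          # basis evaluated at this curve's times
  lm.fit(B, cv$value)$coefficients      # least-squares basis coefficients
}))
dim(coefs)   # 4 curves x 5 basis functions
```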

References

Christina Yassouridis and Dominik Ernst and Friedrich Leisch. Generalization, Combination and Extension of Functional Clustering Algorithms: The R Package funcy. Journal of Statistical Software. 85 (9). 1--25. 2018

"fitfclust":: Gareth James and Catherine A. Sugar. Clustering for Sparsely Sampled Functional Data. Journal of the American Statistical Association. 98 (462). 397--408. 2003
"distclust":: Jie Peng and Hans-Georg Mueller. Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions. The Annals of Applied Statistics. 2 (3). 1056--1077. 2008
"iterSubspace":: Jeng-Min Chiou and Pai-Ling Li. Functional clustering and identifying substructures of longitudinal data. Journal of the Royal Statistical Society: Series B. 69 (4). 679--699. 2007
"waveclust":: Madison Giacofci and Sophie Lambert-Lacroix and Guillemette Marot and Franck Picard. Wavelet-based clustering for mixed-effects functional models in high dimension. Biometrics. 69. 31--40. 2013
"fscm":: Nicoleta Serban and Huijing Jiang. Clustering Random Curves Under Spatial Interdependence With Application to Service Accessibility. Technometrics. 54 (2). 108--119. 2012
"funclust":: Julien Jacques and Cristian Preda. Funclust: a curves clustering method using functional random variables density approximation. Neurocomputing. 112. 164--171. 2013
"funHDDC":: Charles Bouveyron and Julien Jacques. Model-based clustering of time series in group-specific functional subspaces. Advances in Data Analysis and Classification. 5 (4). 281--300. 2011

Examples


# NOT RUN {
##Cluster the data with methods for regular sets
##Sample a regular dataset
set.seed(2804)
ds <- sampleFuncy(obsNr=50, k=4, timeNr=8, reg=TRUE)

##Cluster the functions with the first three methods (fitfclust, distclust, iterSubspace).
res <- funcit(data=Data(ds), clusters=Cluster(ds),
              methods=c(1,2,3), seed=2404,
              k=4)
summary(res)
Cluster(res)

##Additional method specific parameters for method fitfclust
res <- funcit(data=Data(ds), clusters=Cluster(ds), methods="fitfclust", seed=2405,
              k=4, p=5, pert=0)


##Cluster the data with methods for irregular sets
##Sample an irregular dataset
set.seed(2804)
ds <- sampleFuncy(obsNr=50, k=4, timeNrMin=4, timeNrMax=8,
                  reg=FALSE)
data <- Data(ds)
clusters <- Cluster(ds)

res <- funcit(data=data, clusters=clusters,
              methods=c("fitfclust","distclust", "iterSubspace"), seed=2406,
              k=4, parallel=TRUE)

summary(res)
Cluster(res)
plot(res)

##Two real-life examples
# }
# NOT RUN {
data("genes")
data <- genes$data
clusters <- genes$clusters

##Cluster the functions with all methods except funclust (method 4).
res <- funcit(data=data, clusters=clusters,
              methods=c(1:7)[-4], seed=2404,
              k=4)
summary(res)
Cluster(res)
# }
# NOT RUN {
data("electricity")
res <- funcit(data=electricity, methods=c("fitfclust","distclust",
              "waveclust"), seed=2406, k=5, parallel=TRUE)
plot(res, legendPlace="topleft")
# }
