
randomUniformForest (version 1.1.2)

update.unsupervised: Update Unsupervised Learning object

Description

Updates an unsupervised learning object with new data in order to achieve incremental learning. MDS points for the new observations are predicted from the new data, using what was learned about the MDS points of the former unsupervised object.
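The idea of learning MDS points and then predicting them for new observations can be sketched in base R. This is only an illustration under stated assumptions: it uses `cmdscale` and `lm` as stand-ins, the `iris` data as a placeholder dataset, and hypothetical variable names. It is not the package's internal implementation.

```r
# Illustrative sketch only: base R stand-ins, not the package's internals.
set.seed(2014)
X   <- as.matrix(iris[, 1:4])   # placeholder data (assumption)
idx <- sample(nrow(X))
old <- X[idx[1:75], ]           # data seen by the first model
new <- X[idx[76:150], ]         # data arriving later

# 1. Embed the old observations with classical MDS (2 coordinates).
mdsOld <- cmdscale(dist(old), k = 2)

# 2. Learn each MDS coordinate as a function of the original features.
oldDF <- as.data.frame(old)
fits  <- lapply(1:2, function(j) lm(y ~ ., data = cbind(y = mdsOld[, j], oldDF)))

# 3. Predict MDS coordinates for the new observations.
newDF  <- as.data.frame(new)
mdsNew <- sapply(fits, predict, newdata = newDF)

# 4. Stack old and new points to obtain the updated embedding.
mdsAll <- rbind(mdsOld, mdsNew)
dim(mdsAll)
```

The new points are placed in the existing MDS space without recomputing the full dissimilarity matrix, which is the kind of saving incremental learning aims for.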

Usage

## S3 method for class 'unsupervised':
update(object, X, 
	oldData = NULL, 
	mapAndReduce = FALSE, 
	updateModel = FALSE, 
	...)

Arguments

Value

An object of class unsupervised, which is a list with the following components:

  • proximityMatrix: the resulting dissimilarity matrix.
  • MDSModel: the resulting multidimensional scaling (MDS) model.
  • unsupervisedModel: the resulting unsupervised model, with clustered observations in unsupervisedModel$cluster.
  • largeDataLearningModel: if the dataset is large, the resulting model that learned a sample of the MDS points and predicted the other points.
  • gapStatistics: if the K-means algorithm has been called, the results of the gap statistic. Otherwise NULL.
  • rUFObject: the Random Uniform Forests object.
  • nbClusters: the number of clusters found.
  • params: the options of the model.

See Also

combine.unsupervised, modifyClusters, mergeClusters, clusteringObservations, as.supervised

Examples

## not run
## Water Treatment Plant Data Set
## Data can be downloaded at https://archive.ics.uci.edu/ml/datasets/Water+Treatment+Plant

# URL = "http://archive.ics.uci.edu/ml/machine-learning-databases/water-treatment/"
# dataset = "water-treatment.data"

# X = read.table(paste(URL, dataset, sep=""), sep = ",")

## 1- Preprocessing
## first, look at the first column and format the dates
#  Dates = rm.string(as.character(X[,1]), "D-")
#  DatesAsStringTable = do.call(rbind, strsplit(Dates, "/"))
#  DatesasNumericTable = t(apply(DatesAsStringTable, 1, as.numeric))

##  then, transform the data into an R matrix and add the new date columns
#  XX = as.true.matrix(X)[,-1]
#  XX = cbind(DatesasNumericTable, XX)
#  colnames(XX)[1:3] = c("day", "month", "year")

## look at the new data
# head(XX)
# str(XX)

## and fill missing values
# X.imputed = fillNA2.randomUniformForest(XX)

## 2 - run unsupervised analysis on the first half of the dataset
##
# subset.1 = 1:floor(nrow(X.imputed)/2)
# WaterTreatment.model.1 = unsupervised.randomUniformForest(X.imputed, subset = subset.1, 
# baseModel = "proximityThenDistance", seed = 2014)

## roughly assess the model and visualize it
#  WaterTreatment.model.1

## 3 - update model with the second half of the dataset
## (oldData: the first half, used to fit the model above)
# WaterTreatment.updated = update.unsupervised(WaterTreatment.model.1, 
# X.imputed[-subset.1,], oldData = X.imputed[subset.1,])

# WaterTreatment.updated

## view how MDS points have been learned:
## first component
# WaterTreatment.updated$largeDataLearningModel[[1]]

## second component
# WaterTreatment.updated$largeDataLearningModel[[2]]
