learnPattern: Learn Local Auto-Patterns for Time Series Representation and Similarity

Description

learnPattern implements an ensemble of regression trees (based on Breiman and Cutler's original Fortran code) to learn local auto-patterns for time series representation. The ensemble is used to learn an autoregressive model that captures local, time-varying autoregressive behavior.
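As a rough illustration of how a local autoregressive setup can be framed, the base R sketch below cuts a predictor segment and a target segment out of a toy series matrix. It is not the package's internal code: the helper extract_segment, the toy data, and the way start positions are drawn are all assumptions made for this example.

```r
# Toy database in UCR-style layout: rows are series, columns are time points
set.seed(1)
x <- matrix(rnorm(4 * 50), nrow = 4, ncol = 50)

# Hypothetical helper: stack the values of one segment from every series
# into a single vector (one value per series and per position in the segment)
extract_segment <- function(x, start, len, use.diff = FALSE) {
  if (use.diff) {
    # first differences along time; the result has one column fewer than x
    x <- t(apply(x, 1, diff))
  }
  as.vector(t(x[, start:(start + len - 1), drop = FALSE]))
}

# Draw the segment length as a random proportion of the series length
segment.factor <- c(0.05, 0.95)
f <- runif(1, segment.factor[1], segment.factor[2])
len <- max(1, floor(f * ncol(x)))

# Start positions that stay valid for both the raw and the differenced series
max.start <- ncol(x) - len
pred.start <- sample.int(max.start, 1)
targ.start <- sample.int(max.start, 1)

# A regression tree would split on `predictor` to reduce the error on `target`
predictor <- extract_segment(x, pred.start, len)
target <- extract_segment(x, targ.start, len, use.diff = TRUE)
```

Both vectors have one entry per series and per position in the segment, so a tree fitted to them relates values in one part of a series to values in another part.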

Usage

"learnPattern"(x, segment.factor=c(0.05,0.95), random.seg=TRUE, target.diff=TRUE, segment.diff=TRUE,  random.split=0, ntree=200, mtry=1, replace=FALSE, sampsize=if (replace) ceiling(0.632*nrow(x)) else nrow(x), maxdepth=6, nodesize=5, do.trace=FALSE, keep.forest=TRUE, oob.pred=FALSE, keep.errors=FALSE,  keep.inbag=FALSE, ...)
"print"(x, ...)

Arguments

x

Time series database as a matrix in UCR format. Rows are univariate time series and columns are observations (for the print method, a learnPattern object).

segment.factor

The proportion of the time series length to be used for both predictors and targets. If random.seg is TRUE (default), the minimum and maximum factors should be provided as a vector of length two.

random.seg

TRUE if segment length is random between thresholds defined by segment.factor

target.diff

Can the target segment be a difference feature?

segment.diff

Can predictor segments be difference features?

random.split

Type of the split. If set to zero (0), splits are generated based on the decrease in SSE in the target segment. A setting of one (1) generates the split value randomly between the max and min values. A setting of two (2) generates a kd-tree type of split (i.e. the median of the values at each node is chosen as the split).

ntree

Number of trees to grow. A larger number of trees is preferred if computation time is not a concern.

mtry

Number of predictor segments randomly sampled as candidates at each split. Note that it is preset to 1 for now.

replace

Should bagging of time series be done with replacement? All training time series are used if FALSE (default).

sampsize

Size(s) of sample to draw with replacement if replace is set to TRUE

maxdepth

The maximum depth of the trees in the ensemble.

nodesize

Minimum size of terminal nodes. Setting this number larger causes smaller trees to be grown (and thus take less time).

do.trace

If set to TRUE, give a more verbose output as learnPattern is run. If set to some integer, then running output is printed for every do.trace trees.

keep.forest

If set to FALSE, the forest will not be retained in the output object.

oob.pred

If replace is set to TRUE, predictions for the time series observations based on "out-of-bag" time series are returned.

keep.errors

If set to TRUE, the mean square error (MSE) of target prediction over target segments is evaluated for each tree. If oob.pred=TRUE, this information is evaluated on "out-of-bag" samples at each tree.

keep.inbag

Should an n by ntree matrix be returned that keeps track of which samples are "in-bag" in which trees?

...

optional parameters to be passed to the low level function learnPattern.
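The three random.split settings described above can be sketched in base R as follows. The helper choose_split and the toy vectors are hypothetical and only illustrate the three rules as they are worded in this page; the package's compiled code may differ in details such as candidate generation and tie handling.

```r
# Hypothetical helper: pick a split value on predictor values `p`
# for target values `y` under the three random.split settings
choose_split <- function(p, y, random.split = 0) {
  if (random.split == 1) {
    # random value between the min and max of the predictor
    return(runif(1, min(p), max(p)))
  }
  if (random.split == 2) {
    # kd-tree style split at the median
    return(median(p))
  }
  # random.split == 0: search midpoints for the largest decrease in SSE
  u <- sort(unique(p))
  if (length(u) < 2) return(u[1])
  cand <- (head(u, -1) + tail(u, -1)) / 2
  sse <- function(v) sum((v - mean(v))^2)
  cost <- vapply(cand, function(s) sse(y[p <= s]) + sse(y[p > s]), numeric(1))
  cand[which.min(cost)]
}

# Toy node: predictor values and the target values observed with them
p <- c(1, 2, 3, 10, 11, 12)
y <- c(0, 0, 0, 5, 5, 5)

split.sse <- choose_split(p, y, random.split = 0)
split.random <- choose_split(p, y, random.split = 1)
split.median <- choose_split(p, y, random.split = 2)
```

On this toy node the SSE-based and median splits coincide because the target changes exactly at the middle of the predictor values; on real data they generally differ, and the random split trades accuracy per tree for speed and diversity.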

Value

call: the original call to learnPattern.
type: regression
segment.factor: the proportion of the time series length to be used for both predictors and targets.
segment.length: segment length settings used by the trees of the ensemble
nobs: number of observations in a segment
ntree: number of trees grown
maxdepth: maximum depth level for each tree
mtry: number of predictor segments sampled for splitting at each node.
target: starting time of the target segment for each tree.
target.type: type of the target segment; 1 if observed series, 2 if difference series.
forest: a list that contains the entire forest; NULL if keep.forest=FALSE.
oobprediction: predicted observations based on "out-of-bag" time series; returned if oob.pred=TRUE.
ooberrors: mean square error (MSE) over the trees, evaluated using the predicted observations on "out-of-bag" time series; returned if oob.pred=TRUE.
inbag: an n by ntree matrix that keeps track of which samples are "in-bag" in which trees; returned if keep.inbag=TRUE.
errors: mean square error (MSE) of target prediction over target segments for each tree. If oob.pred=TRUE, the MSE is reported based on "out-of-bag" samples at each tree.
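To make the "in-bag" and "out-of-bag" bookkeeping concrete, the base R sketch below builds an n by ntree matrix by sampling series with replacement, using the default sampsize from the Usage section. The toy sizes are made up, and storing draw counts (rather than 0/1 indicators) is an assumption of this example, not a statement about what the package returns.

```r
set.seed(71)
n <- 50                          # number of training series (toy value)
ntree <- 200                     # number of trees
sampsize <- ceiling(0.632 * n)   # default sample size when replace=TRUE

# inbag[i, t] counts how many times series i was drawn for tree t
inbag <- sapply(seq_len(ntree), function(t) {
  tabulate(sample.int(n, sampsize, replace = TRUE), nbins = n)
})

# Series never drawn for a tree are out-of-bag for that tree
oob <- inbag == 0
oob.fraction <- colMeans(oob)
```

Each series is out-of-bag for some trees only, so out-of-bag predictions and errors are averaged over the trees that did not see that series during training.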

References

Baydogan, M. G. (2013), "Learned Pattern Similarity", Homepage: http://www.mustafabaydogan.com/learned-pattern-similarity-lps.html.
Breiman, L. (2001), "Random Forests", Machine Learning 45(1), 5-32.

Examples

data(GunPoint)
set.seed(71)

## Learn patterns on GunPoint training series with default parameters
ensemble=learnPattern(GunPoint$trainseries)
print(ensemble)

## Find the similarity between test and training series based on the learned model
similarity=computeSimilarity(ensemble,GunPoint$testseries,GunPoint$trainseries)

## Find the index of 1 nearest neighbor (1NN) training series for each test series
NearestNeighbor=apply(similarity,1,which.min)

## Predicted class for each test series
predicted=GunPoint$trainclass[NearestNeighbor]

## Compute the percentage of accurate predictions
accuracy=sum(predicted==GunPoint$testclass)/nrow(GunPoint$testseries)
print(100*accuracy)

## Learn patterns with random splits (random.split=1) on GunPoint training series
ensemble=learnPattern(GunPoint$trainseries, random.split=1)

## Find the similarity between test and training series and classify test series
similarity=computeSimilarity(ensemble,GunPoint$testseries,GunPoint$trainseries)
NearestNeighbor=apply(similarity,1,which.min)
predicted=GunPoint$trainclass[NearestNeighbor]
accuracy=sum(predicted==GunPoint$testclass)/nrow(GunPoint$testseries)
print(100*accuracy)

## Learn patterns by training each tree on a random subsample
## and classify test time series
ensemble=learnPattern(GunPoint$trainseries,replace=TRUE)
similarity=computeSimilarity(ensemble,GunPoint$testseries,GunPoint$trainseries)
NearestNeighbor=apply(similarity,1,which.min)
predicted=GunPoint$trainclass[NearestNeighbor]
print(predicted)

## Learn patterns and do predictions on OOB time series
ensemble=learnPattern(GunPoint$trainseries,replace=TRUE,target.diff=FALSE,oob.pred=TRUE)
## Plot first series and its OOB approximation
plot(GunPoint$trainseries[1,],xlab='Time',ylab='Observation',
	type='l',lty=1,lwd=2)
points(c(1:ncol(GunPoint$trainseries)),ensemble$oobpredictions[1,],
	type='l',col=2,lty=2,lwd=2)
legend('topleft',c('Original series','Approximation'),
	col=c(1,2),lty=c(1,2),lwd=2)
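The examples above repeat the same nearest-neighbor steps several times. A small wrapper, sketched here with a hypothetical name (classify_1nn) and a made-up toy matrix standing in for the output of computeSimilarity, could remove the repetition. It assumes, as the use of which.min in the examples implies, that smaller values mean more similar series.

```r
# Hypothetical helper: 1NN labels from a matrix whose rows are test series
# and whose columns are training series (smaller value = more similar)
classify_1nn <- function(similarity, trainclass) {
  nearest <- apply(similarity, 1, which.min)
  trainclass[nearest]
}

# Toy stand-in for a computeSimilarity result (values are made up)
toy <- matrix(c(0.1, 0.9, 0.5,
                0.8, 0.2, 0.7), nrow = 2, byrow = TRUE)
toy.trainclass <- c(1, 2, 1)
toy.testclass <- c(1, 2)

predicted <- classify_1nn(toy, toy.trainclass)
accuracy <- mean(predicted == toy.testclass)
print(100 * accuracy)
```

With the package loaded, the same helper could be called as classify_1nn(similarity, GunPoint$trainclass) on the matrices computed in the examples.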
