
LPStimeSeries (version 1.0-5)

learnPattern: Learn Local Auto-Patterns for Time Series Representation and Similarity

Description

learnPattern implements an ensemble of regression trees (based on Breiman and Cutler's original Fortran code) to learn local auto-patterns for time series representation. The ensemble of regression trees is used to learn an autoregressive model that captures local, time-varying autoregressive behavior.

Usage

learnPattern(x,
   segment.factor=c(0.05,0.95),
   random.seg=TRUE, target.diff=TRUE, segment.diff=TRUE,
   random.split=0,
   ntree=200,
   mtry=1,
   replace=FALSE,
   sampsize=if (replace) ceiling(0.632*nrow(x)) else nrow(x),
   maxdepth=6,
   nodesize=5,
   do.trace=FALSE,
   keep.forest=TRUE,
   oob.pred=FALSE,
   keep.errors=FALSE,
   keep.inbag=FALSE, ...)

print(x, ...)

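The defaults for sampsize and segment.factor depend on the dimensions of x. The base-R sketch below evaluates them for a toy matrix. It does not call the package; the helper names default_sampsize and segment_length_range are hypothetical, and rounding segment lengths down is an assumption, since this page only states that segment.factor is a proportion of the series length.

```r
# Toy database: 50 series (rows), 150 observations each (columns).
x <- matrix(rnorm(50 * 150), nrow = 50, ncol = 150)

# Default sampsize, as written in the usage signature.
default_sampsize <- function(x, replace = FALSE) {
  if (replace) ceiling(0.632 * nrow(x)) else nrow(x)
}

# Range of segment lengths implied by segment.factor.
# ASSUMPTION: lengths are rounded down; the package may round differently.
segment_length_range <- function(x, segment.factor = c(0.05, 0.95)) {
  floor(segment.factor * ncol(x))
}

default_sampsize(x)                  # all 50 series
default_sampsize(x, replace = TRUE)  # ceiling(0.632 * 50)
segment_length_range(x)              # floor(c(0.05, 0.95) * 150)
```
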
Arguments

x
time series database as a matrix in UCR format. Rows are univariate time series, columns are observations (for the print method, a learnPattern object).
segment.factor
The proportion of the time series length to be used for both predictors and targets. If random.seg is TRUE (default), the minimum and maximum factors should be provided as a vector of length two.
random.seg
TRUE if the segment length is chosen randomly between the thresholds defined by segment.factor.
target.diff
Can the target segment be a difference feature?
segment.diff
Can predictor segments be difference features?
random.split
Type of the split. If set to zero (0), splits are generated based on the decrease in SSE in the target segment. A setting of one (1) generates the split value randomly between the max and min values. A setting of two (2) generates a kd-tree type of split (i.e. the median of the values at each node is chosen as the split).
ntree
Number of trees to grow. A larger number of trees is preferred if computation time is not a concern.
mtry
Number of predictor segments randomly sampled as candidates at each split. Note that it is preset to 1 for now.
replace
Should bagging of time series be done with replacement? All training time series are used if FALSE (default).
sampsize
Size(s) of sample to draw with replacement if replace is set to TRUE
maxdepth
The maximum depth of the trees in the ensemble.
nodesize
Minimum size of terminal nodes. Setting this number larger causes smaller trees to be grown (and thus take less time).
do.trace
If set to TRUE, give a more verbose output as learnPattern is run. If set to some integer, then running output is printed for every do.trace trees.
keep.forest
If set to FALSE, the forest will not be retained in the output object.
oob.pred
If set to TRUE and replace is also TRUE, predictions for the time series observations based on "out-of-bag" time series are returned.
keep.errors
If set to TRUE, the mean square error (MSE) of target prediction over target segments is evaluated for each tree. If oob.pred=TRUE, this information is evaluated on "out-of-bag" samples at each tree.
keep.inbag
Should an n by ntree matrix be returned that keeps track of which samples are "in-bag" in which trees?
...
optional parameters to be passed to the low level function learnPattern.
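
The three random.split settings describe different ways of choosing a split value at a node. A minimal base-R sketch of the three rules follows. It simplifies the predictor to a plain numeric vector p with target values y, the name choose_split is hypothetical, and the package's compiled code may differ in details such as candidate thresholds and tie handling.

```r
# Hypothetical helper: pick a split value for predictor values `p`
# given target values `y`, mimicking the three random.split settings.
choose_split <- function(p, y, random.split = 0) {
  if (random.split == 1) {
    # Random value between the minimum and maximum of the predictor.
    return(runif(1, min(p), max(p)))
  }
  if (random.split == 2) {
    # kd-tree style: median of the predictor values at the node.
    return(median(p))
  }
  # random.split == 0: threshold with the smallest total SSE of the
  # target in the two children (equivalently, the largest SSE decrease).
  sse <- function(v) sum((v - mean(v))^2)
  u <- sort(unique(p))
  if (length(u) < 2) return(NA_real_)          # nothing to split on
  candidates <- (u[-length(u)] + u[-1]) / 2    # midpoints between values
  totals <- sapply(candidates, function(s) sse(y[p <= s]) + sse(y[p > s]))
  candidates[which.min(totals)]
}

set.seed(1)
p <- c(1, 2, 3, 10, 11, 12)
y <- c(0, 0, 0, 5, 5, 5)
choose_split(p, y, random.split = 0)  # midpoint separating the two groups
choose_split(p, y, random.split = 2)  # median of p
choose_split(p, y, random.split = 1)  # some value between min(p) and max(p)
```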

Value

An object of class learnPattern, which is a list with the following components:
call
the original call to learnPattern.
type
regression
segment.factor
the proportion of the time series length to be used for both predictors and targets.
segment.length
segment length settings used by the trees of the ensemble
nobs
number of observations in a segment
ntree
number of trees grown
maxdepth
maximum depth level for each tree
mtry
number of predictor segments sampled for splitting at each node.
target
starting time of the target segment for each tree.
target.type
type of the target segment; 1 if observed series, 2 if difference series.
forest
a list that contains the entire forest; NULL if keep.forest=FALSE.
oobprediction
predicted observations based on "out-of-bag" time series; returned if oob.pred=TRUE.
ooberrors
mean square error (MSE) over the trees, evaluated using the predicted observations on "out-of-bag" time series; returned if oob.pred=TRUE.
inbag
n by ntree matrix that keeps track of which samples are "in-bag" in which trees; returned if keep.inbag=TRUE.
errors
mean square error (MSE) of target prediction over target segments for each tree. If oob.pred=TRUE, the MSE is reported based on "out-of-bag" samples at each tree.
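
The inbag and out-of-bag components follow the usual bagging bookkeeping. As an illustration only, the base-R sketch below builds an n by ntree in-bag matrix by sampling series with replacement and derives the out-of-bag series for each tree. It does not reproduce the package's internal sampling, and storing draw counts rather than 0/1 indicators is an assumption, since this page does not say which the package uses.

```r
set.seed(71)
n <- 10                          # number of training series
ntree <- 4                       # number of trees
sampsize <- ceiling(0.632 * n)   # default sampsize when replace=TRUE

# inbag[i, t] counts how often series i was drawn for tree t.
inbag <- sapply(seq_len(ntree), function(t) {
  drawn <- sample(n, sampsize, replace = TRUE)
  tabulate(drawn, nbins = n)
})

# Series never drawn for a tree are out-of-bag for that tree.
oob <- lapply(seq_len(ntree), function(t) which(inbag[, t] == 0))

dim(inbag)       # n by ntree
colSums(inbag)   # each column sums to sampsize
```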

References

Baydogan, M. G. (2013), "Learned Pattern Similarity", Homepage: http://www.mustafabaydogan.com/learned-pattern-similarity-lps.html.

Breiman, L. (2001), Random Forests, Machine Learning 45(1), 5-32.

See Also

predict.learnPattern, computeSimilarity, tunelearnPattern

Examples

library(LPStimeSeries)
data(GunPoint)
set.seed(71)

## Learn patterns on GunPoint training series with default parameters
ensemble=learnPattern(GunPoint$trainseries)
print(ensemble)

## Find the similarity between test and training series based on the learned model
similarity=computeSimilarity(ensemble,GunPoint$testseries,GunPoint$trainseries)

## Find the index of 1 nearest neighbor (1NN) training series for each test series
NearestNeighbor=apply(similarity,1,which.min)

## Predicted class for each test series
predicted=GunPoint$trainclass[NearestNeighbor]

## Compute the percentage of accurate predictions
accuracy=sum(predicted==GunPoint$testclass)/nrow(GunPoint$testseries)
print(100*accuracy)

## Learn patterns on GunPoint training series using random splits
## (random.split=1); all other parameters keep their defaults
ensemble=learnPattern(GunPoint$trainseries, random.split=1)

## Find the similarity between test and training series and classify test series
similarity=computeSimilarity(ensemble,GunPoint$testseries,GunPoint$trainseries)
NearestNeighbor=apply(similarity,1,which.min)
predicted=GunPoint$trainclass[NearestNeighbor]
accuracy=sum(predicted==GunPoint$testclass)/nrow(GunPoint$testseries)
print(100*accuracy)

## Learn patterns by training each tree on a random subsample
## and classify test time series
ensemble=learnPattern(GunPoint$trainseries,replace=TRUE)
similarity=computeSimilarity(ensemble,GunPoint$testseries,GunPoint$trainseries)
NearestNeighbor=apply(similarity,1,which.min)
predicted=GunPoint$trainclass[NearestNeighbor]
print(predicted)

## Learn patterns and do predictions on OOB time series
ensemble=learnPattern(GunPoint$trainseries,replace=TRUE,target.diff=FALSE,oob.pred=TRUE)
## Plot first series and its OOB approximation
plot(GunPoint$trainseries[1,],xlab='Time',ylab='Observation',
	type='l',lty=1,lwd=2)
points(c(1:ncol(GunPoint$trainseries)),ensemble$oobpredictions[1,],
	type='l',col=2,lty=2,lwd=2)
legend('topleft',c('Original series','Approximation'),
	col=c(1,2),lty=c(1,2),lwd=2)
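
The classification steps repeated in the examples above (nearest training series by which.min, then accuracy) can be wrapped in a small helper. The sketch below uses a toy matrix in place of the output of computeSimilarity, with rows as test series, columns as training series, and smaller values meaning more similar, as the use of which.min in the examples implies. classify_1nn is a hypothetical name, not a package function.

```r
# Hypothetical helper: 1-nearest-neighbor classification from a
# test-by-train matrix in which smaller values mean "more similar".
classify_1nn <- function(similarity, trainclass, testclass = NULL) {
  nearest <- apply(similarity, 1, which.min)
  predicted <- trainclass[nearest]
  accuracy <- if (is.null(testclass)) NA_real_ else mean(predicted == testclass)
  list(nearest = nearest, predicted = predicted, accuracy = accuracy)
}

# Toy stand-in for computeSimilarity output: 3 test x 3 training series.
toy <- matrix(c(0.1, 0.9, 0.8,
                0.7, 0.2, 0.6,
                0.5, 0.4, 0.3), nrow = 3, byrow = TRUE)
res <- classify_1nn(toy, trainclass = c(1, 2, 2), testclass = c(1, 2, 1))
res$predicted        # classes of the nearest training series
100 * res$accuracy   # percentage of correct predictions
```

With the GunPoint objects from the examples, the call would look like classify_1nn(similarity, GunPoint$trainclass, GunPoint$testclass).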
