## S3 method for class 'stslist':
pstree(object, group, L, cdata = NULL, stationary = TRUE,
  nmin = 1, ymin = NULL, weighted = TRUE, with.missing = FALSE)
pstree builds a probabilistic suffix tree from a sequence object of class 'stslist', as created by the TraMineR seqdef function, and returns an object of class "PSTf".
The nmin argument specifies the minimum frequency of a subsequence required to add it to the tree.
Each node of the tree is labelled with a context $c$ and stores the empirical next-symbol probability distribution $\hat{P}(\sigma|c), \; \sigma \in A$, where $A$ is an alphabet of finite size. The root node, labelled with the empty string $e$, stores the $0^{th}$ order probability $\hat{P}(\sigma), \; \sigma \in A$ of observing each symbol of the alphabet in the whole learning sample.
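To make the notation concrete, here is a minimal base-R sketch (the vector x and the helper next_symbol_prob are invented for illustration and are not part of the PST package) computing the empirical next-symbol distribution for a short state sequence:

## Illustration only, not the package's cprob(): empirical next-symbol
## distribution P(sigma | c) for a toy state sequence.
x <- c("a", "b", "a", "a", "b", "a", "b", "b", "a", "a")
A <- sort(unique(x))  # alphabet

next_symbol_prob <- function(x, context, A) {
  k <- length(context)
  if (k == 0) {  # root node: 0th-order distribution P(sigma)
    counts <- table(factor(x, levels = A))
    return(counts / sum(counts))
  }
  starts <- seq_len(length(x) - k)  # positions where the context can start
  hit <- vapply(starts, function(i) all(x[i:(i + k - 1)] == context), logical(1))
  nxt <- x[starts[hit] + k]  # symbols observed right after the context
  counts <- table(factor(nxt, levels = A))
  counts / sum(counts)
}

next_symbol_prob(x, context = "a", A)  # P(sigma | "a")
next_symbol_prob(x, character(0), A)   # 0th-order P(sigma) stored at the root e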
The building algorithm calls the cprob function, which returns the empirical next-symbol counts observed after each context $c$, and computes the corresponding empirical probability distribution. Each node in the tree is connected to its longest suffix, where the longest suffix of a string $c=c_{1},c_{2}, \ldots, c_{k}$ of length $k$ is $suffix(c)=c_{2}, \ldots, c_{k}$.
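As a small illustration of the suffix relation (the suffix helper below is invented for this example and is not a PST function):

## For a context stored as a character vector, its longest suffix simply
## drops the first symbol, e.g. the longest suffix of a-b-b is b-b.
suffix <- function(context) context[-1]
suffix(c("a", "b", "b"))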
Once an initial PST is built, it can be pruned to reduce its complexity by removing nodes that do not provide significant information (see prune). A model selection procedure based on information criteria is also available (see tune).
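For instance, given a fitted tree such as S1 from the example below, pruning might look as follows; the gain function "G2" and the cut-off C = 1.20 are illustrative, assumed settings rather than values prescribed by this page (see the prune and tune help pages for the actual arguments):

## Hedged sketch: gain and C below are assumed, commonly used settings.
S1.pruned <- prune(S1, gain = "G2", C = 1.20)
S1.pruned
## tune() can instead compare several cut-off values and select a model
## by an information criterion; see its help page for the interface.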
Ron, D., Singer, Y. & Tishby, N. (1996). The power of amnesia: Learning probabilistic automata with variable memory length. Machine Learning, 25, 117-149.
Bejerano, G. & Yona, G. (2001). Variations on probabilistic suffix trees: statistical modeling and prediction of protein families. Bioinformatics, 17, 23-43.
See also: prune, tune
## Build a PST on a single example sequence
library(TraMineR)  # provides seqdef()
library(PST)       # provides pstree()

data(s1)
s1.seq <- seqdef(s1)         # define the state sequence object
s1.seq
S1 <- pstree(s1.seq, L = 3)  # grow a PST with maximal depth L = 3
print(S1, digits = 3)
S1