
Compute probabilistic divergence between two PST
# S4 method for PSTf,PSTf
pdist(x,y, method="cp", l, ns=5000, symetric=FALSE, output="all")
If output="all", a vector containing the divergence value for each generated sequence; if output="mean", the mean of these values, i.e., the expected value, which is the divergence between the two models.
x: a probabilistic suffix tree, i.e., an object of class "PSTf" as returned by the pstree, prune or tune function.
y: a probabilistic suffix tree, i.e., an object of class "PSTf" as returned by the pstree, prune or tune function.
method: character. Method for computing distances. So far only one method ("cp", the default) is available.
l: integer. Length of the sequence(s) to generate.
ns: integer. Number of sequences to generate.
symetric: logical. If TRUE, the symmetric version of the measure is returned; see Details.
output: character. Either "all" or "mean"; see Value.
Alexis Gabadinho
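The function computes a probabilistic divergence measure between two PSTs, based on the measure proposed by Juang and Rabiner (1985) and Rabiner (1989) for comparing hidden Markov models.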
As the number
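The pdist function uses the following procedure to compute the divergence between two PSTs x and y:
1. generate a random sample of n = ns sequences $s_1, \ldots, s_n$ of length l with model x, using the generate method;
2. predict the sequences with x and with y, giving the probabilities $P^{x}(s_i)$ and $P^{y}(s_i)$;
3. compute $d_i(x, y) = \frac{1}{l}\left[\log P^{x}(s_i) - \log P^{y}(s_i)\right], \; i = 1, \ldots, n$;
4. the expected value $E(d(x, y)) = \frac{1}{n}\sum_{i=1}^{n} d_i(x, y)$, estimated as the empirical mean of the $d_i$, is the divergence between the two models.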
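The four steps above can be sketched in base R. This is not the PST implementation: to keep the block self-contained, two first-order Markov chains (an initial distribution plus a transition matrix, roughly a PST of depth one) stand in for the PSTs. The helpers sim_chain, chain_logprob, mc_divergence and sym_divergence are hypothetical names, and taking the symmetric version as the average of both directions, as in Rabiner (1989), is an assumption about what symetric=TRUE computes.

```r
# Sketch only: first-order Markov chains stand in for the two PSTs.
# All function names here are hypothetical; none belong to the PST package.

# Step 1 helper: draw n sequences of length l from a Markov chain
# defined by an initial distribution `init` and a transition matrix `trans`.
sim_chain <- function(init, trans, n, l) {
  k <- length(init)
  seqs <- matrix(NA_integer_, nrow = n, ncol = l)
  seqs[, 1] <- sample.int(k, n, replace = TRUE, prob = init)
  if (l > 1) {
    for (t in 2:l) {
      prev <- seqs[, t - 1]
      for (s in seq_len(k)) {
        rows <- which(prev == s)
        if (length(rows) > 0) {
          seqs[rows, t] <- sample.int(k, length(rows), replace = TRUE,
                                      prob = trans[s, ])
        }
      }
    }
  }
  seqs
}

# Step 2 helper: log-probability of each sequence (row) under a chain.
chain_logprob <- function(seqs, init, trans) {
  lp <- log(init[seqs[, 1]])
  if (ncol(seqs) > 1) {
    for (t in 2:ncol(seqs)) {
      lp <- lp + log(trans[cbind(seqs[, t - 1], seqs[, t])])
    }
  }
  lp
}

# Steps 1-4: sample from model a, score with a and b, return d_i or their mean.
mc_divergence <- function(a, b, l, ns = 5000, output = c("all", "mean")) {
  output <- match.arg(output)
  s <- sim_chain(a$init, a$trans, ns, l)
  d <- (chain_logprob(s, a$init, a$trans) -
        chain_logprob(s, b$init, b$trans)) / l
  if (output == "mean") mean(d) else d
}

# ASSUMPTION: symmetric version = average of the two directed divergences.
sym_divergence <- function(a, b, l, ns = 5000) {
  (mc_divergence(a, b, l, ns, "mean") + mc_divergence(b, a, l, ns, "mean")) / 2
}

# Usage with two hypothetical two-state chains.
set.seed(1)
A <- list(init = c(0.5, 0.5),
          trans = matrix(c(0.9, 0.1,
                           0.2, 0.8), nrow = 2, byrow = TRUE))
B <- list(init = c(0.5, 0.5),
          trans = matrix(c(0.6, 0.4,
                           0.4, 0.6), nrow = 2, byrow = TRUE))
d <- mc_divergence(A, B, l = 11)   # one value per generated sequence
mean(d)                            # Monte Carlo estimate of d(A, B), positive here
sym_divergence(A, B, l = 11)       # averaged over both directions (assumed definition)
```

Scoring the same sample with both models is what makes the measure asymmetric: only sequences that are typical under the first model are ever generated. Because the result is a Monte Carlo estimate, different seeds give slightly different means, and a larger ns reduces that variation.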
For more details, see Gabadinho and Ritschard (2016).
Gabadinho, A. & Ritschard, G. (2016). Analyzing State Sequences with Probabilistic Suffix Trees: The PST R Package. Journal of Statistical Software, 72(3), pp. 1-39.
Juang, B. H. & Rabiner, L. R. (1985). A probabilistic distance measure for hidden Markov models. AT&T Technical Journal, 64(2), pp. 391-408.
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), pp. 257-286.
## activity calendar for year 2000
## from the Swiss Household Panel
## see ?actcal
library(TraMineR) # actcal data and seqdef()
library(PST)      # pstree(), prune(), subtree(), pdist()
data(actcal)
## selecting individuals aged 20 to 59
actcal <- actcal[actcal$age00>=20 & actcal$age00 <60,]
## defining a sequence object
actcal.lab <- c("> 37 hours", "19-36 hours", "1-18 hours", "no work")
actcal.seq <- seqdef(actcal,13:24,labels=actcal.lab)
## building a PST segmented by age group
gage10 <- cut(actcal$age00, c(20,30,40,50,60), right=FALSE,
labels=c("20-29","30-39", "40-49", "50-59"))
actcal.pstg <- pstree(actcal.seq, nmin=2, ymin=0.001, group=gage10)
## pruning
C99 <- qchisq(0.99,4-1)/2
actcal.pstg.opt <- prune(actcal.pstg, gain="G2", C=C99)
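The pruning threshold C99 used above is plain arithmetic on the chi-square distribution, recomputed below together with a 0.95 variant (C95, a hypothetical name not used in the example). That the degrees of freedom equal the number of states minus one is read off the code (four activity states, hence 4-1); the division by 2 presumably puts the threshold on the log-likelihood scale of the G2 gain, which is an assumption here, see Gabadinho and Ritschard (2016).

```r
# Recompute the pruning threshold used in the example above.
n_states <- 4                                  # four labels in actcal.lab
C99 <- qchisq(0.99, df = n_states - 1) / 2     # same value as C99 in the example
C95 <- qchisq(0.95, df = n_states - 1) / 2     # hypothetical, smaller threshold
round(c(C95 = C95, C99 = C99), 3)              # C95 3.907, C99 5.672
```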
## extracting PSTs for age groups 20-29 and 30-39
g1.pst <- subtree(actcal.pstg.opt, group=1)
g2.pst <- subtree(actcal.pstg.opt, group=2)
## generating 5000 sequences with g1.pst
## and computing 5000 distances
dist.g1_g2 <- pdist(g1.pst, g2.pst, l=11)
hist(dist.g1_g2)
## the probabilistic distance is the mean
## of the 5000 distances
mean(dist.g1_g2)
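Since the distance is a Monte Carlo estimate based on ns generated sequences, it can help to report its standard error alongside the mean. The helper mc_summary below is hypothetical (not part of PST); it accepts any numeric vector such as dist.g1_g2 and uses a normal approximation for the interval. The simulated input is only a placeholder, not output from pdist.

```r
# Hypothetical helper: summarise a vector of per-sequence divergences,
# such as the one returned by pdist(..., output = "all").
mc_summary <- function(d, level = 0.95) {
  n <- length(d)
  est <- mean(d)
  se <- sd(d) / sqrt(n)               # Monte Carlo standard error of the mean
  z <- qnorm(1 - (1 - level) / 2)     # normal-approximation quantile
  c(mean = est, se = se, lower = est - z * se, upper = est + z * se)
}

# Placeholder input only: simulated values standing in for dist.g1_g2.
set.seed(42)
d_demo <- rnorm(5000, mean = 0.2, sd = 0.1)
mc_summary(d_demo)
```

With the real example, mc_summary(dist.g1_g2) would report the mean computed above together with its sampling uncertainty.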