## S3 method for class 'PSTf,stslist':
pmine(object, data, l, pmin, pmax, lag, state, average=FALSE, output="sequences")
data: a sequence object created with the seqdef function of the TraMineR library.
lag: the lag first states in the sequence are omitted.
average: if TRUE, the pmin or pmax probability is taken to be the average state probability in the (sub)sequences. If FALSE, (sub)sequences having every state probability less than pmax or greater than pmin are selected.
output: if output='sequences', the whole sequence(s) in which the user-defined criteria are satisfied are returned. If output='patterns', only the (sub)sequences satisfying the user-defined criteria are returned.

Value: a sequence object of class stslist.

The pmine function allows for advanced pattern mining with user-defined parameters. It is controlled by the lag and pmin arguments. For example, by setting lag=2 and pmin=0.40 (example 1), we select all sequences whose average state probability (the geometric mean is used) at positions $lag+1, \ldots, \ell$ is above pmin. Instead of considering the average state probability at positions $lag+1, \ldots, \ell$, it is also possible to select frequent patterns that do not contain any state with probability below the threshold. This avoids selecting sequences that have many states with high probability but one or several states with low probability.
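
The difference between the two selection criteria can be illustrated with a minimal base R sketch. It does not call the PST package: the vector probs holds hypothetical per-position state probabilities, and the use of >= for the comparison is an assumption, since the text above only says "above pmin".

```r
## Hypothetical per-position state probabilities for one sequence
## (illustrative values, not output of the PST package)
probs <- c(0.90, 0.80, 0.70, 0.10, 0.85, 0.90)
lag   <- 2
pmin  <- 0.40

## Keep only positions lag+1, ..., l
kept <- probs[(lag + 1):length(probs)]

## Average criterion: geometric mean of the retained probabilities
geo_mean <- exp(mean(log(kept)))
avg_ok   <- geo_mean >= pmin      # assumed comparison operator

## Strict criterion: no retained state may fall below the threshold
strict_ok <- all(kept >= pmin)
```

With these values the sequence passes the average criterion but fails the strict one, because a single state with probability 0.10 is offset by the high probabilities around it.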
It is also possible to mine the sequence data for frequent patterns of length $\ell_{j} < \ell$, regardless of the position in the sequence where they occur. With the output="patterns" argument, the pmine function returns the patterns (as a sequence object) instead of the whole set of distinct sequences containing the patterns. Since the probability of a pattern can differ depending on the context (the previous states), the returned subsequences also contain the context preceding the pattern.
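
The idea of searching for fixed-length patterns at any position can be sketched in base R as a sliding window over per-position probabilities. The helper find_windows and the vector probs are hypothetical and not part of the PST package. The sketch assumes the strict criterion (every state probability at least pmin) and ignores the preceding context that pmine attaches to returned patterns.

```r
## Hypothetical helper (not part of the PST package): start positions of
## all windows of length l in which every state probability is >= pmin
find_windows <- function(probs, l, pmin) {
  n <- length(probs)
  if (l > n) return(integer(0))
  starts <- seq_len(n - l + 1)
  keep <- vapply(starts,
                 function(s) all(probs[s:(s + l - 1)] >= pmin),
                 logical(1))
  starts[keep]
}

## Illustrative probabilities for a sequence of length 8
probs <- c(0.30, 0.70, 0.80, 0.90, 0.20, 0.75, 0.85, 0.95)
hits  <- find_windows(probs, l = 3, pmin = 0.60)
```

Here hits contains the start positions 2 and 6, the only two windows of length 3 without a state below the threshold.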
See also cmine for context mining.

## activity calendar for year 2000
## from the Swiss Household Panel
## see ?actcal
data(actcal)
## selecting individuals aged 20 to 59
actcal <- actcal[actcal$age00>=20 & actcal$age00 <60,]
## defining a sequence object
actcal.lab <- c("> 37 hours", "19-36 hours", "1-18 hours", "no work")
actcal.seq <- seqdef(actcal,13:24,labels=actcal.lab)
## building a PST
actcal.pst <- pstree(actcal.seq, nmin=2, ymin=0.001)
## pruning
## Cut-off for the 1% level (see ?prune)
C99 <- qchisq(0.99,4-1)/2
actcal.pst.C99 <- prune(actcal.pst, gain="G2", C=C99)
## example 1
pmine(actcal.pst.C99, actcal.seq, pmin=0.4, lag=2)
## example 2: patterns of length 6 having p>=0.6
pmine(actcal.pst.C99, actcal.seq, pmin=0.6, l=6)