seqecmpgroup: Identifying discriminating subsequences

Description

Identify and sort the most discriminating subsequences by their discriminating power.

Usage

seqecmpgroup(subseq, group, method="chisq", pvalue.limit=NULL,
             weighted = TRUE)

Arguments

subseq

A subseqelist object (list of subsequences) such as the one produced by seqefsub.

group

Group membership, i.e., a variable or factor defining the groups to discriminate between.

method

The discrimination method; one of "bonferroni" or "chisq"

pvalue.limit

Can be used to filter the results. Only subsequences with a p-value lower than this parameter are selected. If NULL all subsequences are returned (regardless of their p-values).

weighted

Logical. If TRUE, seqecmpgroup uses the weights specified in subseq (see seqefsub).

Value

An object of type subseqelistchisq (subtype of subseqelist) with the following elements:

subseq

Sorted list of the discriminating subsequences found

seqe

The event sequence object on which the tests were computed

constraint

Time constraints used for searching the subsequences (see seqeconstraint)

labels

Levels (value labels) of the target group variable

type

Type of test used

data

A data frame with columns support, index (original order of the subsequence) and a pair of frequency and Pearson residual columns for each group

Details

The following discrimination tests are implemented: chisq, the Pearson chi-squared test of independence, and bonferroni, the same test with a Bonferroni correction.
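
For reference, the textbook form of these two tests can be written as follows. This is a sketch of the standard definitions, not a transcription of the package source: the symbols O, E, r, n, G and m are notation introduced here, and the implementation may differ in details such as the handling of weights. For a given subsequence, O_{ij} denotes the (possibly weighted) number of sequences in group j that do (i = 1) or do not (i = 0) contain it, G is the number of groups, and m is the number of subsequences tested.

```latex
\begin{align*}
E_{ij} &= \frac{O_{i\cdot}\, O_{\cdot j}}{n},
  \qquad n = \sum_{i=0}^{1} \sum_{j=1}^{G} O_{ij} \\
\chi^2 &= \sum_{i=0}^{1} \sum_{j=1}^{G} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
  \qquad \text{with } G - 1 \text{ degrees of freedom} \\
r_{ij} &= \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}}}
  \qquad \text{(Pearson residual)} \\
p^{\text{bonf}} &= \min(1,\; m \cdot p)
  \qquad \text{(Bonferroni-adjusted p-value)}
\end{align*}
```

A positive residual for a group indicates that the subsequence occurs more often in that group than expected under independence, and a negative residual that it occurs less often.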

References

Studer, M., Müller, N.S., Ritschard, G. & Gabadinho, A. (2010), "Classer, discriminer et visualiser des séquences d'événements", In Extraction et gestion des connaissances (EGC 2010), Revue des nouvelles technologies de l'information RNTI. Vol. E-19, pp. 37-48.

Examples

data(actcal.tse)
actcal.seqe <- seqecreate(actcal.tse)

## Searching for frequent subsequences, that is, appearing at least 20 times (minimum support of 1%)
fsubseq <- seqefsub(actcal.seqe, pMinSupport=0.01)

## Searching for the subsequences that best discriminate between men and women
data(actcal)
discr <- seqecmpgroup(fsubseq, group=actcal$sex, method="bonferroni")
##Printing discriminating subsequences
print(discr)
##Plotting the six most discriminating subsequences
plot(discr[1:6])
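
As a further illustration of the pvalue.limit argument described under Arguments, the sketch below keeps only the subsequences whose p-value is below a chosen threshold. The 0.05 threshold and the object name discr05 are illustrative choices that do not come from the original example, and the code assumes the TraMineR package is installed. Depending on the installed version, pMinSupport may trigger a deprecation warning.

```r
## Assumption: TraMineR is installed; 0.05 and discr05 are illustrative choices
library(TraMineR)

data(actcal.tse)
data(actcal)
actcal.seqe <- seqecreate(actcal.tse)

## Frequent subsequences, as in the example above
fsubseq <- seqefsub(actcal.seqe, pMinSupport=0.01)

## Keep only subsequences with a chi-squared p-value below 0.05
discr05 <- seqecmpgroup(fsubseq, group=actcal$sex, method="chisq",
                        pvalue.limit=0.05)

## Inspect the retained subsequences and the first rows of the test results
print(discr05)
head(discr05$data)
```

With pvalue.limit=NULL (the default), all subsequences are returned, so the filtered result can never contain more rows than the unfiltered one.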