disstree: Dissimilarity Tree

Description

Analyse non-measurable objects described through a set of dissimilarities by recursively partitioning the population.

Usage

disstree(formula, data = NULL, minSize = 0.05, maxdepth = 5,
         R = 1000, pval = 0.01)

Arguments

formula

A formula whose left-hand side is a dissimilarity matrix and whose right-hand side lists the candidate variables used to partition the population.

data

A data frame in which the variables named in formula can be found.

minSize

Minimum number of observations in a node. A value below 1 is read as a proportion of the observations (the default 0.05 means 5%).

maxdepth

maximum depth of the tree

R

Number of permutations used to assess the significance of a partition.

pval

Maximum p-value for a partition to be considered significant, given as a proportion (the default 0.01 corresponds to 1%).
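
The minSize argument accepts either a count or a proportion. The rule can be sketched as below, assuming that a proportion refers to the total number of observations and that fractional counts are rounded up; both points are assumptions, not taken from the disstree source, and min_node_size is a hypothetical helper.

```r
# Hypothetical helper, not part of TraMineR: translate the minSize argument
# into a minimum node size expressed in observations.
# Assumptions: a value below 1 is a proportion of all n observations, and a
# fractional count is rounded up.
min_node_size <- function(minSize, n) {
  if (minSize < 1) {
    ceiling(minSize * n)  # proportion -> count
  } else {
    minSize               # already a count, used unchanged
  }
}

min_node_size(0.05, 712)  # 5% of 712 observations -> 36
min_node_size(30, 712)    # -> 30
```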

Value

Returns an object of class disstree, a list with the following components:

node          A tree object (see below).
adjustement   Global adjustment of the tree.

The node object is a list with the following components:

split         Chosen predictor, NULL for terminal nodes.
vardis        Node pseudo variance, see dissvar.
children      Child nodes, NULL for terminal nodes.
ind           Indices of the individuals in this node.
depth         Depth of the node, starting from the root node.
label         Label of this node.
R2            R squared of the split, NULL for terminal nodes.
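
The vardis and R2 components rest on a pseudo variance computed from dissimilarities alone. A minimal sketch, assuming the definitions of Studer et al. (pseudo variance s2 = sum_i sum_j d_ij / (2 n^2), sum of squares SS = n * s2, and R2 = 1 - SS_within / SS_total); pseudo_var and split_r2 are hypothetical helpers, not the dissvar source. The toy check uses squared Euclidean distances, for which the pseudo variance equals the ordinary divide-by-n variance.

```r
# Pseudo variance of objects described only by a symmetric dissimilarity
# matrix d with a zero diagonal: s2 = sum_i sum_j d[i, j] / (2 * n^2).
# Hypothetical helper; not the TraMineR dissvar() implementation.
pseudo_var <- function(d) {
  d <- as.matrix(d)
  n <- nrow(d)
  sum(d) / (2 * n^2)
}

# Share of the total sum of squares (SS = n * s2) explained by grouping g
split_r2 <- function(d, g) {
  d <- as.matrix(d)
  ss <- function(m) nrow(m) * pseudo_var(m)
  groups <- split(seq_len(nrow(d)), g, drop = TRUE)
  within <- sum(vapply(groups, function(idx) ss(d[idx, idx, drop = FALSE]),
                       numeric(1)))
  1 - within / ss(d)
}

x <- c(1, 2, 4, 7)
d2 <- as.matrix(dist(x))^2           # squared Euclidean distances
pseudo_var(d2)                       # 5.25
mean((x - mean(x))^2)                # 5.25, the same value
split_r2(d2, c("a", "a", "b", "b"))  # 16/21, about 0.762
```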

Details

At each step, the procedure chooses the variable that explains the largest part of the pseudo variance and uses it to partition the population. The significance of the chosen variable is assessed with a permutation test.
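
One selection-and-test step can be sketched in base R. This is a minimal illustration under stated assumptions, not the disstree implementation: diss_ss, partition_r2, perm_pvalue and best_split are hypothetical helpers; each candidate variable's own categories define the groups (the sketch does not reproduce how disstree forms the groups of a split); and the p-value uses a +1 correction. The arguments R and pval play the roles described under Arguments.

```r
# Hypothetical helpers, not TraMineR internals.

# Sum of squares from a dissimilarity matrix: SS = sum_ij d_ij / (2n)
diss_ss <- function(d) sum(d) / (2 * nrow(d))

# R2 = 1 - SS_within / SS_total for the partition defined by g
partition_r2 <- function(d, g) {
  groups <- split(seq_len(nrow(d)), g, drop = TRUE)
  within <- sum(vapply(groups, function(idx) diss_ss(d[idx, idx, drop = FALSE]),
                       numeric(1)))
  1 - within / diss_ss(d)
}

# Permutation test: shuffle the group labels R times and count how often
# the permuted R2 reaches the observed one. The "+ 1" correction and the
# small tolerance for floating-point ties are choices of this sketch.
perm_pvalue <- function(d, g, R = 1000) {
  observed <- partition_r2(d, g)
  permuted <- replicate(R, partition_r2(d, sample(g)))
  (sum(permuted >= observed - 1e-12) + 1) / (R + 1)
}

# Choose the candidate with the largest R2, then test only that one
best_split <- function(d, candidates, R = 1000, pval = 0.01) {
  r2 <- vapply(candidates, function(g) partition_r2(d, g), numeric(1))
  best <- names(which.max(r2))
  p <- perm_pvalue(d, candidates[[best]], R)
  list(variable = best, R2 = unname(r2[best]), p = p, significant = p <= pval)
}

set.seed(1)
x <- c(rnorm(20, mean = 0), rnorm(20, mean = 3))  # two separated clusters
d <- as.matrix(dist(x))
candidates <- list(
  informative = rep(c("a", "b"), each = 20),
  noise       = sample(rep(c("a", "b"), 20))
)
res <- best_split(d, candidates, R = 999, pval = 0.01)
res$variable     # "informative"
res$significant  # TRUE
```

Using R2 rather than a pseudo-F statistic is a convenience here: permuting labels keeps the group sizes fixed, and for fixed sizes the pseudo-F is an increasing function of R2, so both rank the permutations identically.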

References

Studer, M., G. Ritschard, A. Gabadinho and N. S. Müller (2009). Analyse de dissimilarités par arbre d'induction. Revue des Nouvelles Technologies de l'Information, EGC'2009.
Batagelj, V. (1988). Generalized Ward and related clustering problems. In H. Bock (Ed.), Classification and related methods of data analysis, pp. 67-74. North-Holland, Amsterdam.
Anderson, M. J. (2001). A new method for non-parametric multivariate analysis of variance. Austral Ecology 26, 32-46.
Piccarreta, R. and F. C. Billari (2007). Clustering work and family trajectories by using a divisive algorithm. Journal of the Royal Statistical Society A 170(4), 1061-1078.

Examples

library(TraMineR)
data(mvad)

## Defining a state sequence object
mvad.seq <- seqdef(mvad[, 17:86])

## Building dissimilarities
mvad.lcs <- seqdist(mvad.seq, method="LCS")
dt <- disstree(mvad.lcs ~ male + Grammar + funemp + gcse5eq + fmpr + livboth,
    data = mvad, R = 10000)
print(dt)

## Compute the quality of the tree (R = 1: no permutation test)
print(dissassoc(mvad.lcs, disstreeleaf(dt), R = 1))
  
## Using simplified interface to generate a file for GraphViz
seqtree2dot(dt, "mvadseqtree", seqs=mvad.seq, plottype="seqdplot", 
	border=NA, withlegend=FALSE)

## Generating a file for GraphViz
disstree2dot(dt, "mvadtree", imagefunc=seqdplot, imagedata=mvad.seq, 
	## Additional parameters passed to seqdplot
	withlegend=FALSE, axes=FALSE)
  
## Second method, using a specific function
myplotfunction <- function(individuals, seqs, ...) {
	par(font.sub = 2, mar = c(3, 0, 6, 0), mgp = c(0, 0, 0))

	## Order the sequences in seqiplot by their first MDS coordinate,
	## computed from the LCS distances within the node
	mds <- cmdscale(seqdist(seqs[individuals, ], method = "LCS"), k = 1)
	seqiplot(seqs[individuals, ], sortv = mds, ...)
}
 
## Generating a file for GraphViz
## If imagedata is not set, the indices of the individuals are passed to imagefunc
disstree2dot(dt, "mvadtree", imagefunc = myplotfunction, title.cex = 3,
	## Additional parameters passed to myplotfunction
	seqs = mvad.seq,
	## Additional parameters passed to seqiplot (through myplotfunction)
	withlegend = FALSE, axes = FALSE, tlim = 0, space = 0, ylab = "", border = NA)

## To run GraphViz (dot) from R; shell() exists on Windows only,
## use system() on other platforms
## shell("dot -Tsvg -O mvadtree.dot")
