TraMineR (version 1.1)

disstree: Dissimilarity Tree

Description

Analyse non-measurable objects described through a set of dissimilarities by recursively partitioning the population.

Usage

disstree(formula, data = NULL, minSize = 0.05, maxdepth = 5,
   R = 1000, pval = 0.01)

Arguments

formula
A formula whose left-hand side is a dissimilarity matrix and whose right-hand side lists the candidate variables used to partition the population.
data
A data.frame in which the variables named in formula can be found.
minSize
Minimum number of observations in a node; interpreted as a proportion of the population if less than 1.
maxdepth
Maximum depth of the tree.
R
Number of permutations used to assess the significance of a partition.
pval
Maximum p-value, in percent

Value

Returns an object of class disstree, a list with the following components:

  • node: A tree object (see below)
  • adjustement: Global adjustment of the tree

The node object is a list with the following components:

  • split: Chosen predictor, NULL for terminal nodes
  • vardis: Node pseudo variance, see dissvar
  • children: Child nodes, NULL for terminal nodes
  • ind: Index of the individuals in this node
  • depth: Depth of the node, starting from the root node
  • label: Label of this node
  • R2: R squared of the split, NULL for terminal nodes
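The nested node structure described above can be walked recursively. The following is a minimal sketch in base R, not part of TraMineR: the mock tree, its labels and depth values, and the helper names `leaf` and `collect_leaves` are all hypothetical and only mimic the documented fields. For a real tree, the examples on this page use `disstreeleaf()` to obtain leaf membership.

```r
## Hypothetical helper building a terminal node with the documented fields
leaf <- function(ind, depth, label) {
  list(split = NULL, vardis = NA, children = NULL,
       ind = ind, depth = depth, label = label, R2 = NULL)
}

## Hypothetical mock tree (values are made up for illustration)
root <- list(
  split = "male", vardis = 1.0,
  children = list(
    leaf(1:3, 1, "male: no"),
    list(split = "Grammar", vardis = 0.8,
         children = list(leaf(4:5, 2, "Grammar: no"),
                         leaf(6:8, 2, "Grammar: yes")),
         ind = 4:8, depth = 1, label = "male: yes", R2 = 0.1)
  ),
  ind = 1:8, depth = 0, label = "root", R2 = 0.2
)

## Recursively collect the individuals of each terminal node,
## relying on children being NULL for terminal nodes
collect_leaves <- function(node) {
  if (is.null(node$children)) {
    return(setNames(list(node$ind), node$label))
  }
  do.call(c, lapply(node$children, collect_leaves))
}

leaves <- collect_leaves(root)
print(names(leaves))
```

With a fitted tree, the same function would presumably be called on the `node` component of the returned object, assuming the structure matches the description above.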

Details

At each step, this procedure chooses the variable that explains the largest share of the pseudo variance and uses it to partition the population. It assesses the significance of the chosen variable by performing a permutation test.
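The two ingredients of a step, the share of pseudo variance explained by a candidate split and its permutation test, can be sketched in base R. This is an illustration under stated assumptions, not the TraMineR implementation: the discrepancy is assumed to be the sum of pairwise dissimilarities divided by the number of objects, the p-value uses the common "+1" correction, and the function names `pseudo_ss`, `pseudo_r2` and `perm_pvalue` are hypothetical.

```r
## Assumed discrepancy ("sum of squares" analogue) of a dissimilarity matrix:
## SS = (1/n) * sum over pairs i < j of d_ij
pseudo_ss <- function(d) {
  n <- nrow(d)
  sum(d[upper.tri(d)]) / n
}

## Pseudo R2 of a partition: share of the total discrepancy
## that is not left within the groups
pseudo_r2 <- function(d, group) {
  group <- as.factor(group)
  ss_total <- pseudo_ss(d)
  ss_within <- sum(sapply(levels(group), function(g) {
    idx <- which(group == g)
    pseudo_ss(d[idx, idx, drop = FALSE])
  }))
  (ss_total - ss_within) / ss_total
}

## Permutation test: compare the observed pseudo R2 with the values
## obtained after randomly reassigning the group labels R times
perm_pvalue <- function(d, group, R = 1000) {
  observed <- pseudo_r2(d, group)
  permuted <- replicate(R, pseudo_r2(d, sample(group)))
  (sum(permuted >= observed) + 1) / (R + 1)
}

## Toy data: two well-separated clusters on a line
set.seed(1)
x <- c(rnorm(10, mean = 0), rnorm(10, mean = 5))
d <- as.matrix(dist(x))
good <- rep(c("a", "b"), each = 10)   # follows the cluster structure
bad  <- rep(c("a", "b"), times = 10)  # unrelated to it

r2_good <- pseudo_r2(d, good)
r2_bad  <- pseudo_r2(d, bad)
p_good  <- perm_pvalue(d, good, R = 199)
```

In a tree-growing step, a procedure of this kind would compute the pseudo R2 for every candidate variable, keep the best one, and split only if its permutation p-value does not exceed the threshold.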

References

Studer, M., G. Ritschard, A. Gabadinho and N. S. Müller (2009). Analyse de dissimilarités par arbre d'induction. Revue des Nouvelles Technologies de l'Information, EGC'2009.

Batagelj, V. (1988). Generalized Ward and related clustering problems. In H. Bock (Ed.), Classification and related methods of data analysis, pp. 67-74. North-Holland, Amsterdam.

Anderson, M. J. (2001). A new method for non-parametric multivariate analysis of variance. Austral Ecology 26, 32-46.

Piccarreta, R. and F. C. Billari (2007). Clustering work and family trajectories by using a divisive algorithm. Journal of the Royal Statistical Society A 170(4), 1061-1078.

See Also

  • dissvar to compute the pseudo variance using dissimilarities, and for a basic introduction to the concepts of pseudo variance analysis
  • dissassoc to test the association between dissimilarities and another variable
  • dissreg to analyse dissimilarities in a way close to linear regression
  • disscenter to compute the distance of each object to the center of its group using dissimilarities

Examples

library(TraMineR)
data(mvad)

## Defining a state sequence object
mvad.seq <- seqdef(mvad[, 17:86])

## Building dissimilarities
mvad.lcs <- seqdist(mvad.seq, method="LCS")
dt <- disstree(mvad.lcs~ male + Grammar + funemp + gcse5eq + fmpr + livboth, 
    data=mvad, R = 10000)
print(dt)

## Compute quality of the tree
print(dissassoc(mvad.lcs, disstreeleaf(dt), R=1))
  
## Using simplified interface to generate a file for GraphViz
seqtree2dot(dt, "mvadseqtree", seqs=mvad.seq, plottype="seqdplot", 
	border=NA, withlegend=FALSE)

## Generating a file for GraphViz
disstree2dot(dt, "mvadtree", imagefunc=seqdplot, imagedata=mvad.seq, 
	## Additional parameters passed to seqdplot
	withlegend=FALSE, axes=FALSE)
  
## Second method, using a custom plot function
myplotfunction <- function(individuals, seqs, ...) {
    par(font.sub=2, mar=c(3,0,6,0), mgp=c(0,0,0))

    ## Use a one-dimensional MDS of the node's sequences to order them in seqiplot
    mds <- cmdscale(seqdist(seqs[individuals,], method="LCS"), k=1)
    seqiplot(seqs[individuals,], sortv=mds, ...)
}

## Generating a file for GraphViz
## If imagedata is not set, the indices of individuals are sent to imagefunc
disstree2dot(dt, "mvadtree", imagefunc=myplotfunction, title.cex=3,
	## additional parameters passed to myplotfunction
	seqs=mvad.seq,
	## additional parameters passed to seqiplot (through myplotfunction)
	withlegend=FALSE, axes=FALSE, tlim=0, space=0, ylab="", border=NA)

## To run GraphViz (dot) from R
## shell("dot -Tsvg -O mvadtree.dot")
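Note that shell() is only available on Windows. A portable variant, sketched here with base R's Sys.which() and system2(), runs dot only if the executable is found on the search path and the file mvadtree.dot generated above exists; it assumes GraphViz is installed separately.

```r
## Locate the GraphViz "dot" executable; returns "" if it is not installed
dot_path <- Sys.which("dot")

## Render the .dot file to SVG only when both dot and the file are available
if (nzchar(dot_path) && file.exists("mvadtree.dot")) {
  system2(dot_path, args = c("-Tsvg", "-O", "mvadtree.dot"))
}
```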
