dissrep(diss, criterion="density",
score=NULL, decreasing=TRUE,
trep=0.25, nrep=NULL, tsim=0.1, dmax=NULL, weights=NULL)
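diss: A dissimilarity matrix or a dist object (see dist).
criterion: The representativeness criterion used to sort the candidate objects: "freq" (frequency), "density" (neighborhood density) or "dist" (centrality).
score: An optional vector containing the scores for sorting the candidate objects.
decreasing: Logical. If a score vector is provided, whether the objects are sorted in decreasing (default) or increasing order of the score.
trep: Controls the size of the representative set by setting its minimum coverage, i.e., the proportion of objects having a representative in their neighborhood. The neighborhood radius is defined by tsim.
nrep: Number of representatives. If NULL (default), the trep argument is used to control the size of the representative set.
tsim: Neighborhood radius as a proportion of the maximum (theoretical) distance dmax. Defaults to 0.1 (10%). Object $y$ is redundant to object $x$ when it is in the neighborhood of $x$, i.e., within a distance tsim*dmax from $x$.
dmax: Maximum theoretical distance, used to derive the neighborhood radius as tsim*dmax. If NULL, the value of dmax is derived from the dissimilarity matrix.
weights: Optional vector of weights, one per object (row) of the dissimilarity matrix. If NULL, equal weights are assigned.

Value: an object of class diss.rep. This is a vector containing the indexes of the representative objects, with additional attributes.

Objects are sorted either by the values provided as the score argument, or by specifying one of the following as the criterion argument: "freq" (sequence frequency), "density" (neighborhood density) or "dist" (centrality).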
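Once the candidates are sorted, the nrep, trep, tsim, dmax and weights arguments govern which of them are kept. The sketch below shows one plausible greedy reading of those argument descriptions. It walks the sorted candidates and skips any candidate that lies within tsim*dmax of an already retained representative. It stops when nrep representatives are kept or, if nrep is NULL, when the weighted coverage reaches trep.

This is not TraMineR's implementation. Several names and choices are assumptions made for illustration:

- The name greedy_rep is hypothetical.
- The neighborhood boundary is treated as inclusive.
- A NULL dmax is replaced by the largest observed dissimilarity.
- The score vector in the example is invented.

```r
## Hypothetical greedy selection sketch, written from the argument
## descriptions of dissrep; NOT the TraMineR implementation.
greedy_rep <- function(diss, score, decreasing = TRUE,
                       trep = 0.25, nrep = NULL, tsim = 0.1,
                       dmax = NULL, weights = NULL) {
  d <- as.matrix(diss)                         # accepts a matrix or a 'dist'
  n <- nrow(d)
  if (is.null(dmax)) dmax <- max(d)            # assumption: observed maximum
  if (is.null(weights)) weights <- rep(1, n)   # equal weights
  radius <- tsim * dmax
  candidates <- order(score, decreasing = decreasing)
  reps <- integer(0)
  covered <- rep(FALSE, n)
  for (i in candidates) {
    ## skip a candidate that is redundant to a retained representative
    ## (assumption: the neighborhood boundary is inclusive)
    if (length(reps) > 0 && any(d[i, reps] <= radius)) next
    reps <- c(reps, i)
    ## mark every object in the new representative's neighborhood as covered
    covered <- covered | d[i, ] <= radius
    coverage <- sum(weights[covered]) / sum(weights)
    if (!is.null(nrep) && length(reps) >= nrep) break
    if (is.null(nrep) && coverage >= trep) break
  }
  reps
}

## Toy data: five objects on a line, two well-separated groups
x <- c(0, 1, 2, 10, 11)
d <- as.matrix(dist(x))
s <- c(2, 3, 1, 2.5, 0.5)          # hypothetical scores, no ties

greedy_rep(d, s)                   # 2   (coverage 3/5 already >= 0.25)
greedy_rep(d, s, trep = 0.9)       # 2 4 (second group needed for coverage)
greedy_rep(d, s, nrep = 3)         # 2 4 (all other candidates are redundant)
```

The last call shows that asking for nrep = 3 can return fewer representatives. In this sketch, a redundant candidate is never added, even when the requested number has not been reached.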
The frequency criterion uses the frequencies as the
representativeness score. The frequency of an object in the data is
computed as the number of other objects with which its dissimilarity
is equal to 0. The more frequent an object, the more representative it
is supposed to be. Hence, objects are sorted in decreasing frequency
order. In fact, this criterion is the neighborhood density criterion
(see below) with the neighborhood radius set to 0.
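The following is a minimal base R sketch of the frequency score, not TraMineR's own code. It counts the zero entries in each row of the full dissimilarity matrix and subtracts the object's own zero self-distance. The name freq_score and the toy matrix are hypothetical.

```r
## Sketch (not TraMineR's code): frequency = number of OTHER objects
## at dissimilarity 0 from each object.
freq_score <- function(diss) {
  d <- as.matrix(diss)          # also accepts a 'dist' object
  rowSums(d == 0) - 1           # remove the zero self-distance
}

## Toy data: objects 1 and 2 are identical, object 3 differs
d <- matrix(c(0, 0, 4,
              0, 0, 4,
              4, 4, 0), nrow = 3, byrow = TRUE)

freq_score(d)                   # 1 1 0
```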
The neighborhood density is the number of sequences in the
neighborhood of the object (its density). This requires setting the
neighborhood radius tsim. Objects are sorted in decreasing density order.
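A comparable sketch of the density criterion counts, for each object, the other objects within the radius tsim*dmax. It makes two assumptions, both labeled in the comments:

- The boundary is inclusive, so that a zero radius reproduces the frequency score, as stated above.
- A NULL dmax is replaced by the largest observed dissimilarity, which may differ from how dissrep derives it.

The name density_score is hypothetical.

```r
## Sketch (not TraMineR's code): neighborhood density = number of other
## objects within tsim * dmax of each object.
density_score <- function(diss, tsim = 0.1, dmax = NULL) {
  d <- as.matrix(diss)
  if (is.null(dmax)) dmax <- max(d)   # assumption: observed maximum
  radius <- tsim * dmax
  rowSums(d <= radius) - 1            # assumption: inclusive boundary;
                                      # subtract 1 for the object itself
}

d <- matrix(c(0, 1, 5, 9,
              1, 0, 4, 8,
              5, 4, 0, 4,
              9, 8, 4, 0), nrow = 4, byrow = TRUE)

density_score(d, tsim = 0.5)   # radius 0.5 * 9 = 4.5 -> 1 2 2 1
density_score(d, tsim = 0)     # zero radius = frequency criterion -> 0 0 0 0
```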
The centrality criterion is the sum of distances to all other objects. The
smaller the sum, the more representative the sequence.
Use criterion="dist" and nrep=1 to get the medoid, and criterion="density"
and nrep=1 to get the object with the densest neighborhood.
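The centrality criterion and the medoid can be illustrated in the same way. The sum of each row of the dissimilarity matrix is the centrality score, and the object with the smallest sum is the medoid. This base R sketch is not the dissrep code. The name centrality_score and the 4 x 4 matrix are hypothetical.

```r
## Sketch (not TraMineR's code): centrality = sum of distances to all
## other objects; the smallest sum identifies the medoid.
centrality_score <- function(diss) {
  rowSums(as.matrix(diss))      # self-distance is 0, so it adds nothing
}

d <- matrix(c(0, 2, 6, 7,
              2, 0, 3, 5,
              6, 3, 0, 4,
              7, 5, 4, 0), nrow = 4, byrow = TRUE)

sums <- centrality_score(d)     # 15 10 13 16
which.min(sums)                 # 2: the medoid
order(sums)                     # 2 3 1 4: most to least central
```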
For more details, see Gabadinho et al., 2011.

See also: seqrep, disscenter
library(TraMineR)

## Defining a sequence object with the data in columns 10 to 25
## (family status from age 15 to 30) in the biofam data set
data(biofam)
biofam.lab <- c("Parent", "Left", "Married", "Left+Marr",
"Child", "Left+Child", "Left+Marr+Child", "Divorced")
biofam.seq <- seqdef(biofam, 10:25, labels=biofam.lab)
## Computing the distance matrix
costs <- seqsubm(biofam.seq, method="TRATE")
biofam.om <- seqdist(biofam.seq, method="OM", sm=costs)
## Representative set using the neighborhood density criterion
biofam.rep <- dissrep(biofam.om)
biofam.rep
summary(biofam.rep)