- seqdata
State sequence object of class stslist.
The sequence data to use.
Use seqdef to create such an object.
- method
String.
The dissimilarity measure to use.
It can be "OM", "OMloc", "OMslen", "OMspell",
"OMstran", "HAM", "DHD", "CHI2", "EUCLID",
"LCS", "LCP", "RLCP", "NMS", "NMSMST",
"SVRspell", or "TWED". See the Details section.
- refseq
NULL, Integer, State Sequence Object, or List.
Default: NULL.
The baseline sequence to compute the distances from.
When an integer, the index of a sequence in seqdata or 0 for the most frequent sequence.
When a state sequence object, it must contain a single sequence and have the same
alphabet as seqdata.
When a list, it must be a list of two sets of indexes of seqdata rows.
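The following base-R sketch shows which distances each refseq form selects. It assumes unit-cost Hamming distances between equal-length sequences stored as rows of a character matrix. The helpers ham(), most_frequent_row(), dist_to_ref() and dist_between() are hypothetical and are not TraMineR functions. The sketch does not reproduce how seqdist breaks ties among equally frequent sequences.

```r
# Sketch only: hypothetical helpers, unit-cost Hamming distance,
# sequences stored as rows of a character matrix of equal length.
ham <- function(x, y) sum(x != y)

# Row index of the most frequent sequence (the refseq = 0 case).
# Ties are resolved by which.max() on the alphabetically sorted table.
most_frequent_row <- function(seqs) {
  keys <- apply(seqs, 1, paste, collapse = "-")
  tab <- table(keys)
  match(names(tab)[which.max(tab)], keys)
}

# Distances from every sequence to one reference row (integer refseq).
dist_to_ref <- function(seqs, ref) apply(seqs, 1, ham, y = seqs[ref, ])

# Distances between two sets of rows (list refseq): a rectangular matrix.
dist_between <- function(seqs, set1, set2) {
  outer(set1, set2, Vectorize(function(i, j) ham(seqs[i, ], seqs[j, ])))
}

seqs <- rbind(c("A", "A", "B"),
              c("A", "B", "B"),
              c("A", "A", "B"),
              c("C", "C", "C"))
ref <- most_frequent_row(seqs)           # 1: "A-A-B" occurs twice
dist_to_ref(seqs, ref)                   # 0 1 0 3
dist_between(seqs, c(1, 2), c(3, 4))     # 2 x 2 matrix
```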
- norm
String.
Default: "none".
The normalization to use when method is one of "OM",
"OMloc", "OMslen", "OMspell",
"OMstran", "TWED", "HAM", "DHD", "LCS",
"LCP", "RLCP", "CHI2", or "EUCLID".
It can be "none" or "auto" and, except for
"CHI2" and "EUCLID", also "maxlength",
"gmean", "maxdist", or "YujianBo". "auto" is
equivalent to "maxlength" when method is one of "OM",
"HAM", or "DHD"; to "gmean" when method is one
of "LCS", "LCP", or "RLCP"; and to "YujianBo" when
method is one of "OMloc", "OMslen", "OMspell",
"OMstran", or "TWED". See the Details section.
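The sketch below covers two of these normalizations for the LCS distance. It assumes the usual definitions. "gmean" is taken as 1 - A/sqrt(|x| |y|), where A is the length of the longest common subsequence. "YujianBo" is taken as 2d / (indel (|x| + |y|) + d) for a raw distance d. The helper names are hypothetical. The exact formulas used by seqdist are those given in the Details section.

```r
# Length of the longest common subsequence (dynamic programming).
lcs_length <- function(x, y) {
  n <- length(x); m <- length(y)
  L <- matrix(0, n + 1, m + 1)
  for (i in seq_len(n)) for (j in seq_len(m)) {
    if (x[i] == y[j]) {
      L[i + 1, j + 1] <- L[i, j] + 1
    } else {
      L[i + 1, j + 1] <- max(L[i, j + 1], L[i + 1, j])
    }
  }
  L[n + 1, m + 1]
}

# Raw LCS distance: |x| + |y| - 2A (OM with indel 1 and substitution 2).
lcs_dist <- function(x, y) length(x) + length(y) - 2 * lcs_length(x, y)

# Assumed "gmean" normalization for LCS: 1 - A / sqrt(|x| |y|).
norm_gmean <- function(x, y) 1 - lcs_length(x, y) / sqrt(length(x) * length(y))

# Assumed "YujianBo" normalization of a raw distance d.
norm_yujianbo <- function(d, lx, ly, indel) {
  if (d == 0) return(0)
  2 * d / (indel * (lx + ly) + d)
}

x <- c("A", "B", "C", "D")
y <- c("A", "C", "D", "E")
lcs_dist(x, y)                                                   # 2 (A = 3)
norm_gmean(x, y)                                                 # 0.25
norm_yujianbo(lcs_dist(x, y), length(x), length(y), indel = 1)   # 0.4
```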
- indel
Double, Vector of Doubles, or String.
Default: "auto".
Insertion/deletion cost(s). Applies when method is one of "OM", "OMslen", "OMspell",
or "OMstran".
When a double, the single state-independent insertion/deletion cost.
When a vector of doubles, the state-dependent insertion/deletion costs,
with one indel cost per state in the order of the alphabet.
When "auto", indel is set to max(sm)/2 when sm is
a matrix, and is computed by means of seqcost when sm is
a string specifying a cost method.
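The following optimal-matching sketch shows how indel enters the computation. It assumes sm is a square matrix whose row names list the alphabet in order. om_dist() is a hypothetical function and is not TraMineR code. It handles "auto" only for the matrix case (max(sm)/2), because the seqcost path is not reproduced here.

```r
# Sketch only: optimal matching by dynamic programming.
# sm: square substitution-cost matrix whose row names are the alphabet.
# indel: "auto", a single cost, or one cost per state in alphabet order.
om_dist <- function(x, y, sm, indel = "auto") {
  states <- rownames(sm)
  if (identical(indel, "auto")) indel <- max(sm) / 2   # matrix case only
  if (length(indel) == 1) indel <- rep(indel, length(states))
  names(indel) <- states
  n <- length(x); m <- length(y)
  D <- matrix(0, n + 1, m + 1)
  D[-1, 1] <- cumsum(indel[x])        # delete all of x
  D[1, -1] <- cumsum(indel[y])        # insert all of y
  for (i in seq_len(n)) for (j in seq_len(m)) {
    D[i + 1, j + 1] <- min(D[i, j + 1] + indel[[x[i]]],   # delete x[i]
                           D[i + 1, j] + indel[[y[j]]],   # insert y[j]
                           D[i, j] + sm[x[i], y[j]])      # substitute
  }
  D[n + 1, m + 1]
}

alphabet <- c("A", "B", "C")
sm <- matrix(2, 3, 3, dimnames = list(alphabet, alphabet))
diag(sm) <- 0
om_dist(c("A", "B"), c("B", "A"), sm)               # 2 (auto indel = 1)
om_dist(c("A", "C"), "A", sm)                       # 1: delete C
om_dist(c("A", "C"), "A", sm, indel = c(1, 1, 5))   # 3: delete A, substitute C->A
```

With state-dependent costs, the cheapest alignment can change. Once deleting C costs 5, the sketch prefers deleting A (1) and substituting C by A (2).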
- sm
NULL, Matrix, Array, or String. Substitution costs.
Default: NULL.
The substitution-cost matrix when a matrix and method is one of
"OM", "OMloc", "OMslen", "OMspell",
"OMstran", "HAM", or "TWED".
The series of the substitution-cost matrices when an array and
method = "DHD". They are grouped in a 3-dimensional array with the
third index referring to the position in the sequence.
When a string, one of "CONSTANT", "INDELS", "INDELSLOG",
or "TRATE", designating the seqcost method used
to build sm. "CONSTANT" is not relevant for "DHD".
sm is mandatory when method is one of "OM",
"OMloc", "OMslen", "OMspell", "OMstran",
or "TWED".
sm is autogenerated when method is one of "HAM" or
"DHD" and sm = NULL. See the Details section.
Note: With method = "NMS" or method = "SVRspell", use
prox instead.
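The sketch below illustrates the idea behind "TRATE". It assumes the substitution cost between states i and j is 2 - p(i|j) - p(j|i), with transition rates p pooled over all time points and diagonal costs set to 0. It is not the seqcost implementation. trans_rates(), trate_costs() and ham_sm() are hypothetical helpers. The resulting matrix is then used as sm in a Hamming-type sum of position-wise substitution costs.

```r
# Sketch only: transition rates p(to | from) pooled over all positions.
trans_rates <- function(seqs, states) {
  counts <- matrix(0, length(states), length(states),
                   dimnames = list(states, states))
  for (t in seq_len(ncol(seqs) - 1)) {
    for (r in seq_len(nrow(seqs))) {
      from <- seqs[r, t]; to <- seqs[r, t + 1]
      counts[from, to] <- counts[from, to] + 1
    }
  }
  rs <- rowSums(counts)
  counts / ifelse(rs == 0, 1, rs)     # each row divided by its total
}

# Assumed TRATE rule: cost(i, j) = cval - p(j | i) - p(i | j), zero diagonal.
trate_costs <- function(seqs, states, cval = 2) {
  p <- trans_rates(seqs, states)
  sm <- cval - p - t(p)
  diag(sm) <- 0
  sm
}

# Hamming-type distance: sum of position-wise substitution costs.
ham_sm <- function(x, y, sm) sum(sm[cbind(x, y)])

seqs <- rbind(c("A", "A", "B"),
              c("A", "B", "B"),
              c("B", "B", "B"))
sm <- trate_costs(seqs, c("A", "B"))
sm["A", "B"]                          # 2 - 2/3 - 0 = 4/3
ham_sm(seqs[1, ], seqs[3, ], sm)      # 4/3 + 4/3 + 0 = 8/3
```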
- with.missing
Logical.
Default: FALSE.
Should the non-deleted missing value be added to the alphabet as an additional
state? If FALSE and seqdata or refseq contains such
gaps, an error is raised.
- full.matrix
Logical.
Default: TRUE.
When refseq = NULL: if TRUE, the full distance matrix is
returned; if FALSE, an object of class dist is returned,
that is, a vector containing only the values from the lower triangle of the
distance matrix. Objects of class dist are smaller and can be passed
directly as arguments to most clustering functions.
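Base R's stats functions show the difference between the two return forms, independently of seqdist. as.dist() keeps only the lower triangle. hclust() accepts the result directly. as.matrix() restores the full square matrix. The 3 x 3 matrix below is made-up illustration data.

```r
# Made-up 3 x 3 distance matrix (the full.matrix = TRUE form).
full <- matrix(c(0, 2, 5,
                 2, 0, 4,
                 5, 4, 0), nrow = 3, byrow = TRUE)

d <- as.dist(full)                    # lower triangle only (class "dist")
length(d)                             # 3 values instead of 9
hc <- hclust(d, method = "ward.D2")   # "dist" objects go straight to hclust()
as.matrix(d)                          # back to the full square matrix
```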
- kweights
Double or vector of doubles.
Default: vector of 1s.
The weights applied to subsequences when method is one of "NMS",
"NMSMST", or "SVRspell". It contains at position \(k\) the
weight applied to the subsequences of length \(k\). It must be positive.
Its length should equal the number of columns of seqdata. If it is shorter,
longer subsequences are ignored. If it is a scalar, it is transformed into
rep(kweights, ncol(seqdata)).
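The kweights rules above can be expressed as a hypothetical helper. Padding a shorter vector with zeros is one assumed way to ignore longer subsequences. Truncating a longer vector is also an assumption. Neither behavior is taken from the TraMineR source.

```r
# Hypothetical helper, not TraMineR code: expand kweights to one weight
# per subsequence length k = 1, ..., ncol(seqdata).
expand_kweights <- function(kweights, seqdata) {
  L <- ncol(seqdata)
  if (any(kweights <= 0)) stop("kweights must be positive")
  if (length(kweights) == 1) return(rep(kweights, L))
  # Assumption: a shorter vector is padded with zeros, so longer
  # subsequences get no weight; a longer vector is truncated.
  c(kweights, rep(0, max(0, L - length(kweights))))[seq_len(L)]
}

seqdata <- matrix("A", nrow = 2, ncol = 5)
expand_kweights(1, seqdata)           # 1 1 1 1 1
expand_kweights(c(1, 0.5), seqdata)   # 1.0 0.5 0.0 0.0 0.0
```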
- tpow
Double.
Default: 1.0.
The exponential weight of spell length when method is one of
"OMspell", "NMSMST", or "SVRspell".
- expcost
Double.
Default: 0.5.
The cost of spell length transformation when method = "OMloc" or
method = "OMspell". It must be positive. The exact interpretation is
distance-dependent.
- context
Double.
Default: 1-2*expcost.
The cost of local insertion when method = "OMloc". It must be positive.
- link
String.
Default: "mean".
The function used to compute substitution costs when method = "OMslen".
One of "mean" (arithmetic average) or "gmean" (geometric mean
as in the original proposition of Halpin 2010).
- h
Double.
Default: 0.5.
It must be greater than or equal to 0.
When method = "OMslen", the exponential weight of spell length.
When method = "TWED", the gap penalty, which corresponds to the lambda
in Halpin (2014), p. 88, and is usually chosen in the range [0, 1].
- nu
Double.
Stiffness when method = "TWED". It must be strictly greater than 0
and is usually less than 1.
See Halpin (2014), p. 88.
- transindel
String.
Default: "constant".
Method for computing transition indel costs when method = "OMstran".
One of "constant" (single indel of 1.0), "subcost" (based on
substitution costs), or "prob" (based on transition probabilities).
- otto
Double.
The origin-transition trade-off weight when method = "OMstran". It
must be in [0, 1].
- previous
Logical.
Default: FALSE.
When method = "OMstran", should we also account for the transition
from the previous state?
- add.column
Logical.
Default: TRUE.
When method = "OMstran", should the last column (and also the first
column when previous = TRUE) be duplicated? When sequences have different
lengths, should the last (first) valid state be duplicated?
- breaks
List of ordered pairs of integers.
Default: NULL.
The list of the possibly overlapping intervals when method = "CHI2"
or method = "EUCLID". Each interval is defined by the pair c(t1,t2) of the start t1 and end t2 positions of the interval.
- step
Integer.
Default: 1.
The length of the intervals when method = "CHI2" or
method = "EUCLID" and breaks = NULL. It must be positive.
It must also be even when overlap = TRUE.
- overlap
Logical.
Default: FALSE.
When method = "CHI2" or method = "EUCLID" and
breaks = NULL, should the intervals overlap?
- weighted
Logical.
Default: TRUE.
When method = "CHI2", or when sm is a string naming a seqcost method,
should the state distributions account for the sequence weights
in seqdata? See seqdef.
- global.pdotj
Numerical vector, "obs", or NULL.
Default: NULL.
Only for method = "CHI2".
The vector of state proportions to be used as marginal distribution. When NULL, the state distribution on the corresponding interval is used. When "obs", the overall state distribution in seqdata is used for all intervals. When a vector of proportions, it is used as marginal distribution for all intervals.
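The sketch below computes a chi-squared-type distance over breaks. It assumes the distance is the square root of a sum over intervals and states of (p_x - p_y)^2 / p.j. Here p_x and p_y are the state proportions of the two sequences in the interval, and p.j is the marginal proportion. Setting global = TRUE mimics global.pdotj = "obs". chi2_dist() and interval_props() are hypothetical helpers, and seqdist may apply further scaling (see the Details section).

```r
# Sketch only: state proportions of sequence x on positions t1..t2.
interval_props <- function(x, t1, t2, states) {
  as.numeric(table(factor(x[t1:t2], levels = states))) / (t2 - t1 + 1)
}

# Chi-squared-type distance summed over intervals, then square-rooted.
# global = TRUE uses the overall state distribution of seqs for every
# interval (like global.pdotj = "obs"); otherwise the interval's own.
chi2_dist <- function(x, y, seqs, breaks, states, global = FALSE) {
  total <- 0
  for (b in breaks) {
    px <- interval_props(x, b[1], b[2], states)
    py <- interval_props(y, b[1], b[2], states)
    cols <- if (global) seq_len(ncol(seqs)) else b[1]:b[2]
    vals <- as.vector(seqs[, cols])
    pdotj <- as.numeric(table(factor(vals, levels = states))) / length(vals)
    keep <- pdotj > 0   # states absent from the marginal are skipped
    total <- total + sum((px[keep] - py[keep])^2 / pdotj[keep])
  }
  sqrt(total)
}

states <- c("A", "B")
seqs <- rbind(c("A", "A", "B", "B"),
              c("B", "B", "B", "B"))
breaks <- list(c(1, 2), c(3, 4))
chi2_dist(seqs[1, ], seqs[2, ], seqs, breaks, states)                 # 2
chi2_dist(seqs[1, ], seqs[2, ], seqs, breaks, states, global = TRUE)  # sqrt(16/3)
```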
- prox
NULL or Matrix.
Default: NULL.
The matrix of state proximities when method = "NMS" or
method = "SVRspell".
- check.max.size
Logical. Should seqdist stop when the maximum allowed number of unique sequences is exceeded? Caution: setting it to FALSE may produce unexpected results or even crash R.
- opt.args
List. List of additional non-documented arguments for development usage.