This function constructs a transition network analysis (TNA) model for each group from a given sequence, wide-format dataframe or a mixture Markov model.
group_model(x, ...)# S3 method for default
group_model(
x,
group,
type = "relative",
scaling = character(0L),
groupwise = FALSE,
cols = tidyselect::everything(),
params = list(),
na.rm = TRUE,
...
)
# S3 method for mhmm
group_model(
x,
type = "relative",
scaling = character(0L),
groupwise = FALSE,
params = list(),
na.rm = TRUE,
...
)
# S3 method for tna_clustering
group_model(
x,
type = "relative",
scaling = character(0L),
groupwise = FALSE,
params = list(),
na.rm = TRUE,
...
)
group_tna(x, ...)
group_ftna(x, ...)
group_ctna(x, ...)
group_atna(x, ...)
An object of class group_tna
which is a list
containing one
element per cluster. Each element is a tna
object.
An stslist
object describing a sequence of events or states to
be used for building the Markov model. The argument x
also accepts
data.frame
objects in wide format, and tna_data
objects.
This can also be the output of clustering from
cluster_sequences()
.
Ignored.
A vector
indicating the group assignment of each
row of the data/sequence. Must have the same length as the number of
rows/sequences of x
. Alternatively, a single character
string giving
the column name of the data that defines the group when x
is a wide
format data.frame
or a tna_data
object. If not provided, each row of
the data forms a cluster. Not used when x
is a mixture Markov model
or a clustering result.
A character
string describing the weight matrix type.
Currently supports the following types:
"relative"
for relative frequencies (probabilities, the default)
"frequency"
for frequencies.
"co-occurrence"
for co-occurrences.
"n-gram"
for n-gram transitions. Captures higher-order transitions by
considering sequences of n states, useful for identifying longer
patterns.
"gap"
allows transitions between non-adjacent states, with
transitions weighted by the gap size.
"window"
creates transitions between all states within a
sliding window, capturing local relationships
(several sequences together).
"reverse"
considers the sequences in reverse order
(resulting in what is called a reply network in some contexts).
The resulting weight matrix is the transpose of the "frequency"
option.
"attention"
aggregates all downstream pairs of states with an
exponential decay for the gap between states. The parameter lambda
can be used to control the decay rate (the default is 1)-
A character
vector describing how to scale the weights
defined by type
. When a vector is provided, the scaling options are
applied in the respective order. For example, c("rank", "minmax")
would
first compute the ranks, then scale them to the unit interval using
min-max normalization. An empty vector corresponds to no scaling.
Currently supports the following options:
"minmax"
performs min-max normalization to scale the weights to the
unit interval. Note that if the smallest weight is positive, it
will be zero after scaling.
"max"
Multiplies the weights by the reciprocal of the largest weight
to scale the weights to the unit interval. This options preserves
positive ranks, unlike "minmax"
when all weights are positive.
"rank"
Computes the ranks of the weights using base::rank()
with
ties.method = "average"
.
A logical
value that indicates whether scaling methods
should be applied by group (TRUE
) or globally (FALSE
, the default).
An expression
giving a tidy selection of the
columns that should be considered as sequence data.
The default is all columns. The columns are
automatically determined for tna_data
objects. The group
column
is automatically removed from these columns if provided.
A list
of additional arguments for models of specific
type
. The potential elements of this list are:
n_gram
: An integer
for n-gram transitions specifying the number of
adjacent events. The default value is 2.
max_gap
: An integer
for the gap-allowed transitions specifying the
largest allowed gap size. The default is 1.
window_size
: An integer
for the sliding window transitions
specifying the window size. The default is 2.
weighted
: A logical
value. If TRUE
, the transitions
are weighted by the inverse of the sequence length. Can be used for
frequency, co-occurrence and reverse model types. The default is
FALSE
.
direction
: A character
string specifying the direction of attention
for models of type = "attention"
. The available options are
"backward"
, "forward"
, and "both"
, for backward attention,
forward attention, and bidirectional attention, respectively.
The default is "forward"
.
decay
: A function
that specifies the decay of the weights between
two time points at a specific distance. The function should take three
arguments: i
, j
and lambda
, where i
and j
are numeric
vectors of time values, and lambda
is a numeric
value for the
decay rate. The function should return a numeric
vector of weights.
The default is function(i, j, lambda) exp(-abs(i - j) / lambda)
.
lambda
: A numeric
value for the decay rate. The default is 1.
time
: A matrix
or a data.frame
providing the time values
for each sequence and at time index. For tna_data
objects, this can
also be a logical value, where TRUE
will use the time_data
element
of x
for the time values. Date
values are converted to numeric
.
duration
: A matrix
or a data.frame
providing the
time spent in each state for each sequence and time index.
This is an alternative to time
.
A logical
value that determines if observations with NA
value in group
be removed. If FALSE
, an additional category for NA
values will be added. The default is FALSE
and a warning is issued
if NA
values are detected.
Cluster-related functions
communities()
,
mmm_stats()
,
rename_groups()
# Manually specified groups
group <- c(rep("High", 1000), rep("Low", 1000))
model <- group_model(group_regulation, group = group)
# Groups defined by a mixed Markov model
model <- group_model(engagement_mmm)
model <- group_tna(group_regulation, group = gl(2, 1000))
model <- group_ftna(group_regulation, group = gl(2, 1000))
model <- group_ctna(group_regulation, group = gl(2, 1000))
model <- group_atna(group_regulation, group = gl(2, 1000))
Run the code above in your browser using DataLab