seq2feature_mds extracts K features from response processes by multidimensional scaling (MDS).
seq2feature_mds(seqs = NULL, K = 2, method = "auto",
dist_type = "oss_action", pca = TRUE, subset_size = 100,
subset_method = "random", n_cand = 10, return_dist = FALSE,
seed = 12345, L_set = 1:3)
seqs: a "proc" object or a square matrix. If a square matrix is provided, it is treated as the dissimilarity matrix of a group of response processes.
K: the number of features to be extracted.
method: a character string specifying the algorithm used for performing MDS. See 'Details'.
dist_type: a character string specifying the dissimilarity measure for two response processes. See 'Details'.
pca: logical. If TRUE (default), the principal components of the extracted features are returned.
subset_size, n_cand: two parameters used in the large data algorithm. See 'Details' and seq2feature_mds_large.
subset_method: a character string specifying the method for choosing the subset in the large data algorithm. See 'Details' and seq2feature_mds_large.
return_dist: logical. If TRUE, the dissimilarity matrix is returned. Default is FALSE.
seed: random seed.
L_set: lengths of the n-grams considered.
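The pca option can be pictured as a rotation applied after MDS. The sketch below assumes it behaves like an ordinary centered, unscaled PCA (the exact settings used by the package are not stated on this page) and applies stats::prcomp to a feature matrix theta filled with placeholder random numbers, not real package output.

```r
# Placeholder n x K feature matrix standing in for MDS output (not real package output)
set.seed(12345)
theta <- matrix(rnorm(50 * 2), ncol = 2)
theta[, 2] <- theta[, 2] + 0.8 * theta[, 1]   # introduce correlation between features

# Assumed behavior of pca = TRUE: centered, unscaled principal component scores
pca_fit <- prcomp(theta, center = TRUE, scale. = FALSE)
pc_theta <- pca_fit$x                          # each column is a principal feature

# Correlation between the rotated features is zero up to rounding error
round(cor(pc_theta)[1, 2], 10)
```

Because centering and rotation preserve Euclidean distances between rows, this step changes only the orientation of the features, not how well they reproduce the dissimilarities.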
seq2feature_mds returns a list containing
theta: a numeric matrix giving the K extracted features or principal features. Each column is a feature.
the dissimilarity matrix. This element is included only if return_dist = TRUE.
Since classical MDS has a computational complexity of order \(n^3\), where \(n\) is the number of response processes, it is computationally expensive to perform classical MDS when a large number of response processes is considered. In addition, storing an \(n \times n\) dissimilarity matrix requires a large amount of memory when \(n\) is large (\(8n^2\) bytes in double precision, about 800 MB for \(n = 10{,}000\)). In seq2feature_mds, the algorithm proposed in Paradis (2018) is implemented to obtain MDS for large datasets.
method specifies the algorithm used to obtain the MDS features. If method = "small", classical MDS is performed by calling cmdscale. If method = "large", the algorithm for large datasets is used. If method = "auto" (default), seq2feature_mds selects the algorithm automatically based on the sample size.
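The contrast between the two routes can be sketched in base R. mds_small is simply cmdscale on the full matrix, as described for method = "small". mds_subset_sketch is a hypothetical function that illustrates only the general subset idea: it runs cmdscale on a random subset of size subset_size and places every point with Gower's out-of-sample formula for classical MDS, \(y_i = \tfrac{1}{2}\Lambda^{-1}X^\top(q - a_i)\). Gower's formula is a swapped-in technique chosen for brevity; it is not claimed to be the procedure of Paradis (2018) or of seq2feature_mds_large. For simplicity the sketch takes a full matrix d, although it only reads the subset block and the n x subset_size columns.

```r
# Classical MDS on the full n x n matrix: the method = "small" route via cmdscale
mds_small <- function(d, K) {
  cmdscale(d, k = K)
}

# Hypothetical subset-based sketch (not the package's implementation):
# classical MDS on a random subset, then Gower's out-of-sample projection.
mds_subset_sketch <- function(d, K, subset_size, seed = 12345) {
  d <- as.matrix(d)
  n <- nrow(d)
  set.seed(seed)                             # mirrors the seed argument
  idx <- sample.int(n, subset_size)          # random subset, analogous to subset_method = "random"
  fit <- cmdscale(d[idx, idx], k = K, eig = TRUE)
  X <- fit$points                            # subset configuration, columns centered
  lambda <- fit$eig[seq_len(ncol(X))]        # eigenvalues matching the columns of X
  q <- rowSums(X^2)                          # squared norms of the subset points
  a <- d[, idx, drop = FALSE]^2              # squared dissimilarities to the subset only
  qa <- sweep(-a, 2, q, "+")                 # row i holds q - a_i
  0.5 * qa %*% X %*% diag(1 / lambda, nrow = ncol(X))
}

# Usage on points that are exactly two-dimensional Euclidean
set.seed(1)
P <- matrix(rnorm(200 * 2), ncol = 2)
d <- dist(P)
Y_small <- mds_small(d, K = 2)
Y_large <- mds_subset_sketch(d, K = 2, subset_size = 30)
```

For exactly Euclidean data both results reproduce the original pairwise distances, up to rotation and reflection; for general dissimilarities the subset-based placement is only an approximation.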
dist_type specifies the dissimilarity used to measure the discrepancy between two response processes. If dist_type = "oss_action", the order-based sequence similarity (oss) proposed in Gomez-Alonso and Valls (2008) is used for action sequences. If dist_type = "oss_both", both action sequences and timestamp sequences are used to compute a time-weighted oss.
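To make the order-based idea concrete, the sketch below computes a dissimilarity between two action sequences under an assumed form of oss: the k-th occurrences of an action present in both sequences contribute the absolute gap between their positions, each unmatched occurrence is penalized by the length of the longer sequence, and the total is divided by the sum of the two lengths. This form is an assumption based on Gomez-Alonso and Valls (2008), not a copy of the package's code; the function names oss_sketch and oss_matrix are hypothetical, and the time-weighted "oss_both" variant is not covered.

```r
# Hypothetical sketch of an order-based sequence dissimilarity (assumed oss form)
oss_sketch <- function(a, b) {
  la <- length(a)
  lb <- length(b)
  f <- 0  # summed position gaps of matched occurrences
  g <- 0  # number of unmatched occurrences
  for (act in union(a, b)) {
    pa <- which(a == act)
    pb <- which(b == act)
    k <- min(length(pa), length(pb))
    # pair the first k occurrences in order and add their position gaps
    f <- f + sum(abs(pa[seq_len(k)] - pb[seq_len(k)]))
    g <- g + abs(length(pa) - length(pb))
  }
  (f + g * max(la, lb)) / (la + lb)
}

# Pairwise matrix for a list of sequences; a square matrix like this can be passed as seqs
oss_matrix <- function(seq_list) {
  n <- length(seq_list)
  m <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      m[i, j] <- oss_sketch(seq_list[[i]], seq_list[[j]])
    }
  }
  m
}

oss_sketch(c("A", "B", "C"), c("A", "C"))  # (1 + 1 * 3) / (3 + 2) = 0.8
```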
The number of features to be extracted, K, can be selected by cross-validation using chooseK_mds.
Gomez-Alonso, C. and Valls, A. (2008). A similarity measure for sequences of categorical data based on the ordering of common elements. In V. Torra & Y. Narukawa (Eds.), Modeling Decisions for Artificial Intelligence (pp. 134-145). Springer Berlin Heidelberg.
Paradis, E. (2018). Multidimensional scaling with very large datasets. Journal of Computational and Graphical Statistics, 27(4), 935-939.
chooseK_mds for choosing K.
Other feature extraction methods: aseq2feature_seq2seq, atseq2feature_seq2seq, seq2feature_mds_large, seq2feature_seq2seq, tseq2feature_seq2seq
n <- 50
seqs <- seq_gen(n)                        # simulate 50 action sequences
theta <- seq2feature_mds(seqs, 5)$theta   # extract 5 MDS features