Performs k-means clustering on a continuous response measured over time,
where each cluster mean is defined by a thin plate spline fit to all points in
the cluster. Typically, this function is called by clustra.
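To make the description concrete, here is a minimal sketch of one reassignment step. It fits a smooth curve per cluster, computes each id's mean squared loss against every curve, and moves each id to its closest curve. It is not clustra's implementation. Assumptions: `stats::smooth.spline()` stands in for the mgcv thin plate spline, and `kmeans_spline_step` is a hypothetical helper name.

```r
# Hypothetical sketch of one reassignment step; NOT clustra's code.
# Assumption: stats::smooth.spline() stands in for the mgcv thin
# plate spline that clustra fits for each cluster mean.
kmeans_spline_step <- function(data, group, k, df = 5) {
  # Expand id-level group numbers to observations (ids are 1..n_id).
  data_group <- group[data$id]
  n_id <- length(group)
  loss <- matrix(NA_real_, nrow = n_id, ncol = k)
  for (j in seq_len(k)) {
    in_j <- data_group == j
    # Cluster mean: one smooth curve fit to all points in cluster j.
    fit <- smooth.spline(data$time[in_j], data$response[in_j], df = df)
    pred <- predict(fit, data$time)$y
    # Mean squared loss of each id's observations against curve j.
    loss[, j] <- tapply((data$response - pred)^2, data$id, mean)
  }
  new_group <- apply(loss, 1, which.min)
  list(group = new_group, loss = loss, changes = sum(new_group != group))
}

# Two trajectory shapes (rising and falling), 20 ids, 10 time points each.
set.seed(1)
n_id <- 20
times <- 1:10
data <- data.frame(id = rep(seq_len(n_id), each = length(times)),
                   time = rep(times, times = n_id))
true_group <- rep(1:2, length.out = n_id)
data$response <- ifelse(true_group[data$id] == 1, data$time, 20 - data$time) +
  rnorm(nrow(data), sd = 0.5)

# Start from the true grouping with ids 1 to 3 deliberately misassigned.
init <- true_group
init[1:3] <- 3L - init[1:3]
step <- kmeans_spline_step(data, init, k = 2)
```

Repeating this step until few or no ids change group gives the k-means iteration. In this toy example, one step is enough to move the three misassigned ids back.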
trajectories(
  data,
  k,
  group,
  maxdf,
  conv = c(10, 0),
  mccores = 1,
  verbose = FALSE,
  ...
)

A list with components:
deviance - The final total deviance: the deviance of each cluster's fit, summed across clusters.
group - Integer vector of group assignments corresponding to unique ids.
loss - Numeric matrix with one row per unique id and one column per
cluster. Each entry is the mean squared loss of that id's observations
relative to the cluster's model.
k - An integer giving the requested number of clusters.
k_cl - An integer giving the number of clusters at convergence. It can be
smaller than k when some clusters become too small to support the spline's
degrees of freedom during iteration.
data_group - An integer vector giving the group assignment of each
observation, that is, each id's group expanded to all of that id's time points.
tps - A list with k_cl elements, each the object returned by mgcv::bam
when fitting that cluster's thin plate spline model.
iterations - An integer giving the number of iterations taken.
counts - An integer vector giving the number of ids in each cluster.
counts_df - An integer vector giving the total number of observations in
each cluster (sum of the number of observations for ids belonging to the
cluster).
changes - An integer giving the number of ids that changed clusters in
the last iteration. This is zero if converged.
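The id-level and observation-level components above are related by simple bookkeeping. The sketch below derives group, data_group, counts, counts_df and changes from a small, made-up loss matrix. The variable names mirror the list components, but the computation is an illustration in base R, not clustra's internal code; old_group is a hypothetical assignment from the previous iteration.

```r
# Made-up loss matrix: rows = ids 1..4, columns = clusters 1..2.
loss <- matrix(c(0.2, 1.5,
                 2.0, 0.3,
                 0.1, 0.9,
                 1.7, 0.4), ncol = 2, byrow = TRUE)
old_group <- c(1L, 1L, 1L, 2L)          # hypothetical previous assignment
id <- c(1, 1, 1, 2, 2, 3, 4, 4)         # observation ids, sequential from 1
k <- ncol(loss)

group <- apply(loss, 1, which.min)            # closest cluster for each id
data_group <- group[id]                       # expand id assignments to observations
counts <- tabulate(group, nbins = k)          # number of ids per cluster
counts_df <- tabulate(data_group, nbins = k)  # number of observations per cluster
changes <- sum(group != old_group)            # ids that switched clusters
```

Here group is c(1, 2, 1, 2), data_group is c(1, 1, 1, 2, 2, 1, 2, 2), counts and counts_df are both c(2, 2) and c(4, 4) respectively, and only id 2 changed cluster.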
Arguments:
data - Data table or data frame with response measurements, one row per
observation. Column names are id, time, response, and group. Note that
ids must be sequential integers starting from 1; this is required when group
numbers are expanded from ids to their observations.
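If the raw subject identifiers are not 1, 2, ..., n, indexing an id-level vector by id breaks (for example, group[101] is NA when group has three elements). A minimal preprocessing sketch in base R, assuming a data frame input, relabels ids in order of first appearance; the orig_id column is a hypothetical name used only to keep the original labels.

```r
# Toy data with non-sequential subject ids.
raw <- data.frame(id = c(101, 101, 205, 205, 205, 333),
                  time = c(1, 2, 1, 2, 3, 1),
                  response = c(5.1, 5.4, 3.2, 3.0, 2.9, 4.4))
raw$orig_id <- raw$id                      # keep original labels (hypothetical column)
raw$id <- match(raw$id, unique(raw$id))    # 101 -> 1, 205 -> 2, 333 -> 3

group <- c(2L, 1L, 2L)                     # one initial group number per id
raw$group <- group[raw$id]                 # expand id-level groups to observations
```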
k - Number of clusters (groups).
group - Vector of initial group numbers, one per id.
maxdf - Integer basis dimension of the smooth term. See parameter k of the
s function in package mgcv.
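In mgcv, the basis dimension is passed as `k` inside `s()`, whose default basis is a thin plate regression spline (`bs = "tp"`). The sketch below fits one simulated cluster with `mgcv::bam`. It is an assumption that clustra uses a model of this form (`response ~ s(time, k = maxdf)`); treat it as an illustration of what maxdf controls, not as clustra's exact call.

```r
library(mgcv)  # recommended package, shipped with standard R installations

set.seed(42)
cl <- data.frame(time = rep(1:30, times = 5))
cl$response <- sin(cl$time / 5) + rnorm(nrow(cl), sd = 0.1)

maxdf <- 10
# k = maxdf sets the basis dimension. After the centering constraint the
# smooth has maxdf - 1 coefficients, plus one for the intercept.
fit <- bam(response ~ s(time, k = maxdf), data = cl)
```

Because the smooth is penalized, its effective degrees of freedom are usually well below the basis dimension; maxdf acts as an upper bound.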
conv - A vector of length two, c(iter, minchange), where iter is the maximum
number of EM iterations and minchange is the minimum percentage of subjects
that must change group for iterations to continue. Setting minchange to zero
continues iterations until no more changes occur or iter is reached.
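The stopping rule described for conv can be written as a small predicate. This is a hypothetical helper, not a clustra function; in particular, the choice of strict comparisons (continue only while fewer than iter iterations are done and strictly more than minchange percent of ids changed) is an assumption, chosen so that minchange = 0 stops exactly when no ids change.

```r
# Hypothetical predicate: should another iteration run?
continue_iterating <- function(iteration, changes, n_ids, conv = c(10, 0)) {
  max_iter <- conv[1]
  min_change <- conv[2]                  # a percentage of ids
  pct_changed <- 100 * changes / n_ids
  iteration < max_iter && pct_changed > min_change
}

continue_iterating(3, 0, 100)               # no changes: stop
continue_iterating(3, 1, 100)               # 1% changed, minchange 0: continue
continue_iterating(3, 1, 100, c(10, 2))     # 1% is below minchange 2%: stop
continue_iterating(10, 5, 100)              # iteration limit reached: stop
```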
mccores - Integer number of cores used by the mclapply sections.
Parallelization is over k, the number of clusters.
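Parallelizing over k means one task per cluster, so presumably no more than k cores can be kept busy at once. The sketch shows that pattern with `parallel::mclapply`, using a simple `lm` fit as a hypothetical stand-in for the per-cluster spline fit. Note that mc.cores > 1 relies on forking and is not supported on Windows, so the example uses 1.

```r
library(parallel)

set.seed(7)
k <- 3
d <- data.frame(time = rep(1:20, times = 6),
                group = rep(1:k, each = 40))
d$response <- d$group * d$time + rnorm(nrow(d))

# One task per cluster: each task fits a model to that cluster's points.
fits <- mclapply(seq_len(k), function(j) {
  lm(response ~ time, data = d[d$group == j, ])
}, mc.cores = 1)

slopes <- sapply(fits, function(f) unname(coef(f)["time"]))
```

With this simulated data the three slopes come out close to 1, 2 and 3, one per cluster.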
verbose - Logical or integer, whether to produce debug output. A value greater than 1 also plots the tps fit lines at each iteration.
... - Additional arguments; see clustra for the allowed ... parameters.
George Ostrouchov and David Gagnon