A package for clustering longitudinal trajectories

Clusters trajectories (unequally spaced and unequal length time series) on a common time axis. Clustering proceeds by an EM algorithm that alternates between fitting a thin plate spline (TPS) to the combined responses within each cluster (M-step) and reassigning each trajectory to the cluster with the nearest fitted TPS (E-step). Initial cluster assignments are either random or based on distant trajectories. Fitting is done with the function bam from the mgcv package, which scales well to very large data sets. Additional parallelism is available via multicore on Unix and macOS platforms.
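
The alternation described above can be illustrated with a minimal sketch. This is not the package's implementation: the simulated data, the loop, the basis dimension k = 6 in s(), and all variable names (sim, group, loss, and so on) are assumptions made for illustration. It calls mgcv::bam directly, uses mean squared error as the distance to each fitted spline, and does not guard against a cluster becoming empty.

```r
library(mgcv)  # ships with R as a recommended package

set.seed(1)

# Illustrative data: 40 ids, two underlying curve shapes,
# unequal numbers of observations at unequally spaced times.
n_id <- 40
sim <- do.call(rbind, lapply(seq_len(n_id), function(i) {
  n_obs <- sample(8:15, 1)
  time <- sort(runif(n_obs, 0, 10))
  mean_curve <- if (i <= n_id / 2) sin(time) else 3 + 0.3 * time
  data.frame(id = i, time = time,
             response = mean_curve + rnorm(n_obs, sd = 0.3))
}))

k <- 2                                              # number of clusters
ids <- sort(unique(sim$id))
group <- sample(rep_len(seq_len(k), length(ids)))   # random start, one label per id

for (iter in 1:10) {
  obs_group <- group[match(sim$id, ids)]            # expand id labels to observations

  # M-step: fit one thin plate spline per cluster to the pooled responses.
  fits <- lapply(seq_len(k), function(g) {
    bam(response ~ s(time, bs = "tp", k = 6), data = sim[obs_group == g, ])
  })

  # E-step: mean squared error of each id against each cluster spline,
  # then move each id to the cluster with the smallest error.
  sq_err <- sapply(fits, function(f) {
    (sim$response - as.numeric(predict(f, newdata = sim)))^2
  })
  loss <- rowsum(sq_err, sim$id) / as.vector(table(sim$id))
  new_group <- max.col(-loss, ties.method = "first")

  changes <- sum(new_group != group)
  group <- new_group
  if (changes == 0) break                           # stop when no id changes cluster
}

table(group)
```

The function index on this page lists a check_df helper that checks whether non-empty groups have enough data for the spline fit; a comparable safeguard is left out of this sketch for brevity.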

See the vignettes for detailed use examples.

Install

install.packages('clustra')

Monthly Downloads

302

Version

0.2.1

License

BSD 2-clause License + file LICENSE

Maintainer

George Ostrouchov

Last Published

January 10th, 2024

Functions in clustra (0.2.1)

clustra-package

clustra-package
clustra

Cluster longitudinal trajectories over time
clustra_rand

clustra_rand: Rand Index cluster evaluation
plot_smooths

plot_smooths
ic_fun

Function to test information criteria. Not exported; used by the internal function kchoose.
gen_traj_data

Data Generators
gendata

gendata
kchoose

A test function to evaluate information criteria for several values of k. Not exported; for internal debugging use only.
deltime

Timing function
plot_silhouette

Plots a list item, a silhouette, from the result of clustra_sil along with the average silhouette value. Typically used via lapply(list, plot_silhouette).
plot_sample

Plots a sample of ids in a small multiples layout
bp10k

Simulated blood pressure data
clustra_sil

clustra_sil: Prepare silhouette plot data for several k or for a previous clustra run
allpair_RandIndex

allpair_RandIndex: helper for replicated cluster comparison
oneid

Generates data for one id
mse_g

Various Loss functions used internally by clustra
rand_plot

Matrix plot of Rand Index comparison of replicated clusters
pred_g

Function to predict for new data based on fitted gam object.
start_groups

Function to assign starting groups.
tps_g

traj_rep

Function to run trajectories inside mclapply with one core.
trajectories

Cluster longitudinal trajectories over time.
xit_report

xit_report
check_df

Checks if non-empty groups have enough data for spline fit degrees of freedom.
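
Several entries above (clustra_rand, allpair_RandIndex, rand_plot) concern Rand Index comparison of replicated clusterings. As background, the plain (unadjusted) Rand index between two label vectors can be sketched in base R as follows. The helper name rand_index is hypothetical and is not part of clustra, and this sketch does not claim to reproduce what the package computes internally, which may use an adjusted variant.

```r
# Hypothetical helper, not a clustra function: unadjusted Rand index
# between two clusterings of the same items.
rand_index <- function(a, b) {
  stopifnot(length(a) == length(b), length(a) > 1)
  tab <- table(a, b)                        # contingency table of the two labelings
  n_pairs <- function(x) sum(choose(x, 2))  # number of item pairs within each count
  total <- choose(length(a), 2)             # all item pairs
  same_both <- n_pairs(tab)                 # pairs grouped together in both
  same_a <- n_pairs(rowSums(tab))           # pairs grouped together in a
  same_b <- n_pairs(colSums(tab))           # pairs grouped together in b
  # Agreements = pairs together in both + pairs apart in both.
  (total + 2 * same_both - same_a - same_b) / total
}

rand_index(c(1, 1, 2, 2), c(2, 2, 1, 1))  # 1: same partition, labels swapped
rand_index(c(1, 1, 2, 2), c(1, 2, 1, 2))  # 1/3: the partitions mostly disagree
```

Because the index depends only on which pairs of items are grouped together, it is unaffected by how the clusters are numbered, which is what makes it suitable for comparing runs started from different random assignments.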