wideRhino
The goal of wideRhino is to enable the construction of canonical
variate analysis (CVA) biplots for high-dimensional data settings,
specifically where the number of variables ($p$) exceeds the number of
observations ($n$). The package addresses the singularity limitation of
the within-group scatter matrix by leveraging the generalised singular
value decomposition (GSVD).
Installation
You can install the development version of wideRhino from GitHub with:
# install.packages("pak")
pak::pak("RaeesaGaney91/wideRhino")Example
When $p < n$, then the CVA-GSVD biplot will result to the standard CVA
biplot. Here is an example using the penguins data:
library(wideRhino)
Penguins <- datasets::penguins[stats::complete.cases(penguins),]
CVAgsvd(X=Penguins[,3:6],group = Penguins[,1]) |>
CVAbiplot(group.col=c("blue","purple","forestgreen"))When $p > n$, then the standard CVA biplot will not work due to the singularity of the within-scatter matrix, and this is when the GSVD becomes useful. Using a simulated data set with 3 groups, 100 observations and 300 variables, a CVA-GSVD biplot can be constructed:
data(sim_data)
CVAgsvd(X=sim_data[,2:301],group = sim_data[,1]) |>
CVAbiplot(group.col=c("tan1","darkcyan","darkslateblue"),which.var = 1:10,zoom.out=80)