Make Variable Clustering Quarto Report Section
vClus(
d,
exclude = NULL,
corrmatrix = FALSE,
redundancy = FALSE,
spc = FALSE,
trans = FALSE,
rexclude = NULL,
fracmiss = 0.2,
maxlevels = 10,
minprev = 0.05,
imputed = NULL,
horiz = FALSE,
label = "fig-varclus",
print = TRUE,
redunargs = NULL,
spcargs = NULL,
transaceargs = NULL,
transacefile = NULL,
spcfile = NULL
)
vClus makes Quarto tabs and prints output. It returns nothing unless spc=TRUE or trans=TRUE is used, in which case it returns a list with components princmp and/or transace. These components can be passed to the print and plot methods for princmp objects or to ggplot_transace(), so the user can put scree plots and PC loading plots in separate code chunks that use different figure sizes.
d: a data frame or table
exclude: a formula or character vector naming variables to exclude from the analysis
corrmatrix: set to TRUE to use Hmisc::plotCorrM() to depict a Spearman rank correlation matrix
redundancy: set to TRUE to run Hmisc::redun() on the non-excluded variables
spc: set to TRUE to run Hmisc::princmp() for a sparse principal component analysis, passing the argument method='sparse'
trans: set to TRUE to run Hmisc::transace() to transform each predictor before running redundancy or principal component analysis. If imputed is given, transace is run on the stacked filled-in data.
rexclude: extra variables (formula or character vector) to exclude from transace transformation-finding, redundancy analysis, and sparse principal components
fracmiss: if the fraction of NAs for a variable exceeds this, the variable is not included
maxlevels: if the number of distinct values of a categorical variable exceeds this, the variable is dropped
minprev: the minimum proportion of non-missing observations in a category for a binary variable to be retained, and the minimum relative frequency of a category before it is combined with other small categories
imputed: an object created by Hmisc::aregImpute() or mice::mice() containing multiple imputations. vClus then creates all the filled-in datasets, stacks them into one tall dataset, and passes that dataset to Hmisc::redun() or Hmisc::princmp(), so that redundancy analysis and sparse principal components handle NAs efficiently, i.e., without excluding partial records. Variable clustering and the correlation matrix are already efficient because they use pairwise deletion of NAs.
horiz: set to TRUE to draw the dendrogram horizontally
label: figure label for Quarto
print: set to FALSE to stop Hmisc::dataframeReduce() from reporting details
redunargs: a list() of other arguments passed to Hmisc::redun()
spcargs: a list() of other arguments passed to Hmisc::princmp()
transaceargs: a list() of other arguments passed to Hmisc::transace()
transacefile: similar to spcfile but for the Hmisc::transace() results; can be used when trans=TRUE
spcfile: a character string naming an .rds R binary file to hold the results of the sparse principal component analysis. Using Hmisc::runifChanged(), if the file name is specified and no inputs have changed since the last run, the result is read from the file. Otherwise a new run is made and, if spcfile is specified, the file is recreated. This is done because sparse principal components can take several minutes to run on large datasets.
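The screening rules controlled by fracmiss, maxlevels, and minprev can be illustrated with a small base-R sketch. The function screen_vars_sketch() is hypothetical and is not Hmisc::dataframeReduce(); it only mimics the rules as worded in the argument descriptions, and the order in which the checks are applied is an assumption.

```r
# Hypothetical sketch (NOT Hmisc::dataframeReduce()) of the screening rules
# controlled by fracmiss, maxlevels, and minprev. The order of checks is assumed.
screen_vars_sketch <- function(d, fracmiss = 0.2, maxlevels = 10, minprev = 0.05) {
  keep <- list()
  for (v in names(d)) {
    x <- d[[v]]
    # Drop a variable whose fraction of NAs exceeds fracmiss
    if (mean(is.na(x)) > fracmiss) next
    categorical <- is.character(x) || is.factor(x)
    if (categorical) x <- as.character(x)
    # Relative frequency of each distinct non-missing value
    prev <- table(x) / sum(!is.na(x))
    if (length(prev) == 2) {
      # Binary variable: drop it if its rarer category falls below minprev
      if (min(prev) < minprev) next
    } else if (categorical) {
      # Categorical variable: drop it if it has more than maxlevels values
      if (length(prev) > maxlevels) next
      # Pool categories rarer than minprev into a single "OTHER" category
      x[x %in% names(prev)[prev < minprev]] <- "OTHER"
    }
    keep[[v]] <- if (categorical) factor(x) else x
  }
  as.data.frame(keep)
}

set.seed(1)
n <- 100
d <- data.frame(
  age       = rnorm(n, 50, 10),                          # kept
  mostly_na = c(rep(NA, 50), rnorm(50)),                 # 50% NA: dropped
  rare_bin  = c(rep(1, 98), 0, 0),                       # 2% prevalence: dropped
  sex       = rep(c("f", "m"), 50),                      # balanced binary: kept
  city      = paste0("c", 1:100),                        # 100 levels: dropped
  grade     = c(rep("A", 48), rep("B", 49), rep("C", 3)) # "C" pooled into OTHER
)
r <- screen_vars_sketch(d)
names(r)  # "age" "sex" "grade"
```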
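The stacking step used when imputed is supplied can be pictured with a base-R sketch. Both stack_filled_sketch() and the toy imputation toy_fill() are hypothetical stand-ins: a real analysis would take the completed datasets from an aregImpute or mice object (for mice, mice::complete(imp, action = "long") returns a similarly stacked data frame). The sketch only shows that the tall result has m copies of every record and no NAs.

```r
# Hypothetical sketch: stack m filled-in copies of a data frame into one tall
# data frame. fill_one(d, i) stands in for extracting the i-th completed
# dataset from an aregImpute or mice object.
stack_filled_sketch <- function(d, m, fill_one) {
  filled <- lapply(seq_len(m), function(i) {
    di <- fill_one(d, i)
    di$.imp <- i   # remember which imputation each row came from
    di
  })
  do.call(rbind, filled)
}

# Toy stand-in for a real imputation: replace each NA by a random draw
# from the observed values of the same variable (seeded per imputation)
toy_fill <- function(d, i) {
  set.seed(i)
  for (v in names(d)) {
    miss <- is.na(d[[v]])
    if (any(miss)) {
      obs <- d[[v]][!miss]
      d[[v]][miss] <- obs[sample.int(length(obs), sum(miss), replace = TRUE)]
    }
  }
  d
}

d <- data.frame(x = c(1, NA, 3, 4), y = c(NA, 2, 2, 5))
tall <- stack_filled_sketch(d, m = 5, fill_one = toy_fill)
dim(tall)  # 20 rows (5 copies of 4 records), 3 columns
```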
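The caching described for spcfile and transacefile amounts to "rerun only if the inputs changed". The hypothetical run_if_changed_sketch() below models that idea by storing the inputs next to the result in an .rds file; it is a simplified sketch, not the code of Hmisc::runifChanged().

```r
# Hypothetical sketch of "rerun only if inputs changed" caching, modeled on the
# behavior described for spcfile. This is not Hmisc::runifChanged().
run_if_changed_sketch <- function(fun, ..., file) {
  inputs <- list(...)
  if (file.exists(file)) {
    cached <- readRDS(file)
    # Same inputs as the previous run: reuse the stored result
    if (identical(cached$inputs, inputs)) return(cached$result)
  }
  # Otherwise recompute and (re)create the cache file
  result <- fun(...)
  saveRDS(list(inputs = inputs, result = result), file)
  result
}

calls <- 0
slow_square <- function(x) {
  calls <<- calls + 1   # count how often the expensive step actually runs
  x^2
}
f <- tempfile(fileext = ".rds")
r1 <- run_if_changed_sketch(slow_square, 1:3, file = f)  # runs, writes f
r2 <- run_if_changed_sketch(slow_square, 1:3, file = f)  # read from f
r3 <- run_if_changed_sketch(slow_square, 4:6, file = f)  # inputs changed: reruns
calls  # 2
```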
Frank Harrell
Draws a variable clustering dendrogram and optionally graphically depicts a correlation matrix. See this for an example. Uses Hmisc::varclus().
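The idea behind the dendrogram can be sketched in base R: cluster the variables hierarchically, using squared Spearman correlations computed with pairwise deletion of NAs as the similarity measure. This is an assumed approximation of what Hmisc::varclus() does with its defaults, not its implementation, and the simulated data are made up for illustration.

```r
# Sketch of the idea behind a variable clustering dendrogram (assumed to
# resemble Hmisc::varclus() defaults, not its implementation)
set.seed(2)
n <- 200
u <- rnorm(n)
w <- rnorm(n)
d <- data.frame(
  x1 = u + rnorm(n, sd = 0.3), x2 = u + rnorm(n, sd = 0.3),  # one cluster
  x3 = w + rnorm(n, sd = 0.3), x4 = w + rnorm(n, sd = 0.3)   # another cluster
)
d$x1[1:10] <- NA  # partial records are kept via pairwise deletion

# Spearman correlations, using all available pairs of observations
rho <- cor(d, method = "spearman", use = "pairwise.complete.obs")
# Similarity rho^2 turned into a dissimilarity for hierarchical clustering
h <- hclust(as.dist(1 - rho^2))
# horiz = TRUE draws the dendrogram sideways
plot(as.dendrogram(h), horiz = TRUE, xlab = "1 - Spearman rho^2")
cutree(h, k = 2)
```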
Hmisc::varclus(), Hmisc::plotCorrM(), Hmisc::dataframeReduce(), Hmisc::redun(), Hmisc::princmp(), Hmisc::transace()
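if (FALSE) {
  # mydata is a placeholder for your own data frame
  library(Hmisc)    # provides .q()
  library(qreport)  # provides vClus()
  vClus(mydata, exclude=.q(country, city))
}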