sparsediscrim

The R package sparsediscrim provides a collection of sparse and regularized discriminant analysis classifiers that are especially useful when applied to small-sample, high-dimensional data sets.

The package was archived in 2018 and was re-released in 2021. The package code was forked from John Ramey’s repo and subsequently modified.

Installation

You can install the stable version from CRAN:

install.packages('sparsediscrim', dependencies = TRUE)

To install the latest development version from GitHub instead, type:

library(devtools)
install_github('topepo/sparsediscrim')

Usage

Both the formula and the non-formula (x/y) interfaces can be used:

library(sparsediscrim)

data(parabolic, package = "modeldata")

qda_mod <- qda_shrink_mean(class ~ ., data = parabolic)
# or
qda_mod <- qda_shrink_mean(x = parabolic[, 1:2], y = parabolic$class)

qda_mod
#> Shrinkage-Mean-Based Diagonal QDA
#> 
#> Sample Size: 500 
#> Number of Features: 2 
#> 
#> Classes and Prior Probabilities:
#>   Class1 (48.8%), Class2 (51.2%)

# Prediction uses the `type` argument: 

parabolic_grid <-
   expand.grid(X1 = seq(-5, 5, length.out = 100),
               X2 = seq(-5, 5, length.out = 100))


parabolic_grid$qda <- predict(qda_mod, parabolic_grid, type = "prob")$Class1

library(ggplot2)
ggplot(parabolic, aes(x = X1, y = X2)) +
   geom_point(aes(col = class), alpha = .5) +
   geom_contour(data = parabolic_grid, aes(z = qda), col = "black", breaks = .5) +
   theme_bw() +
   theme(legend.position = "top") +
   coord_equal()
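
Hard class labels can be requested through the same `type` argument. The sketch below assumes that `type = "class"` is accepted by the predict method and returns one label per row of the new data; only `type = "prob"` is shown in the example above, so check the package's predict documentation before relying on it. The object name `class_pred` is illustrative.

```r
library(sparsediscrim)

data(parabolic, package = "modeldata")

# Fit the same shrinkage-mean-based diagonal QDA model as above.
qda_mod <- qda_shrink_mean(class ~ ., data = parabolic)

# Assumption: type = "class" yields one predicted label per row.
class_pred <- predict(qda_mod, parabolic, type = "class")

# Cross-tabulate predicted labels against the observed classes
# (resubstitution only, so this is an optimistic view of accuracy).
table(predicted = class_pred, observed = parabolic$class)
```

Because the model is evaluated on its own training data here, the table is only a quick sanity check, not an estimate of out-of-sample error.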

Classifiers

The sparsediscrim package features the following classifier (the R function is included within parentheses): High-Dimensional Regularized Discriminant Analysis (rda_high_dim()).

The sparsediscrim package also includes a variety of additional classifiers intended for small-sample, high-dimensional data sets. These include:

| Classifier | Author | R Function |
|---|---|---|
| Diagonal Linear Discriminant Analysis | Dudoit et al. (2002) | lda_diag() |
| Diagonal Quadratic Discriminant Analysis | Dudoit et al. (2002) | qda_diag() |
| Shrinkage-based Diagonal Linear Discriminant Analysis | Pang et al. (2009) | lda_shrink_cov() |
| Shrinkage-based Diagonal Quadratic Discriminant Analysis | Pang et al. (2009) | qda_shrink_cov() |
| Shrinkage-mean-based Diagonal Linear Discriminant Analysis | Tong et al. (2012) | lda_shrink_mean() |
| Shrinkage-mean-based Diagonal Quadratic Discriminant Analysis | Tong et al. (2012) | qda_shrink_mean() |
| Minimum Distance Empirical Bayesian Estimator (MDEB) | Srivastava and Kubokawa (2007) | lda_emp_bayes() |
| Minimum Distance Rule using Modified Empirical Bayes (MDMEB) | Srivastava and Kubokawa (2007) | lda_emp_bayes_eigen() |
| Minimum Distance Rule using Moore-Penrose Inverse (MDMP) | Srivastava and Kubokawa (2007) | lda_eigen() |
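
To make the "diagonal" idea behind the first rows of this table concrete, here is a minimal base-R sketch of a textbook diagonal LDA rule on the built-in iris data. It is an illustration under stated assumptions, not the package's internal code: it uses a pooled per-feature variance with divisor n and the score sum((x - mean_k)^2 / var) - 2 * log(prior_k), and the names `dlda_scores`, `var_pool`, and `pred` are made up for this example.

```r
# Textbook diagonal LDA in base R (illustrative; not sparsediscrim's code).
x <- as.matrix(iris[, 1:4])
y <- iris$Species
classes <- levels(y)
n <- nrow(x)

# Class means (one row per class) and empirical prior probabilities.
means <- t(sapply(classes, function(k) colMeans(x[y == k, , drop = FALSE])))
priors <- as.numeric(table(y)) / n
names(priors) <- classes

# Pooled per-feature variance: off-diagonal covariances are ignored,
# which is what makes the classifier "diagonal".
centered <- x - means[as.character(y), , drop = FALSE]
var_pool <- colSums(centered^2) / n

# Score for each class: variance-scaled squared distance to the class
# mean, adjusted by the log prior. Smaller is better.
dlda_scores <- function(newx) {
  sapply(classes, function(k) {
    d <- sweep(newx, 2, means[k, ])
    rowSums(sweep(d^2, 2, var_pool, "/")) - 2 * log(priors[[k]])
  })
}

scores <- dlda_scores(x)
pred <- factor(classes[apply(scores, 1, which.min)], levels = classes)

# Resubstitution accuracy on the training data.
mean(pred == y)
```

Only p variances are estimated instead of a full p x p covariance matrix, which is why rules of this kind remain usable when the number of features exceeds the sample size.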

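We also include modifications to Linear Discriminant Analysis (LDA) with regularized covariance-matrix estimators: the Moore-Penrose pseudo-inverse (lda_pseudo()), the Schafer-Strimmer estimator (lda_schafer()), and the Thomaz-Kitani-Gillies estimator (lda_thomaz()).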

Version: 0.3.0

License: MIT + file LICENSE

Maintainer: Max Kuhn

Last Published: July 1st, 2021

Functions in sparsediscrim (0.3.0)

cov_eigen: Computes the eigenvalue decomposition of the maximum likelihood estimators (MLE) of the covariance matrices for the given data matrix.
cov_shrink_diag: Computes a shrunken version of the maximum likelihood estimator for the sample covariance matrix under the assumption of multivariate normality.
center_data: Centers the observations in a matrix by their respective class sample means.
lda_emp_bayes: The Minimum Distance Empirical Bayesian Estimator (MDEB) classifier.
lda_eigen: The Minimum Distance Rule using Moore-Penrose Inverse (MDMP) classifier.
cov_intraclass: Generates a \(p \times p\) intraclass covariance matrix.
cov_list: Computes the covariance-matrix maximum likelihood estimators for each class and returns a list.
cv_partition: Randomly partitions data for cross-validation.
generate_blockdiag: Generates data from K multivariate normal data populations, where each population (class) has a covariance matrix consisting of block-diagonal autocorrelation matrices.
lda_schafer: Linear Discriminant Analysis using the Schafer-Strimmer Covariance Matrix Estimator.
rda_high_dim_cv: Helper function to optimize the HDRDA classifier via cross-validation.
lda_shrink_cov: Shrinkage-based Diagonal Linear Discriminant Analysis (SDLDA).
generate_intraclass: Generates data from K multivariate normal data populations, where each population (class) has an intraclass covariance matrix.
log_determinant: Computes the log determinant of a matrix.
no_intercept: Removes the intercept term from a formula if it is included.
cov_autocorrelation: Generates a \(p \times p\) autocorrelated covariance matrix.
print.lda_eigen: Outputs the summary for an MDMP classifier object.
print.lda_diag: Outputs the summary for a DLDA classifier object.
lda_emp_bayes_eigen: The Minimum Distance Rule using Modified Empirical Bayes (MDMEB) classifier.
print.lda_thomaz: Outputs the summary for a lda_thomaz classifier object.
qda_shrink_cov: Shrinkage-based Diagonal Quadratic Discriminant Analysis (SDQDA).
print.lda_emp_bayes: Outputs the summary for an MDEB classifier object.
qda_shrink_mean: Shrinkage-mean-based Diagonal Quadratic Discriminant Analysis (SmDQDA) from Tong, Chen, and Zhao (2012).
cov_mle: Computes the maximum likelihood estimator for the sample covariance matrix under the assumption of multivariate normality.
rda_weights: Computes the observation weights for each class for the HDRDA classifier.
diag_estimates: Computes estimates and ancillary information for diagonal classifiers.
print.qda_diag: Outputs the summary for a DQDA classifier object.
print.lda_emp_bayes_eigen: Outputs the summary for an MDMEB classifier object.
lda_pseudo: Linear Discriminant Analysis (LDA) with the Moore-Penrose Pseudo-Inverse.
cov_pool: Computes the pooled maximum likelihood estimator (MLE) for the common covariance matrix.
tong_mean_shrinkage: Tong et al. (2012)'s Lindley-type Shrunken Mean Estimator.
solve_chol: Computes the inverse of a symmetric, positive-definite matrix using the Cholesky decomposition.
h: Bias correction function from Pang et al. (2009).
plot.rda_high_dim_cv: Plots a heatmap of the cross-validation error grid for an HDRDA classifier object.
lda_diag: Diagonal Linear Discriminant Analysis (DLDA).
posterior_probs: Computes posterior probabilities via Bayes' theorem under normality.
print.lda_shrink_cov: Outputs the summary for an SDLDA classifier object.
rda_cov: Calculates the RDA covariance-matrix estimators for each class.
dmvnorm_diag: Computes the multivariate normal density with a diagonal covariance matrix.
print.lda_schafer: Outputs the summary for a lda_schafer classifier object.
print.lda_pseudo: Outputs the summary for a lda_pseudo classifier object.
print.lda_shrink_mean: Outputs the summary for an SmDLDA classifier object.
rda_high_dim: High-Dimensional Regularized Discriminant Analysis (HDRDA).
quadform_inv: Quadratic form of the inverse of a matrix and a vector.
var_shrinkage: Shrinkage-based estimator of variances for each feature from Pang et al. (2009).
quadform: Quadratic form of a matrix and a vector.
lda_thomaz: Linear Discriminant Analysis using the Thomaz-Kitani-Gillies Covariance Matrix Estimator.
lda_shrink_mean: Shrinkage-mean-based Diagonal Linear Discriminant Analysis (SmDLDA) from Tong, Chen, and Zhao (2012).
regdiscrim_estimates: Computes estimates and ancillary information for regularized discriminant classifiers.
qda_diag: Diagonal Quadratic Discriminant Analysis (DQDA).
two_class_sim_data: Example bivariate classification data from caret.
risk_stein: Stein Risk function from Pang et al. (2009).
update_rda_high_dim: Helper function to update tuning parameters for the HDRDA classifier.
print.rda_high_dim: Outputs the summary for an HDRDA classifier object.
print.qda_shrink_cov: Outputs the summary for an SDQDA classifier object.
print.qda_shrink_mean: Outputs the summary for an SmDQDA classifier object.
cov_block_autocorrelation: Generates a \(p \times p\) block-diagonal covariance matrix with autocorrelated blocks.
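
Several of the helpers listed here (log_determinant, solve_chol, quadform, quadform_inv) are described as standard linear-algebra operations. The base-R sketch below shows what those operations look like without calling the package's functions, whose exact signatures are not given in this listing. The matrix `A`, the vector `x`, and the result names are invented for the illustration.

```r
set.seed(1)

# Build a small symmetric, positive-definite matrix and a test vector.
z <- matrix(rnorm(50 * 3), ncol = 3)
A <- crossprod(z) / 50
x <- c(1, -2, 0.5)

# Log determinant, computed on the log scale to avoid overflow or
# underflow of det(A) in higher dimensions.
log_det <- as.numeric(determinant(A, logarithm = TRUE)$modulus)

# Inverse of a symmetric positive-definite matrix via its Cholesky factor.
A_inv <- chol2inv(chol(A))

# Quadratic form x' A x.
qf <- drop(crossprod(x, A %*% x))

# Quadratic form x' A^{-1} x, solving a linear system rather than
# forming the inverse explicitly.
qf_inv <- drop(crossprod(x, solve(A, x)))
```

Quantities of this kind appear in Gaussian discriminant scores, where the log determinant and the inverse quadratic form of a covariance estimate are evaluated for every class.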