LDA.boost: Implementation of the linear discriminant function for multi-label classification.

Description

This function applies the linear discriminant function to classify multi-label responses. The precision matrix (the inverse of the covariance matrix) in the linear discriminant function can be estimated by w from the function boost.graph. In addition, error-prone covariates in the linear discriminant function are handled by regression calibration.

Usage

LDA.boost(data, resp, theta, sigma_e = 0.6, q = 0.8, lambda = 1, pi = 0.5)

Value

score: The value of the linear discriminant function (see Details), computed with the supplied estimator of the precision matrix.
class: The predicted class for each subject.

Arguments

data: An n (observations) by p (variables) matrix of random variables, whose distributions can be continuous, discrete, or mixed.
resp: An n-dimensional vector of categorical random variables, which is the response in the data.
theta: The estimator of the precision matrix.
sigma_e: The common value in the diagonal covariance matrix of the error for the classical measurement error model when data are continuous. The default value is 0.6.
q: The common value used to characterize misclassification for binary random variables. The default value is 0.8.
lambda: The parameter of the Poisson distribution, which is used to characterize error-prone count random variables. The default value is 1.
pi: The probability in the Binomial distribution, which is used to characterize error-prone count random variables. The default value is 0.5.

Author

Hui-Shan Tsao and Li-Pang Chen
Maintainer: Hui-Shan Tsao n410412@gmail.com

Details

The linear discriminant function used is as follows:
$$ \texttt{score}_{i,j} = \log (\pi_i) - 0.5\ \mu_{i}^\top\ \texttt{theta}\ \mu_{i} + \texttt{data}_{j}^\top\ \texttt{theta}\ \mu_{i}, $$
for class $i = 1, \ldots, I$, with $I$ being the number of classes in the dataset, and subject $j = 1, \ldots, n$, where $\pi_i$ is the proportion of subjects in class $i$, $\texttt{data}_{j}$ is the vector of covariates for subject $j$, $\texttt{theta}$ is the precision matrix of the covariates, and $\mu_{i}$ is the empirical mean vector of the random variables in class $i$.
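As a rough illustration of the formula above, the following base R sketch evaluates the score for every class and subject and assigns each subject to the class with the largest score. The helper lda_score and the toy inputs are hypothetical and are not part of the package. The sketch assumes error-free covariates, so it omits the regression calibration step that LDA.boost applies, and it uses an identity matrix as a stand-in for the precision matrix.

```r
# Hypothetical helper (not part of the package): evaluates
# score[i, j] = log(pi_i) - 0.5 * mu_i' theta mu_i + data_j' theta mu_i
# without any measurement error correction.
lda_score <- function(data, resp, theta) {
  classes <- sort(unique(resp))
  score <- matrix(NA_real_, nrow = length(classes), ncol = nrow(data))
  for (i in seq_along(classes)) {
    in_class <- resp == classes[i]
    pi_i <- mean(in_class)                            # proportion of subjects in class i
    mu_i <- colMeans(data[in_class, , drop = FALSE])  # empirical mean vector of class i
    score[i, ] <- log(pi_i) -
      0.5 * drop(t(mu_i) %*% theta %*% mu_i) +
      drop(data %*% theta %*% mu_i)                   # one value per subject j
  }
  # Assign each subject to the class with the largest score
  list(score = score, class = classes[apply(score, 2, which.max)])
}

# Toy data (assumption): two well-separated classes with two covariates
set.seed(1)
toy_data <- rbind(matrix(rnorm(20, mean = -5), ncol = 2),
                  matrix(rnorm(20, mean = 5), ncol = 2))
toy_resp <- rep(c(1, 2), each = 10)
toy_theta <- diag(2)  # identity matrix as a placeholder precision matrix

toy_fit <- lda_score(toy_data, toy_resp, toy_theta)
```

Here toy_fit$score is an I by n matrix with one row per class and one column per subject, and toy_fit$class holds the predicted labels, loosely mirroring the score and class components described under Value.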

References

Hui-Shan Tsao (2024). Estimation of Ultrahigh-Dimensional Graphical Models and Its Application to Discriminant Analysis. Master's thesis supervised by Li-Pang Chen, National Chengchi University.

Examples

data(MedulloblastomaData)

X <- t(MedulloblastomaData[2:655,]) #covariates
Y <- MedulloblastomaData[1,] #response

X <- matrix(as.numeric(X),nrow=23)

p <- ncol(X)
n <- nrow(X)

# standardization: center each column and scale it to unit standard deviation
X_new <- matrix(0, nrow = n, ncol = p)
for (i in 1:p) {
  X_new[, i] <- (X[, i] - mean(X[, i])) / sd(X[, i])
}

# estimate the graphical model
result <- boost.graph(data = X_new, thre = 0.2, ite1 = 3, ite2 = 0, ite3 = 0, rep = 1)
theta.hat <- result$w

theta.hat[which(theta.hat < 0.8)] <- 0 # keep the highly dependent pairs

# predict
pre <- LDA.boost(data = X_new, resp = Y, theta = theta.hat)
estimated_Y <- pre$class
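
One possible way to inspect the predictions is to tabulate them against the observed response and compute the in-sample accuracy. The sketch below uses base R only, and the two label vectors are made-up stand-ins for Y and estimated_Y from the example above, so the numbers say nothing about the Medulloblastoma data.

```r
# Hypothetical labels standing in for Y (observed) and estimated_Y (predicted)
true_class <- c(1, 1, 2, 2, 3, 3)
pred_class <- c(1, 2, 2, 2, 3, 1)

# Cross-tabulate predicted against observed classes
confusion <- table(predicted = pred_class, observed = true_class)

# Proportion of subjects whose predicted class matches the observed class
accuracy <- mean(pred_class == true_class)
```

Because the classifier is evaluated on the same subjects used to compute the class means, such an accuracy is an in-sample figure and is likely to be optimistic.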
