
HiDimDA (version 0.2-0)

RFlda: High-Dimensional Factor-based Linear Discriminant Analysis.

Description

RFlda finds the coefficients of a linear discriminant rule based on a correlation (or covariance) matrix estimator that tries to approximate the true correlation (covariance) by the closest (according to a Frobenius norm) correlation (covariance) compatible with a q-factor model.
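
To make the idea of a q-factor approximation concrete, the following base-R sketch approximates a correlation matrix R by B B' + diag(psi), where B has q columns, and measures the fit with the Frobenius norm. The helper factor_approx is hypothetical and uses iterated principal factors only as a simple stand-in; it is not the package's estimator (see FrobSigAp), which may use a different algorithm.

```r
# Hypothetical helper (not part of HiDimDA): approximate a correlation
# matrix R by B %*% t(B) + diag(psi), with B having q columns, using
# iterated principal factors.
factor_approx <- function(R, q = 1, iters = 200) {
  psi <- rep(0.5, nrow(R))                     # starting uniquenesses
  for (i in seq_len(iters)) {
    e <- eigen(R - diag(psi), symmetric = TRUE)
    lam <- pmax(e$values[seq_len(q)], 0)       # q leading eigenvalues
    B <- e$vectors[, seq_len(q), drop = FALSE] %*% diag(sqrt(lam), q)
    psi <- pmax(diag(R) - rowSums(B^2), 1e-6)  # re-fit the diagonal
  }
  B %*% t(B) + diag(psi)
}

# A correlation matrix that follows a one-factor model exactly
b <- c(0.9, 0.8, 0.7, 0.6, 0.5)
R <- tcrossprod(b) + diag(1 - b^2)

Rhat <- factor_approx(R, q = 1)
err_factor   <- norm(R - Rhat, "F")            # Frobenius approximation error
err_identity <- norm(R - diag(nrow(R)), "F")   # baseline: ignore correlations
```

Comparing err_factor with err_identity shows how much of the correlation structure a single factor captures in this toy case.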

Usage

## S3 method for class 'default':
RFlda(data, grouping, q = 1, prior = "proportions", CorrAp = TRUE, 
maxq=5, VSelfunct = SelectV, ldafun=c("canonical","classification"), nstarts = 1, 
CVqtrials=1:3, CVqfolds=3, CVqrep=1, CVqStrt=TRUE, ...)

## S3 method for class 'data.frame':
RFlda(data, ...)

Arguments

data
Matrix or data frame of observations.
grouping
Factor specifying the class for each observation.
q
Number of factors assumed by the model. This argument can be set to a fixed number between 1 and the argument of maxq, or to the string CVq. In the latter case the number of factors is chosen amongst the values of the arg
prior
The prior probabilities of class membership. If unspecified, the class proportions for the training set are used. If present, the probabilities should be specified in the order of the factor levels.
CorrAp
A boolean flag indicating whether the approximation error of the correlation matrix (default) or of the covariance matrix should be minimized.
maxq
Upper limit on the values allowed for argument q.
VSelfunct
Variable selection function. Either the string none (no selection is to be performed) or a function that takes data and grouping as its first two arguments and returns a list with two components: (i)
ldafun
Type of discriminant linear functions computed. The alternatives are canonical for maximum-discrimination canonical functions and classification for direct-classification functions.
nstarts
Number of different randomly generated starting points used in the minimization of the Frobenius norm of the correlation (or covariance) matrix approximation.
CVqtrials
Vector of values to be tried for the number of factors assumed by the model, when argument q is set to CVq.
CVqfolds
Number of training sample folds to be created in each replication of the cross-validation procedure for choosing the number of factors, when argument q is set to CVq.
CVqrep
Number of replications to be performed in the cross-validation procedure for choosing the number of factors, when argument q is set to CVq.
CVqStrt
Boolean flag indicating if, in the cross-validation procedure for choosing the number of factors when argument q is set to CVq, the folds should be stratified according to the original class proportions (default), or rand
...
Further arguments passed to or from other methods.
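
The stratification controlled by CVqStrt can be illustrated with a small base-R sketch. The helper stratified_folds is hypothetical and only shows the general idea of keeping class proportions roughly equal across folds; the source does not show how the package builds its folds, so its actual procedure may differ.

```r
# Hypothetical helper (not part of HiDimDA): assign each observation to one
# of k folds so that every class is spread as evenly as possible over folds.
stratified_folds <- function(grouping, k = 3) {
  folds <- integer(length(grouping))
  for (lev in levels(grouping)) {
    idx <- which(grouping == lev)
    # balanced fold labels 1..k for this class, in random order
    folds[idx] <- sample(rep_len(seq_len(k), length(idx)))
  }
  folds
}

set.seed(1)
grp <- factor(rep(c("a", "b"), times = c(40, 22)))  # made-up class sizes
fold_id <- stratified_folds(grp, k = 3)
tab <- table(grp, fold_id)                          # class counts per fold
```

Within each class, the fold counts in tab differ by at most one, which is what stratifying by class proportions aims for.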

Value

  • If argument ldafun is set to canonical, an object of class RFcanlda, which extends class canldaRes, with the following components:
      • prior: The prior probabilities used.
      • means: The class means.
      • scaling: A matrix which transforms observations to discriminant functions, normalized so that the within groups covariance matrix is spherical.
      • svd: The singular values, which give the ratio of the between- and within-group standard deviations on the linear discriminant variables. Their squares are the canonical F-statistics.
      • vkpt: A vector with the indices of the variables kept in the discriminant rule if the number of variables kept is less than ncol(data). NULL otherwise.
      • nvkpt: The number of variables kept in the discriminant rule if this number is less than ncol(data). NULL otherwise.
      • q: The number of factors used in the factor model chosen.
      • SigFq: An object of class SigFq with the q-factor model approximation to the within groups covariance matrix. SigFq objects have specialized methods for matrix inversion, multiplication, and element-wise arithmetic operations.
      • SigFqInv: An object of class SigFqInv with the q-factor model approximation to the within groups precision (inverse covariance) matrix. SigFqInv objects have specialized methods for matrix inversion, multiplication, and element-wise arithmetic operations.
      • N: The number of observations used.
      • call: The (matched) function call.
  • If argument ldafun is set to classification, an object of class RFcllda, which extends class clldaRes, with the following components:
      • prior: The prior probabilities used.
      • means: The class means.
      • coef: A matrix with the coefficients of the k-1 classification functions.
      • cnst: A vector with the thresholds (2nd members of linear classification rules) used in classification rules that assume equal priors.
      • vkpt: A vector with the indices of the variables kept in the discriminant rule if the number of variables kept is less than ncol(data). NULL otherwise.
      • nvkpt: The number of variables kept in the discriminant rule if this number is less than ncol(data). NULL otherwise.
      • q: The number of factors used in the factor model chosen.
      • SigFq: An object of class SigFq with the q-factor model approximation to the within groups covariance matrix. SigFq objects have specialized methods for matrix inversion, multiplication, and element-wise arithmetic operations.
      • SigFqInv: An object of class SigFqInv with the q-factor model approximation to the within groups precision (inverse covariance) matrix. SigFqInv objects have specialized methods for matrix inversion, multiplication, and element-wise arithmetic operations.
      • N: The number of observations used.
      • call: The (matched) function call.
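
The normalization described for the scaling component (a spherical within groups covariance matrix) can be checked on a small example. The sketch below is classical canonical discriminant analysis in base R on the built-in iris data, using the ordinary pooled covariance matrix; RFlda instead works with a factor-model approximation of that matrix, so this illustrates the normalization only and is not the package's computation. All variable names are illustrative.

```r
X <- as.matrix(iris[, 1:4])
g <- iris$Species
n <- nrow(X); k <- nlevels(g)
ng <- as.vector(table(g))                       # class sizes

means <- rowsum(X, g) / ng                      # class means (k x p)
Xc <- X - means[as.character(g), ]              # data centered within classes
W <- crossprod(Xc) / (n - k)                    # pooled within groups covariance

Mc <- sweep(means, 2, colMeans(X)) * sqrt(ng)   # weighted, centered class means
Bm <- crossprod(Mc) / (k - 1)                   # between groups covariance

# Whiten with W^(-1/2), then diagonalize the between groups covariance
eW <- eigen(W, symmetric = TRUE)
Wih <- eW$vectors %*% diag(1 / sqrt(eW$values)) %*% t(eW$vectors)
eB <- eigen(Wih %*% Bm %*% Wih, symmetric = TRUE)

scaling <- Wih %*% eB$vectors[, seq_len(k - 1)] # p x (k-1) canonical directions
sv <- sqrt(eB$values[seq_len(k - 1)])           # between/within sd ratios

within_check <- t(scaling) %*% W %*% scaling    # should be the identity
```

Here within_check equals the (k-1) x (k-1) identity matrix up to rounding error, and the entries of sv play the role described for the svd component.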

References

Pedro Duarte Silva, A. (2011) Two Group Classification with High-Dimensional Correlated Data: A Factor Model Approach, Computational Statistics and Data Analysis, 55 (11), 2975-2990.

See Also

FrobSigAp, SelectV, SigFq, SigFqInv, predict.canldaRes, predict.clldaRes, AlonDS

Examples

#train classifier with 10 genes (after a logarithmic transformation) on Alon's Colon Cancer Data set. 

log10genes <- log10(AlonDS[,-1])

ldarule1 <- RFlda(log10genes,AlonDS$grouping,Selmethod="fixedp",maxp=10)     

# get in-sample classification results

predict(ldarule1,log10genes,grpcodes=levels(AlonDS$grouping))$class           	       

# compare classifications with true assignments

cat("Original classes:\n")
print(AlonDS$grouping)

# Estimate error rates by four-fold cross-validation.
# (Note: In cross-validation analysis it is recommended to set the argument 
# 'ldafun' to "classification", in order to speed up computations by avoiding 
# unnecessary eigen-decompositions)

CrosValRes1 <- DACrossVal(log10genes,AlonDS$grouping,TrainAlg=RFlda,
Selmethod="fixedp",ldafun="classification",maxp=10,kfold=4,CVrep=1)
summary(CrosValRes1[,,"Clerr"])

# Find the best factor model amongst the choices q=1 or 2

ldarule2 <- RFlda(log10genes,AlonDS$grouping,q="CVq",CVqtrials=1:2,
Selmethod="fixedp",ldafun="classification",maxp=10)
cat("Best error rate estimate found with q =",ldarule2$q,"\n")

# Perform the analysis finding the number of selected genes by the Expanded HC scheme 

ldarule3 <- RFlda(log10genes,AlonDS$grouping,q=ldarule2$q)
cat("Number of selected genes =",ldarule3$nvkpt,"\n")

# get classification results

predict(ldarule3,log10genes,grpcodes=levels(AlonDS$grouping))$class
