bdiscrim: discrimination using a Bayesian linear model

Description

This function provides a discrimination method equivalent to the bclust clustering function.

Usage

bdiscrim(training, training.id, 
training.labels = NULL, predict, 
predict.label = rownames(predict)[1], 
effect.family = "gaussian", var.select = TRUE, 
transformed.par, priorprob = rep(1, max(training.id) + 1))

Arguments

training

A numeric matrix of the training set with observations in rows and variables in columns.

training.id

A vector consisting of positive integer elements having the same length as the number of rows of training. This vector identifies observations of a class such that the total number of classes is max(training.id).

training.labels

A vector of strings referring to the labels of each class. The length of the vector should match to max(training.id). The first element corresponds to the label of the type having the smallest integer value in training.id, the second element refers to the label of the type having the second smallest integer in training.id, and so on.

predict

A single discriminating type to be classified to one of the classes, with replications of the discriminating type in rows and variables in columns. The number of variables should be the same as in training. If it is not specified just the variable and variable-class importances will be calculated.

predict.label

A single string, the label of the discriminating type.

effect.family

Distribution family of the disappearing random components of the model. The choices are "gaussian" or "alaplace" allowing Gaussian or asymmetric Laplace family, respectively.

var.select

A logical value, TRUE for fitting models that define a spike-and-slab distribution in variable level and allows Bayesian variable selection.

transformed.par

The transformed model parameters in a vector. The length of the vector depends on the chosen model and the availability of variable selection. The log transformation is supposed to be applied for the variance parameters, the identity for the mean, and the logit for the proportions. The function loglikelihood can be used to estimate them from the data.

priorprob

The prior probabilities for each class in a vector. The length of the vector should be the number of classes max(training.id) plus one (the prior probability that the predict set raises its own class). If nothing is specified a uniform discrete prior is considered.

Value

probs: The posterior probabilities in a matrix with one row.
var: The variable importances, each being a log Bayes factor.
varclass: The variable-class importances, each being a log Bayes factor.

Details

The function calls internal C functions depending on the chosen model. The C-stack of the system may overflow if you have a large dataset. You may need to adjust the stack before running R using your operational system command line.

We assumed a Bayesian linear model for classification being $$y_{vctr}=\mu+\eta_{vct}+\delta_v \gamma_{vc}\theta_{vc}+\varepsilon_{vctr}$$ where $y_{vctr}$ is the available data on variable $v$, cluster $c$, type $t$, and replicate $r$; $h_{vct}$ is the between-type error, $t_{vc}$ is the disappearing random component controlled by the Bernoulli variables $d_{v}$ with success probability $q$ and $g_{vc}$ with success probability $p$; and $e_{vctr}$ is the between-replicate error. The function computes the posterior probability that the predict data share the same $\theta_{vc}$. This function also consideres that the predict data may arise its own class. For more details see Vahid Partovi Nia and Anthony C. Davison (2012)

References

Vahid Partovi Nia and Anthony C. Davison (2012). High-Dimensional Bayesian Clustering with Variable Selection: The R Package bclust. Journal of Statistical Software, 47(5), 1-22. URL http://www.jstatsoft.org/v47/i05/

Examples

Run this code

data(gaelle)
gaelle.id<-rep(1:13,rep(4,13)) # all mutants have 4 replicates
gaelle.lab<-c("d172","d263","isa2",
"sex4","dpe2","mex1","sex3","pgm","sex1","WsWT",
"tpt","RLDWT","ke103")

gaelle.bdiscrim<-bdiscrim(gaelle[-(1:3),],
training.id=gaelle.id,training.labels=gaelle.lab,
transformed.par=c(-1.84,-0.99,1.63,0.08,-0.16,-1.68),
predict=gaelle[1:3,], predict.label="ColWT")
# classify ColWT to one of the types

par(mfrow=c(1,1)) #retreive graphic defaults 
par(mar = c(5, 4, 4, 2) + 0.1) # leave some space for labels

ColWT.prob<-as.vector(gaelle.bdiscrim$probs)

# plots discrimination probabilities
bp <- barplot(ColWT.prob,ylim=c(0,1)) #plot bars
title("ColWT Discrimination")
text(bp, par("usr")[3]-0.05, srt = 90,adj=1,
labels = colnames(gaelle.bdiscrim$probs), 
xpd = TRUE) 
#plot variable labels

mtext(1, text = "Mutant", line = 4,cex=1.5)
# add x axis label
mtext(2, text = "Probability", line = 3,cex=1.2)
# add y axis labels
abline(h=1/length(ColWT.prob)) 
# draw plot a as a reference line prior probabilities line



# plots sorted discrimination probabilities
par(mfrow=c(1,1)) #retreive graphic defaults 
par(mar = c(5, 4, 4, 2) + 0.1) # leave some space for labels
bp <- barplot(sort(ColWT.prob,decreasing=TRUE),ylim=c(0,1),
col=heat.colors(length(ColWT.prob))) #plot bars
text(bp, par("usr")[3]-0.05, srt = 90,adj=1,
labels = colnames(gaelle.bdiscrim$probs)
[order(ColWT.prob,decreasing=TRUE)], 
xpd = TRUE) #plot variable labels
mtext(1, text = "Mutant", line = 4,cex=1.5)# add x axis label
mtext(2, text = "Probability", line = 3,cex=1.2)# add y axis labels
abline(h=1/length(ColWT.prob))


varclassimp<-gaelle.bdiscrim$varclass
#use thresholds to define blob colors

     blob<-matrix(0,nrow(varclassimp),ncol(varclassimp))
     blob[varclassimp<=0]<-0 
     blob[varclassimp>0]<-1
     blob[varclassimp>1]<-2
     blob[varclassimp>3]<-3
     blob[varclassimp>5]<-4
#log bayes factor thresholding 

varimp<-gaelle.bdiscrim$var
varcol<-rep(0,ncol(gaelle))
varcol[varimp>0]<-1


var.order<-order(gaelle.bdiscrim$var,decreasing=TRUE)
profileplot(x=gaelle[-(1:3),var.order],rep.id=gaelle.id,
labels=gaelle.lab,scale=10,blob.matrix=blob,ylab.mar=5,
xlab.mar=7)
#plot var class importance on profile plot using blobs
# and sort variables according to variable importance values



viplot(varimp=varimp, xlab=colnames(gaelle)[var.order],
xlab.mar=10,sort=TRUE,col=varcol[var.order])
#plot sorted variable importances

Run the code above in your browser using DataLab