
HIBAG (version 1.8.3)

hlaCompareAllele: Evaluate prediction accuracies

Description

Evaluates the overall accuracy, sensitivity, specificity, positive predictive value and negative predictive value of predicted HLA types against the true HLA types.

Usage

hlaCompareAllele(TrueHLA, PredHLA, allele.limit=NULL, call.threshold=NaN, max.resolution="", output.individual=FALSE, verbose=TRUE)

Arguments

TrueHLA
an object of hlaAlleleClass, the true HLA types
PredHLA
an object of hlaAlleleClass, the predicted HLA types
allele.limit
a list of HLA alleles; the validation samples are limited to those having HLA alleles in allele.limit, or NULL for no limit. allele.limit can be a character vector, or an object of hlaAttrBagClass or hlaAttrBagObj
call.threshold
the call threshold for the posterior probability: a call is made when prob >= call.threshold, and no call is made otherwise; with the default NaN, no call threshold is applied
max.resolution
one of "2-digit", "4-digit", "6-digit", "8-digit", "allele", "protein", "2", "4", "6", "8", "full" or "". "allele" is equivalent to "2-digit" and "protein" to "4-digit"; "full" and "" impose no limit on resolution
output.individual
if TRUE, output accuracy for each individual
verbose
if TRUE, show information
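
The effect of call.threshold and max.resolution can be illustrated outside the package with a minimal base R sketch. The helpers `is_called` and `trim_resolution` and the vector `prob` are hypothetical names introduced here for illustration; they are not HIBAG functions, and the package's internal handling may differ (for example for alleles with expression suffixes).

```r
# Hypothetical input: one posterior probability per individual.
prob <- c(0.95, 0.40, 0.72, 0.10, 0.55)

# Assumed helper (not part of HIBAG): TRUE where a call is made.
# A NaN threshold means no thresholding, so every individual is called.
is_called <- function(prob, call.threshold = NaN) {
    if (is.nan(call.threshold)) {
        return(rep(TRUE, length(prob)))
    }
    prob >= call.threshold
}

called_all  <- is_called(prob)        # no threshold: all TRUE
called_half <- is_called(prob, 0.5)   # call only when prob >= 0.5

n_call    <- sum(called_half)         # number of individuals with a call: 3
call_rate <- mean(called_half)        # fraction of individuals called: 0.6

# Assumed helper (not part of HIBAG): keep the first `nfield`
# colon-separated fields of each allele name, so that nfield = 2
# corresponds to "4-digit" resolution.
trim_resolution <- function(allele, nfield) {
    vapply(strsplit(allele, ":", fixed = TRUE), function(f) {
        paste(head(f, nfield), collapse = ":")
    }, character(1))
}

trimmed <- trim_resolution(c("01:01:01", "02:01", "24:02:01:01"), 2)
print(trimmed)   # "01:01" "02:01" "24:02"
```

With a threshold of 0, every individual whose probability is non-negative is called, which is why the examples on this page compare call.threshold=0 against call.threshold=0.5.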

Value

Return a list(overall, confusion, detail), or list(overall, confusion, detail, individual) if output.individual=TRUE.
overall (data.frame):
total.num.ind
the total number of individuals
crt.num.ind
the number of individuals with correct HLA types
crt.num.haplo
the number of chromosomes with correct HLA alleles
acc.ind
the proportion of individuals with correctly predicted HLA types (i.e., both alleles are correct, so the accuracy of an individual is 0 or 1)
acc.haplo
the proportion of chromosomes with correctly predicted HLA alleles (i.e., the accuracy of an individual is 0, 0.5 or 1, since an individual has two alleles)
call.threshold
the call threshold; if it is NaN, no call threshold is applied
n.call
the number of individuals with call
call.rate
overall call rate
confusion (matrix): a confusion matrix.
detail (data.frame):
allele
HLA alleles
train.num
the number of training haplotypes
train.freq
the training haplotype frequencies
valid.num
the number of validation haplotypes
valid.freq
the validation haplotype frequencies
call.rate
the call rates for HLA alleles
accuracy
allele accuracy
sensitivity
sensitivity
specificity
specificity
ppv
positive predictive value
npv
negative predictive value
miscall
the most likely miscalled alleles
miscall.prop
the proportion of the most likely miscalled allele among all miscalled alleles
individual (data.frame):
sample.id
sample id
true.hla
the true HLA type
pred.hla
the prediction of HLA type
accuracy
the accuracy of the individual: 0, 0.5 or 1
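
The relationship between the individual-level and chromosome-level quantities, and the per-allele measures, can be sketched in base R. This is a minimal illustration under stated assumptions, not HIBAG's implementation: the data are made up, `n_correct` and `allele_metrics` are hypothetical helpers, and the per-allele measures use a standard one-vs-rest count over chromosomes whose true and predicted alleles have already been paired.

```r
# Hypothetical true and predicted genotypes: two alleles per individual.
true_h1 <- c("01:01", "02:01", "03:01", "02:01")
true_h2 <- c("02:01", "02:01", "24:02", "03:01")
pred_h1 <- c("02:01", "02:01", "03:01", "01:01")
pred_h2 <- c("01:01", "03:01", "24:02", "24:02")

# Assumed helper: number of correctly predicted alleles per individual
# (0, 1 or 2), ignoring the order of the two alleles.
n_correct <- function(t1, t2, p1, p2) {
    same  <- (t1 == p1) + (t2 == p2)   # matches in the given order
    cross <- (t1 == p2) + (t2 == p1)   # matches after swapping the prediction
    pmax(same, cross)
}

k <- n_correct(true_h1, true_h2, pred_h1, pred_h2)   # 2 1 2 0

ind_accuracy  <- k / 2                  # per individual: 1.0 0.5 1.0 0.0
crt_num_ind   <- sum(k == 2)            # individuals fully correct: 2
crt_num_haplo <- sum(k)                 # chromosomes correct: 5
acc_ind       <- mean(k == 2)           # 0.5
acc_haplo     <- sum(k) / (2 * length(k))   # 0.625

# Hypothetical paired alleles, one entry per chromosome, in which each
# predicted allele is already aligned with the true allele it is scored on.
true_al <- c("01:01", "02:01", "02:01", "02:01",
             "03:01", "24:02", "02:01", "03:01")
pred_al <- c("01:01", "02:01", "02:01", "03:01",
             "03:01", "24:02", "01:01", "24:02")

# Assumed helper: one-vs-rest measures for a single allele.
allele_metrics <- function(truth, pred, allele) {
    tp <- sum(truth == allele & pred == allele)
    fn <- sum(truth == allele & pred != allele)
    fp <- sum(truth != allele & pred == allele)
    tn <- sum(truth != allele & pred != allele)
    c(sensitivity = tp / (tp + fn),
      specificity = tn / (tn + fp),
      ppv         = tp / (tp + fp),
      npv         = tn / (tn + fn))
}

m <- allele_metrics(true_al, pred_al, "02:01")
print(m)
```

An allele that is never predicted gives a 0/0 positive predictive value in this sketch, which R reports as NaN.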

See Also

hlaAttrBagging, predict.hlaAttrBagClass, hlaReport

Examples

# make an "hlaAlleleClass" object
hla.id <- "A"
hla <- hlaAllele(HLA_Type_Table$sample.id,
    H1 = HLA_Type_Table[, paste(hla.id, ".1", sep="")],
    H2 = HLA_Type_Table[, paste(hla.id, ".2", sep="")],
    locus=hla.id, assembly="hg19")

# divide HLA types randomly
set.seed(100)
hlatab <- hlaSplitAllele(hla, train.prop=0.5)
names(hlatab)
# "training"   "validation"
summary(hlatab$training)
summary(hlatab$validation)

# SNP predictors within the flanking region on each side
region <- 500   # kb
snpid <- hlaFlankingSNP(HapMap_CEU_Geno$snp.id, HapMap_CEU_Geno$snp.position,
    hla.id, region*1000, assembly="hg19")
length(snpid)  # 275

# training and validation genotypes
train.geno <- hlaGenoSubset(HapMap_CEU_Geno,
    snp.sel=match(snpid, HapMap_CEU_Geno$snp.id),
    samp.sel=match(hlatab$training$value$sample.id,
    HapMap_CEU_Geno$sample.id))
test.geno <- hlaGenoSubset(HapMap_CEU_Geno,
    samp.sel=match(hlatab$validation$value$sample.id,
    HapMap_CEU_Geno$sample.id))

# train a HIBAG model
set.seed(100)
model <- hlaAttrBagging(hlatab$training, train.geno, nclassifier=4,
    verbose.detail=TRUE)
summary(model)

# validation
pred <- predict(model, test.geno)
# compare
(comp <- hlaCompareAllele(hlatab$validation, pred, allele.limit=model,
    call.threshold=0))
(comp <- hlaCompareAllele(hlatab$validation, pred, allele.limit=model,
    call.threshold=0.5))
