Learn R Programming

HIBAG (version 1.8.3)

hlaAttrBagging: Build a HIBAG model

Description

To build a HIBAG model for predicting HLA types.

Usage

hlaAttrBagging(hla, snp, nclassifier=100, mtry=c("sqrt", "all", "one"), prune=TRUE, rm.na=TRUE, verbose=TRUE, verbose.detail=FALSE)

Arguments

hla
the training HLA types, an object of hlaAlleleClass
snp
the training SNP genotypes, an object of hlaSNPGenoClass
nclassifier
the total number of individual classifiers
mtry
a character or a numeric value, the number of variables randomly sampled as candidates for each selection. See details
prune
if TRUE, to perform a parsimonious forward variable selection, otherwise, exhaustive forward variable selection. See details
rm.na
if TRUE, remove the samples with missing HLA types
verbose
if TRUE, show information
verbose.detail
if TRUE, show more information

Value

Return an object of hlaAttrBagClass:
n.samp
the total number of training samples
n.snp
the total number of candidate SNP predictors
sample.id
the sample IDs
snp.id
the SNP IDs
snp.position
SNP position in basepair
snp.allele
a vector of characters with the format of ``A allele/B allele''
snp.allele.freq
the allele frequencies
hla.locus
the name of HLA locus
hla.allele
the HLA alleles used in the model
hla.freq
the HLA allele frequencies
assembly
the human genome reference, such like "hg19"
model
internal use

Details

mtry (the number of variables randomly sampled as candidates for each selection): "sqrt", using the square root of the total number of candidate SNPs; "all", using all candidate SNPs; "one", using one SNP; an integer, specifying the number of candidate SNPs; 0 < r < 1, the number of candidate SNPs is "r * the total number of SNPs".

prune: there is no significant difference on accuracy between parsimonious and exhaustive forward variable selections. If prune=TRUE, the searching algorithm performs a parsimonious forward variable selection: if a new SNP predictor reduces the current out-of-bag accuracy, then it is removed from the candidate SNP set for future searching. Parsimonious selection helps to improve the computational efficiency by reducing the searching times on non-informative SNP markers.

A parallel version of hlaAttrBagging is hlaParallelAttrBagging.

References

Zheng X, Shen J, Cox C, Wakefield J, Ehm M, Nelson M, Weir BS; HIBAG -- HLA Genotype Imputation with Attribute Bagging. Pharmacogenomics Journal. doi: 10.1038/tpj.2013.18. http://www.nature.com/tpj/journal/v14/n2/full/tpj201318a.html

See Also

hlaClose, hlaParallelAttrBagging, summary.hlaAttrBagClass, predict.hlaAttrBagClass

Examples

Run this code
# make a "hlaAlleleClass" object
hla.id <- "A"
hla <- hlaAllele(HLA_Type_Table$sample.id,
    H1 = HLA_Type_Table[, paste(hla.id, ".1", sep="")],
    H2 = HLA_Type_Table[, paste(hla.id, ".2", sep="")],
    locus=hla.id, assembly="hg19")

# divide HLA types randomly
set.seed(100)
hlatab <- hlaSplitAllele(hla, train.prop=0.5)
names(hlatab)
# "training"   "validation"
summary(hlatab$training)
summary(hlatab$validation)

# SNP predictors within the flanking region on each side
region <- 500   # kb
snpid <- hlaFlankingSNP(HapMap_CEU_Geno$snp.id, HapMap_CEU_Geno$snp.position,
    hla.id, region*1000, assembly="hg19")
length(snpid)  # 275

# training and validation genotypes
train.geno <- hlaGenoSubset(HapMap_CEU_Geno,
    snp.sel=match(snpid, HapMap_CEU_Geno$snp.id),
    samp.sel=match(hlatab$training$value$sample.id,
    HapMap_CEU_Geno$sample.id))
test.geno <- hlaGenoSubset(HapMap_CEU_Geno,
    samp.sel=match(hlatab$validation$value$sample.id,
    HapMap_CEU_Geno$sample.id))

# train a HIBAG model
set.seed(100)
# please use "nclassifier=100" when you use HIBAG for real data
model <- hlaAttrBagging(hlatab$training, train.geno, nclassifier=4,
    verbose.detail=TRUE)
summary(model)

# validation
pred <- predict(model, test.geno)
summary(pred)

# compare
(comp <- hlaCompareAllele(hlatab$validation, pred, allele.limit=model,
    call.threshold=0))
(comp <- hlaCompareAllele(hlatab$validation, pred, allele.limit=model,
    call.threshold=0.5))


# save the parameter file
mobj <- hlaModelToObj(model)
save(mobj, file="HIBAG_model.RData")
save(test.geno, file="testgeno.RData")
save(hlatab, file="HLASplit.RData")

# Clear Workspace
hlaClose(model)  # release all resources of model
rm(list = ls())


######################################################################

# NOW, load a HIBAG model from the parameter file
mobj <- get(load("HIBAG_model.RData"))
model <- hlaModelFromObj(mobj)

# validation
test.geno <- get(load("testgeno.RData"))
hlatab <- get(load("HLASplit.RData"))

pred <- predict(model, test.geno, type="response")
summary(pred)

# compare
(comp <- hlaCompareAllele(hlatab$validation, pred, allele.limit=model,
    call.threshold=0))
(comp <- hlaCompareAllele(hlatab$validation, pred, allele.limit=model,
    call.threshold=0.5))


# delete the temporary files
unlink(c("HIBAG_model.RData", "testgeno.RData", "HLASplit.RData"), force=TRUE)

Run the code above in your browser using DataLab