reGenotyper (version 1.2.0)

reGenotyper: Detecting mislabeled samples, recovering the optimal genotypes for genetical genomics experiments

Description

Main function to detect mislabeled samples using a perturbation strategy.

Usage

reGenotyper(phenotype, genotype, fileName = "test", thres = 0.9,
            optGT = TRUE, optGTplot = FALSE, optGT.thres = 0,
            permu = FALSE, n.permu = 10, wls.score.permu = NULL,
            process = TRUE, t.thres = 1.5, GT.ref = NULL)

Arguments

phenotype
phenotype data: an nTrait-by-nSample matrix
genotype
genotype data: an nMarker-by-nSample matrix with two alleles being 0 and 1 (or A and B), or three alleles being 0, 0.5 and 1 (or A, H and B), where 0.5 (or H) represents the heterozygous allele.
fileName
output file name. Default = "test". If NULL, it also produces files starting with "test".
thres
probability threshold to decide if a sample is mislabeled based on the permutation result (Default = 0.9).
optGT
If TRUE, the optimal genotype is recovered from the given phenotype.
optGTplot
If TRUE, it produces a plot of the genotype with two colors: green and red indicate that the original genotype of a sample (column) at a certain marker (row) is correct or incorrect, respectively.
optGT.thres
threshold to decide if the original genotype is correct
permu
If TRUE, permutation is performed to estimate the likelihood of each sample being mislabeled.
n.permu
The number of permutations to be performed. n.permu=1000 is usually recommended for a reliable estimate, but it can take a long time.
wls.score.permu
A vector whose elements are WLS scores from permutation. It can be obtained using the function permutation, e.g. wls.score.permu <- permutation(phenotype, genotype, n.permu = 1000, process = TRUE, fileName = "test", t.thres = 3)
process
If TRUE, it prints which step has been finished. Default = TRUE.
t.thres
threshold for deciding significant QTLs (t.test) that will be used to detect mislabeled samples
GT.ref
reference genotype data from a large collection of strains. This is used to search for the best-matched genotype for identified mislabeled samples. Default = NULL. If GT.ref is NULL, the original input genotype data will be used to search for the best-matched genotype for identified mislabeled samples.
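
The input shapes and genotype coding described above can be sketched with toy data. This is only an illustration in base R: the object names (gt_letters, genotype_toy, phenotype_toy), the sizes, and the random values are made up, and keeping the sample columns in the same order in both matrices is an assumption rather than something this page states.

```r
# Toy data only: all names and sizes below are hypothetical.
set.seed(1)
n_sample <- 6
n_marker <- 4
n_trait  <- 3

# Genotype calls as letters: A, H (heterozygous) and B.
gt_letters <- matrix(
  sample(c("A", "H", "B"), n_marker * n_sample, replace = TRUE),
  nrow = n_marker, ncol = n_sample,
  dimnames = list(paste0("marker", 1:n_marker), paste0("sample", 1:n_sample))
)

# Map the letters to the numeric coding 0, 0.5 and 1.
code <- c(A = 0, H = 0.5, B = 1)
genotype_toy <- matrix(
  unname(code[as.vector(gt_letters)]),
  nrow = nrow(gt_letters),
  dimnames = dimnames(gt_letters)
)

# Phenotype: nTrait-by-nSample, with the same sample columns as the genotype
# (assumed here; check your own data before calling reGenotyper).
phenotype_toy <- matrix(
  rnorm(n_trait * n_sample),
  nrow = n_trait,
  dimnames = list(paste0("trait", 1:n_trait), colnames(genotype_toy))
)

# Basic sanity checks on shape and sample names.
stopifnot(ncol(phenotype_toy) == ncol(genotype_toy))
stopifnot(identical(colnames(phenotype_toy), colnames(genotype_toy)))
```

Matrices prepared this way have markers or traits in rows and samples in columns, matching the nMarker-by-nSample and nTrait-by-nSample layouts listed under Arguments.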

Value

An object of class wls. A list with elements:
wls.score
a vector with length being the number of samples; each element gives the score for the sample being mislabeled
wls.names
the names of the samples detected as mislabeled using the Z score method
gt.opt
the recovered optimal genotype based on the given phenotype data
wls.pValue
p value for each sample from permutation; only available when permu=TRUE
wls.score.permu
a vector of length n.permu. Each element represents the score of a randomly selected sample with permuted genotype; only available when permu=TRUE.
thres
probability threshold used to decide if a sample is mislabeled based on the permutation result
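
To make the relationship between wls.score, wls.score.permu and thres more concrete, here is a minimal base R sketch of one way an empirical probability could be derived from permutation scores. It is an assumption for illustration, not necessarily the computation reGenotyper performs: the numbers are invented, the variable names are hypothetical, and it assumes that a higher score means stronger evidence of mislabeling.

```r
# Hypothetical numbers, for illustration only.
wls_score <- c(s1 = 0.2, s2 = 3.1, s3 = -0.4, s4 = 0.9)  # one score per sample
wls_score_permu <- c(-1.2, -0.5, 0.0, 0.3, 0.6,
                     0.8, 1.0, 1.4, 1.9, 2.5)            # scores from permutation
thres <- 0.9

# Assumption: a higher score means stronger evidence of mislabeling.
# Empirical probability = fraction of permutation scores below the observed score.
prob <- sapply(wls_score, function(s) mean(wls_score_permu < s))

# Samples whose empirical probability reaches the threshold are flagged.
flagged <- names(prob)[prob >= thres]
print(prob)
print(flagged)
```

With only ten permutation scores the empirical probabilities move in steps of 0.1, which illustrates why a larger n.permu gives a finer-grained estimate.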

References

Li Y. et al, reGenotyper: detecting mislabeled samples in genetic data (submitted)

See Also

optimalGT, permutation, tMatFunction, genotype, phenotype

Examples

  library(reGenotyper)
  #load example genotype and phenotype data
  data(genotype)
  data(phenotype)
  ### For this test dataset 5 permutations are enough. In a real case at least a few
  ### hundred permutations are needed.
  wlsObject <- reGenotyper(phenotype, genotype, fileName = "test", thres = 0.9,
                           optGT = TRUE, optGTplot = FALSE, optGT.thres = 0,
                           permu = TRUE, n.permu = 5, wls.score.permu = NULL,
                           process = TRUE, t.thres = 1.5, GT.ref = NULL)
  ###Inspecting the output
  wlsObject
  plot(wlsObject)
  ### The reGenotyper call above takes around 30s to execute; you can also load the saved result:
  data(wlsObject)
