amltest. This function can be used to remove markers with a high proportion of missing values, impute the remaining missing values with the sample mean of each marker, remove markers with very little variation, and, if necessary, re-encode the minor allele as 1 and the major allele as 0.
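The cleaning steps described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the toy matrix is made up, markers are assumed to be coded 0/1 so that a column mean is the frequency of the allele coded 1, and the order of the steps is a guess.

```r
# Toy 0/1 marker matrix (made-up data): rows are lines, columns are named markers.
marker <- cbind(
  m1 = c(1, 0, 0, 1, 0, 0, 0, 1, 0, 0),
  m2 = c(NA, NA, NA, NA, 1, 0, 1, 0, 0, 1),  # 40% missing
  m3 = c(NA, 1, 0, 0, 1, 0, 0, 0, 1, 0),     # 10% missing
  m4 = rep(0, 10),                           # no variation
  m5 = c(1, 1, 1, 1, 1, 1, 1, 0, 0, 0)       # allele coded 1 is the major one
)
nafrac <- 0.2  # maximum tolerated fraction of missing values
mafb <- 0.1    # minimum minor allele frequency

# 1. Drop markers whose fraction of missing values exceeds nafrac.
keep <- colMeans(is.na(marker)) <= nafrac
cleaned <- marker[, keep, drop = FALSE]

# 2. Impute the remaining missing values with the per-marker sample mean.
for (j in seq_len(ncol(cleaned))) {
  miss <- is.na(cleaned[, j])
  cleaned[miss, j] <- mean(cleaned[, j], na.rm = TRUE)
}

# 3. Recode so that 1 marks the minor allele (column mean at most 0.5).
flip <- colMeans(cleaned) > 0.5
cleaned[, flip] <- 1 - cleaned[, flip]

# 4. Drop markers whose minor allele frequency is below mafb.
cleaned <- cleaned[, colMeans(cleaned) >= mafb, drop = FALSE]

print(colnames(cleaned))
```

In this toy case m2 is dropped for missingness, m4 for lack of variation, and m5 is recoded as 1 minus the original column.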
cleanclust(marker, nafrac=0.2, mafb=0.1, corbnd=0.5, method="complete")
cleanclust
change its value to 1 minus the original column. Each column has to have a unique name to identify the marker.
hclust. The values could be one of "complete", "average" or "single". The default is "complete".
newmarker.
amltest and other functions in this package.
The R code for the original Hclust package can be found at http://www.epic.Pitt.ed/Accompaniment/hclust/hclust.ht,
which provides more functionality.
The function cleanclust
provides two main utilities. The first is to clean and impute the marker data: markers with a high proportion of missing values or a very low minor allele frequency are removed, and the remaining missing values are imputed with the sample mean of each marker. The second is to remove markers where necessary so that no two retained markers are highly correlated. As with other LASSO-type methods, the performance of adaptive mixed LASSO can be improved when predictors are not highly correlated. This process follows that of Rinald et al. (2005). The correlation between each pair of markers is calculated, and $r=1-cor^2$ is used as the distance between markers to perform hierarchical clustering with hclust. The resulting dendrogram is cut to form clusters according to the bound on $cor^2$, corbnd. Specifically, higher corbnd values will result in fewer clusters being formed and fewer markers in the output. One marker is retained for each cluster in newmarker.
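The clustering step can be illustrated with base R functions only. The sketch below is hedged: the toy data are made up, the cut height of 1 - corbnd is an assumption about how the bound on $cor^2$ maps onto the distance $r=1-cor^2$ (the package may translate corbnd differently), and keeping the first marker of each cluster is an arbitrary choice rather than the package's documented rule.

```r
# Toy marker matrix (made-up data): m2 nearly duplicates m1; m3 and m4 are unrelated.
marker <- cbind(
  m1 = c(1, 0, 0, 1, 0, 1, 0, 0, 1, 0),
  m2 = c(1, 0, 0, 1, 0, 1, 0, 0, 1, 1),
  m3 = c(0, 1, 0, 0, 1, 0, 0, 1, 0, 0),
  m4 = c(0, 0, 1, 1, 0, 0, 1, 1, 0, 0)
)
corbnd <- 0.5

# Distance between markers: r = 1 - cor^2.
r <- 1 - cor(marker)^2

# Hierarchical clustering on that distance.
tree <- hclust(as.dist(r), method = "complete")

# Assumed cut: height 1 - corbnd, so that with complete linkage every pair
# of markers inside a cluster has cor^2 of at least corbnd.
cluster <- cutree(tree, h = 1 - corbnd)

# Keep one marker per cluster (here simply the first member of each cluster).
newmarker <- marker[, !duplicated(cluster), drop = FALSE]

print(cluster)
print(colnames(newmarker))
```

Here m1 and m2 have $cor^2 = 2/3$ and fall into the same cluster, so only one of them is retained.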
Wang, D., Eskridge, K.M. and Crossa, J. (2011) Identifying QTLs and Epistasis in Structured Plant Populations Using Adaptive Mixed LASSO. Journal of Agricultural, Biological, and Environmental Statistics, 16:170-184.
Wang, D., et al. (2012) Prediction of genetic values of quantitative traits with epistatic effects in plant breeding populations. Heredity, 109: 313-319.
See also amltest.
## Process the markers in the wheat data set.
library(aml)  # assumption: cleanclust and the wheat data are provided by the aml package
data("wheat")
clmarker <- cleanclust(wheat$marker, nafrac = 0.2, mafb = 0.1, corbnd = 0.5, method = "complete")