Using an interactive control panel (see rpanel) and a 3D real-time
rendering system (rgl), this package provides a user-friendly GUI for
estimating the minimum number of biomarkers (variables) needed to achieve
a given level of accuracy in two-group classification problems based on
microarray data.
optimiseBiomarker(error,
                  errorTol = 0.05,
                  method = "RF",
                  nTrain = 100,
                  sdB = 1.5,
                  sdW = 1,
                  foldAvg = 2.88,
                  nRep = 3)
error: The database of classification errors. See errorDbase for details.
errorTol: Error tolerance limit.
method: Classification method. Can be one of "RF", "SVM", and "KNN" for Random Forest, Support Vector Machines, and k-Nearest Neighbour, respectively.
nTrain: Training set size, i.e., the total number of biological samples in group 1 and group 2.
sdB: Biological variation (\(\sigma_b\)) of the data on the log (base 2) scale.
sdW: Experimental (technical) variation (\(\sigma_e\)) of the data on the log (base 2) scale.
foldAvg: Average fold change of the biomarkers.
nRep: Number of technical replications.
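As a minimal sketch of a call that sets every argument explicitly, the settings can be collected in a list and passed to the function. The argument names and the allowed method labels come from the usage and argument descriptions above. The values `errorTol = 0.10` and `method = "SVM"` are illustrative choices, assumed to lie within the range covered by the error database.

```r
# Settings for optimiseBiomarker(); names follow the usage shown above.
# errorTol and method differ from the defaults and are illustrative only.
settings <- list(
  errorTol = 0.10,   # accept up to 10% classification error
  method   = "SVM",  # one of "RF", "SVM", "KNN"
  nTrain   = 100,    # total training samples across both groups
  sdB      = 1.5,    # biological variation (log2 scale)
  sdW      = 1,      # technical variation (log2 scale)
  foldAvg  = 2.88,   # average fold change of the biomarkers
  nRep     = 3       # technical replicates
)

# The GUI needs an interactive session and the optBiomarker package,
# so the call is skipped when either is unavailable.
if (interactive() && requireNamespace("optBiomarker", quietly = TRUE)) {
  library(optBiomarker)
  data(errorDbase)
  do.call(optimiseBiomarker, c(list(error = errorDbase), settings))
}
```

Keeping the settings in a list makes it easy to rerun the GUI with a single value changed, for example a different classification method.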
The function optimiseBiomarker is a user-friendly GUI for interrogating
the database of leave-one-out cross-validation errors, errorDbase, to
estimate the optimal number of biomarkers for microarray-based
classification. The database is built from simulated data using the
classificationError function. The function simData is used to simulate
microarray data for various combinations of factors such as the number of
biomarkers, training set size, biological variation, experimental
variation, fold change, replication, and correlation.
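The selection rule behind "minimum number of biomarkers for a given error tolerance" can be illustrated with a few lines of base R. This is only a sketch of the idea, not the package's own lookup: the error values below are made up rather than taken from errorDbase, and the helper `minBiomarkers` and the column names `nBiom` and `error` are hypothetical.

```r
# Hypothetical error curve: made-up values for illustration only,
# NOT taken from errorDbase.
errors <- data.frame(
  nBiom = c(1, 2, 3, 5, 10, 20, 50),
  error = c(0.31, 0.22, 0.15, 0.09, 0.048, 0.035, 0.03)
)

# Return the smallest number of biomarkers whose error does not exceed
# the tolerance, or NA if no candidate reaches it.
minBiomarkers <- function(errors, errorTol = 0.05) {
  ok <- errors$nBiom[errors$error <= errorTol]
  if (length(ok) == 0) NA_real_ else min(ok)
}

minBiomarkers(errors, errorTol = 0.05)  # 10
minBiomarkers(errors, errorTol = 0.01)  # NA: tolerance never reached
```

Returning NA when the tolerance is never reached keeps the "no solution" case distinct from a genuine answer.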
Khondoker, M. R., Bachmann, T. T., Mewissen, M., Dickinson, P. et al. (2010). Multi-factorial analysis of class prediction error: estimating optimal number of biomarkers for various classification rules. Journal of Bioinformatics and Computational Biology, 8, 945-965.
Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5-32.
Chang, C.-C. and Lin, C.-J. LIBSVM: a library for Support Vector Machines. https://www.csie.ntu.edu.tw/~cjlin/libsvm/
Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press.
Efron, B. and Tibshirani, R. (1997). Improvements on Cross-Validation: The .632+ Bootstrap Estimator. Journal of the American Statistical Association, 92(438), 548-560.
Bowman, A., Crawford, E., Alexander, G. and Bowman, R. W. (2007). rpanel: Simple interactive controls for R functions using the tcltk package. Journal of Statistical Software 17(9).
# NOT RUN {
if (interactive()) {
  data(errorDbase)
  optimiseBiomarker(error = errorDbase)
}
# }