KODAMA(data, M = 100, Tcycle = 20,
       FUN_VAR = function(x) { ceiling(ncol(x)) },
       FUN_SAM = function(x) { ceiling(nrow(x) * 0.75) },
       bagging = FALSE, FUN = KNN.CV, f.par = list(kn = 10),
       W = NULL, constrain = NULL, fix = rep(FALSE, nrow(data)),
       epsilon = 0.05, shake = FALSE)
bagging: should sampling be performed with replacement (bagging = TRUE)? By default, bagging = FALSE.
FUN: classifier to be used. Choices are "KNN.CV", "PLS.SVM.CV", and "PCA.CA.KNN.CV".
W: a vector of nrow(data) elements. The KODAMA procedure can be started from different initializations of the vector W. Without any a priori information, W can be initialized with each element different from all the others (i.e., each sample categorized in a one-element class). Alternatively, W can be initialized by a clustering procedure, such as kmeans.
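As a sketch of the two initializations described above (the iris data set is used here purely for illustration, and the variable names are hypothetical):

```r
data <- as.matrix(iris[, -5])   # any numeric matrix works

# (a) no a priori information: each sample in its own one-element class
W_unsupervised <- 1:nrow(data)

# (b) initialization by a clustering procedure, e.g. k-means with 10 centers
set.seed(1)
W_clustered <- as.numeric(kmeans(data, centers = 10)$cluster)
```

Either vector can then be passed to KODAMA through the W argument.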
constrain: a vector of nrow(data) elements. Supervised constraints can be imposed by linking some samples in such a way that, if one of them changes class, the linked samples must change in the same way (i.e., they are forced to belong to the same class) during the maximization of the cross-validation accuracy procedure. Samples with the same constrain identifier are forced to stay together.
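For example, in a hypothetical study with six samples in which samples 1-2 and 5-6 are known replicates of the same specimens, the constrain vector could be built as:

```r
# samples sharing an identifier are forced to keep the same class label
constrain <- c(1, 1, 2, 3, 4, 4)
```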
fix: a vector of nrow(data) elements. The values of this vector must be TRUE or FALSE. By default, all elements are FALSE. Samples with a TRUE fix value will not change the class label defined in W during the maximization of the cross-validation accuracy procedure.
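A minimal sketch, assuming six samples whose first two class labels are known a priori and should stay frozen (both vectors are hypothetical):

```r
W   <- c(1, 1, 2, 2, 3, 3)                        # initial class labels
fix <- c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)  # labels of samples 1 and 2 will not change
```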
epsilon: by default, epsilon = 0.05.
shake: if shake = FALSE, the cross-validated accuracy is computed with the class defined in W before the maximization of the cross-validation accuracy procedure; otherwise, it is not.
The first part of the procedure produces M cross-validated accuracies. At each repetition, a fraction of the samples (defined by FUN_SAM) is randomly selected from the original data. The whole iterative process (steps I-III) is repeated M times to average out the effects of the randomness of the iterative procedure; each time this part is repeated, a different fraction of samples is selected. The second part collects and processes these results by constructing a dissimilarity matrix, to provide a holistic view of the data while maintaining their intrinsic structure (steps IV and V).
The dissimilarity matrix can then be visualized by classical multidimensional scaling (cmdscale).
# data(iris)
# kk=KODAMA(iris[,-5])
# pp = cmdscale(kk$dissimilarity)
# plot(pp,col=rep(2:4,each=50))
#
#
#
# WARNING: The next example is computationally expensive
#
# data(MetRef)
# u=MetRef$data
# u=u[,-which(colSums(u)==0)]
# u=scaling(u)$newXtrain
# class=as.factor(unlist(MetRef$donor))
# kk=KODAMA(u,FUN=PCA.CA.KNN.CV, W=function(x) as.numeric(kmeans(x,50)$cluster))
# pp = cmdscale(kk$dissimilarity)
# plot(pp,col=class)