FindMaxHomoOptimalPartitions is the main function for inferring optimal homogeneous clusters from a multiresolution dataset, DataT.
FindMaxHomoOptimalPartitions(
DataT,
gamma = 0.05,
insigThs = 1e-08,
alpha = 0.05,
minInvs = 99,
polyDegree = 1,
expFlag = FALSE,
messageFlag = FALSE
)
This function returns Copt, models, nNodes, invOptCls, and minR2cv.
Copt[p,1] = k means that the cluster which is the pth member of the maximal homogeneous partition lies at the kth layer, and Copt[p,2] is its cluster name within that layer. Copt[p,3] is the "Model Information Reduction Ratio" I({C},H0,Hlin) of the pth member: a positive value means the linear model is better than the null model.
Lastly, Copt[p,4] is the squared correlation between the predicted and the actual Y in the cross-validation (CV) step, eta(C)cv, of the pth member. The greater Copt[p,4], the more homogeneous the cluster.
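A minimal sketch of how eta(C)cv could be computed for a single cluster, assuming 5-fold cross-validation around a plain lm(y ~ .) fit. The helper name cv_eta, the fold count and the seed are hypothetical; the package's actual CV scheme is not documented on this page. The size guard mirrors the minInvs argument (eta(C)cv = 0 for clusters that are too small).

```r
# Hypothetical sketch: eta(C)cv as the squared correlation between
# out-of-fold predictions and the observed Y of one cluster.
# cv_eta, nFolds and the random fold assignment are assumptions,
# not the package's internals.
cv_eta <- function(X, Y, nFolds = 5, minInvs = 99, seed = 1) {
  n <- length(Y)
  # Documented minInvs rule: too few individuals gives eta(C)cv = 0
  if (n < minInvs) return(0)
  set.seed(seed)  # fixed folds for reproducibility
  folds <- sample(rep(seq_len(nFolds), length.out = n))
  df <- data.frame(y = Y, X)
  pred <- numeric(n)
  for (f in seq_len(nFolds)) {
    test <- folds == f
    # Fit on the other folds, predict the held-out fold
    fit <- lm(y ~ ., data = df[!test, , drop = FALSE])
    pred[test] <- predict(fit, newdata = df[test, , drop = FALSE])
  }
  cor(pred, Y)^2
}

# Toy cluster: Y depends linearly on the first feature plus small noise
set.seed(42)
X <- matrix(rnorm(200 * 2), ncol = 2)
Y <- 3 * X[, 1] + rnorm(200, sd = 0.1)
cv_eta(X, Y)                 # close to 1 for this nearly noise-free cluster
cv_eta(X[1:50, ], Y[1:50])   # 0, because 50 < minInvs
```

Squaring the correlation of out-of-fold predictions, rather than the in-sample R², penalizes clusters whose linear fit does not generalize, which matches the reading that a larger Copt[p,4] signals a more homogeneous cluster.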
models[[k]][[j]]$clustInfoRecRatio is the "Cluster Information Reduction Ratio" I(Cj,Cjchildren,H) between the jth cluster in the kth layer
and its children clusters in the (k+1)th layer: a positive value means the current cluster is better than its children clusters. Hence, this cluster should be kept as a member of the maximal homogeneous partition instead of its children.
models[[k]][[j]] is the linear model of the cluster with ID j at layer k.
models[[k]][[j]]$selFeatureSet is the set of selected-feature indices of that model, where feature index 1 is the intercept
and feature index d is the (d-1)th variable, DataT$X[,d-1].
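The clustInfoRecRatio rule (keep a cluster when its ratio is positive, otherwise replace it by its children) can be illustrated with a small top-down traversal. This is a simplified, hypothetical sketch: the nested-list tree (name, ratio, children) and the function select_partition are assumptions made for illustration, whereas the package itself stores the per-cluster models as models[[k]][[j]].

```r
# Hypothetical sketch of the documented rule: keep a cluster when its
# clustInfoRecRatio is positive, otherwise descend into its children.
# The tree representation (name, ratio, children) is an assumption.
select_partition <- function(node) {
  # Leaves have no children, so they are always kept (ratio unused)
  if (length(node$children) == 0 || node$ratio > 0) {
    return(node$name)
  }
  unlist(lapply(node$children, select_partition))
}

leaf <- function(name) list(name = name, ratio = NA_real_, children = list())
tree <- list(
  name = "L1-C1", ratio = -0.2, children = list(
    list(name = "L2-C1", ratio = 0.3,
         children = list(leaf("L3-C1"), leaf("L3-C2"))),
    list(name = "L2-C2", ratio = -0.1,
         children = list(leaf("L3-C3"), leaf("L3-C4")))
  )
)
select_partition(tree)
# "L2-C1" "L3-C3" "L3-C4"
```

Because the traversal stops at the first cluster with a positive ratio, each selected cluster is as coarse as the rule allows, which is one way to read "maximal" in maximal homogeneous partition.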
invOptCls[i,1] is the layer of the optimal cluster of individual i, and invOptCls[i,2] is the ID of that optimal cluster.
minR2cv is the value of eta(C)cv of the cluster that has the lowest eta(C)cv.
DataT is an updated copy of the input DataT with helper variables for plotting and printing results.
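The documented meanings of Copt and invOptCls are two views of the same partition: Copt lists its members, and invOptCls assigns each individual to one of them. The sketch below rebuilds an invOptCls-style matrix from a Copt-style matrix and DataT$clsLayer. It assumes that the cluster name in Copt[p,2] equals the cluster ID stored in clsLayer for that layer; derive_invOptCls and the toy matrices are hypothetical.

```r
# Hypothetical sketch: rebuild an invOptCls-style matrix from Copt and
# clsLayer, using only the documented columns Copt[p,1] (layer) and
# Copt[p,2] (cluster name/ID in that layer). Assumes Copt[p,2] matches
# the IDs stored in clsLayer.
derive_invOptCls <- function(Copt, clsLayer) {
  inv <- matrix(NA_real_, nrow = nrow(clsLayer), ncol = 2)
  for (p in seq_len(nrow(Copt))) {
    k <- Copt[p, 1]                          # layer of the pth member
    members <- clsLayer[, k] == Copt[p, 2]   # individuals in that cluster
    inv[members, 1] <- k                     # layer of optimal cluster
    inv[members, 2] <- Copt[p, 2]            # ID of optimal cluster
  }
  inv
}

# Toy hierarchy: 6 individuals, 3 layers
clsLayer <- cbind(rep(1, 6), c(1, 1, 1, 2, 2, 2), c(1, 1, 2, 3, 3, 4))
# Partition members; columns 3-4 (ratios) are left NA as they are unused here
Copt <- rbind(c(2, 1, NA, NA), c(3, 3, NA, NA), c(3, 4, NA, NA))
inv <- derive_invOptCls(Copt, clsLayer)
inv
```

Since the members of a partition are disjoint and cover everyone, every row of the result should be filled exactly once; any NA left over would indicate that the Copt rows do not form a partition.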
DataT contains a multiresolution dataset such that:
DataT$X[i,d] is the value of feature d of individual i,
DataT$Y[i] is the value of the target variable of individual i, where the goal is to fit the linear model DataT$Y ~ DataT$X, and
DataT$clsLayer[i,j] is the cluster ID of individual i at layer j; DataT$clsLayer[,1] is the first layer, in which typically every individual belongs to a single cluster.
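A toy object with the three documented components can help when preparing real input. This is a hypothetical sketch: the sizes, coefficients and cluster IDs are illustrative, and objects produced by SimpleSimulation() may carry additional fields. The nesting check is also an assumption, motivated by the parent/children wording used for clustInfoRecRatio.

```r
# Hypothetical sketch: a toy DataT with the documented components X, Y
# and clsLayer (one row per individual, one column per layer).
set.seed(7)
n <- 8
X <- matrix(rnorm(n * 2), ncol = 2)
Y <- 1 + 2 * X[, 1] - X[, 2] + rnorm(n, sd = 0.1)
clsLayer <- cbind(
  rep(1, n),            # layer 1: everyone in a single cluster
  rep(1:2, each = 4),   # layer 2: two clusters
  rep(1:4, each = 2)    # layer 3: four clusters nested in layer 2
)
DataT <- list(X = X, Y = Y, clsLayer = clsLayer)
stopifnot(nrow(DataT$X) == length(DataT$Y),
          nrow(DataT$clsLayer) == length(DataT$Y))

# Assumed requirement: each cluster at layer j+1 lies inside exactly
# one cluster at layer j, so it has a well-defined parent.
is_nested <- function(cl) {
  all(vapply(seq_len(ncol(cl) - 1), function(j) {
    all(tapply(cl[, j], cl[, j + 1], function(v) length(unique(v)) == 1))
  }, logical(1)))
}
is_nested(DataT$clsLayer)  # TRUE
```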
gamma is a threshold to ...
insigThs is a threshold used to determine whether the magnitude of a feature coefficient is large enough for the feature to be designated as a selected feature.
alpha is a significance level used to determine whether a feature coefficient is significant enough for the feature to be designated as a selected feature.
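A sketch of how insigThs and alpha might jointly decide feature selection, returning indices in the selFeatureSet convention (index 1 is the intercept, index d is DataT$X[,d-1]). Requiring both tests at once is an assumption; this page only says that each parameter is used in the selection decision. select_features and feature_label are hypothetical helpers.

```r
# Hypothetical sketch: select features whose coefficient magnitude
# exceeds insigThs AND whose p-value is below alpha (the AND is assumed).
select_features <- function(X, Y, insigThs = 1e-08, alpha = 0.05) {
  fit <- lm(Y ~ X)
  # Rows: (Intercept), X1, X2, ...; assumes X has full column rank,
  # otherwise aliased rows are dropped and indices would shift
  ct <- summary(fit)$coefficients
  keep <- abs(ct[, "Estimate"]) > insigThs & ct[, "Pr(>|t|)"] < alpha
  unname(which(keep))
}

# Map selFeatureSet-style indices to readable labels
feature_label <- function(idx) {
  ifelse(idx == 1, "(Intercept)", paste0("DataT$X[,", idx - 1, "]"))
}

set.seed(3)
X <- matrix(rnorm(300 * 3), ncol = 3)
Y <- 2 + 1.5 * X[, 1] - X[, 3] + rnorm(300)
sel <- select_features(X, Y)
feature_label(sel)
# Indices 1, 2 and 4 are expected; the pure-noise column (index 3)
# is usually, but not always, excluded at alpha = 0.05.
```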
minInvs is the minimum number of individuals a cluster must have for eta(C)cv to be inferred; otherwise, eta(C)cv = 0.
polyDegree is the degree of the polynomial function used to fit the data.
If it is greater than 1, a polynomial formula is used in lm() instead of "y ~ .".
expFlag is an exponential flag that controls the formula used for data fitting.
If it is TRUE, an exp() formula is used in lm() instead of "y ~ .".
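The effect of polyDegree and expFlag on the lm() formula can be sketched with base R's reformulate(). The exact terms are assumptions: this page only says that a polynomial or exp() formula replaces "y ~ .", so poly(x, degree, raw = TRUE) per feature, exp(x) per feature, and giving expFlag precedence are all hypothetical choices, as is the helper build_formula.

```r
# Hypothetical sketch of how polyDegree / expFlag could change the
# lm() formula. Term shapes and the precedence of expFlag are assumed.
build_formula <- function(featureNames, polyDegree = 1, expFlag = FALSE) {
  if (expFlag) {
    terms <- paste0("exp(", featureNames, ")")
  } else if (polyDegree > 1) {
    terms <- paste0("poly(", featureNames, ", ", polyDegree, ", raw = TRUE)")
  } else {
    return(y ~ .)  # default linear formula
  }
  reformulate(terms, response = "y")
}

set.seed(1)
df <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20))
f2 <- build_formula(c("x1", "x2"), polyDegree = 2)
f2  # one poly(..., 2, raw = TRUE) term per feature
fit <- lm(f2, data = df)
length(coef(fit))  # 5: intercept plus two coefficients per feature
```

Using raw = TRUE keeps the coefficients on the plain powers x and x^2, which is easier to relate to the original variables than the default orthogonal polynomials.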
messageFlag is a flag; if it is TRUE, the function prints progress messages during computation.
# Running FindMaxHomoOptimalPartitions using simulation data
DataT<-SimpleSimulation(100,type=1)
obj<-FindMaxHomoOptimalPartitions(DataT,gamma=0.05)