FindMaxHomoOptimalPartitions: FindMaxHomoOptimalPartitions

Description

FindMaxHomoOptimalPartitions is the main function for inferring optimal homogeneous clusters from a multiresolution dataset DataT.

Usage

FindMaxHomoOptimalPartitions(
  DataT,
  gamma = 0.05,
  insigThs = 1e-08,
  alpha = 0.05,
  minInvs = 99,
  polyDegree = 1,
  expFlag = FALSE,
  messageFlag = FALSE
)

Value

This function returns Copt, models, nNodes, invOptCls, and minR2cv.

Copt: describes the members of the maximal homogeneous partition. Copt[p,1] = k means that the pth member of the maximal homogeneous partition is a cluster at the kth layer, and Copt[p,2] is the name of that cluster within the kth layer. Copt[p,3] is the "Model Information Reduction Ratio" I({C},H0,Hlin) of the pth member; a positive value means the linear model is better than the null model. Lastly, Copt[p,4] is the squared correlation between the predicted and real Y in the CV step ( eta(C)cv ) of the pth member. The greater Copt[p,4], the more homogeneous the cluster.
clustInfoRecRatio: models[[k]][[j]]$clustInfoRecRatio is the "Cluster Information Reduction Ratio" I(Cj,Cjchildren,H) between the jth cluster in the kth layer and its children clusters in the (k+1)th layer. A positive value means the current cluster is better than its children clusters, so the cluster itself, rather than its children, should be kept as a member of the maximal homogeneous partition.
models: models[[j]][[k]] is the linear model of the cluster with ID k at layer j. models[[j]][[k]]$selFeatureSet is the set of selected-feature indices of the model, where feature index 1 is the intercept and feature index d is the (d-1)th variable DataT$X[,d-1].
invOptCls: invOptCls[i,1] is the layer of the optimal cluster of individual i, and invOptCls[i,2] is that optimal cluster.
minR2cv: the value of eta(C)cv of the cluster that has the lowest eta(C)cv.
DataT: the input DataT, updated with helper variables for plotting and printing results.
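
The eta(C)cv value stored in Copt[p,4] and summarised by minR2cv is described above as a squared correlation between predicted and real Y in a cross-validation step. The base-R sketch below illustrates that idea only; the helper name cvSquaredCor, the 10-fold split, and the random fold assignment are assumptions made for illustration and are not taken from the package.

```r
# Hypothetical helper (not part of the package): k-fold cross-validated
# squared correlation between lm() predictions and the observed Y.
cvSquaredCor <- function(X, Y, nFolds = 10) {
  dat <- data.frame(Y = Y, X)
  # Assumption: individuals are assigned to folds at random.
  folds <- sample(rep(seq_len(nFolds), length.out = nrow(dat)))
  pred <- numeric(nrow(dat))
  for (f in seq_len(nFolds)) {
    held <- folds == f
    fit <- lm(Y ~ ., data = dat[!held, , drop = FALSE])
    pred[held] <- predict(fit, newdata = dat[held, , drop = FALSE])
  }
  cor(pred, Y)^2
}

set.seed(1)
n <- 200
X <- matrix(rnorm(n * 3), ncol = 3)

# One linear relation holds for every individual: a homogeneous cluster.
Yhomog <- 2 * X[, 1] + rnorm(n, sd = 0.1)
# Two halves with opposite slopes pooled together: a non-homogeneous cluster.
Ymixed <- ifelse(seq_len(n) <= n / 2, 2, -2) * X[, 1] + rnorm(n, sd = 0.1)

r2Homog <- cvSquaredCor(X, Yhomog)
r2Mixed <- cvSquaredCor(X, Ymixed)
```

Under these assumptions, r2Homog should be close to 1 while r2Mixed stays low, which matches the reading that a greater Copt[p,4] indicates a more homogeneous cluster.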

Arguments

DataT: contains a multiresolution dataset such that DataT$X[i,d] is the value of feature d of individual i, DataT$Y[i] is the value of the target variable of individual i, which is fitted as DataT$Y ~ DataT$X in a linear model, and clsLayer[i,j] is the cluster ID of individual i at layer j; clsLayer[i,1] refers to the first layer, in which all individuals typically belong to a single cluster.
gamma: is a threshold to ...
insigThs: is a threshold for deciding whether the magnitude of a feature coefficient is large enough for the feature to be designated as a selected feature.
alpha: is a significance level for deciding whether the magnitude of a feature coefficient is large enough for the feature to be designated as a selected feature.
minInvs: is the minimum number of individuals a cluster must have for eta(C)cv to be inferred; otherwise, eta(C)cv=0.
polyDegree: is the degree of the polynomial function used to fit the data. If it is greater than 1, a polynomial formula is used in lm() instead of "y=.".
expFlag: is a flag that controls the formula used for data fitting. If it is TRUE, an exp() formula is used in lm() instead of "y=.".
messageFlag: is a flag. If it is TRUE, the function prints messages about the progress of the computation.
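
The layout of DataT and the effect of the formula switches can be sketched in base R. This is a minimal illustration under stated assumptions: DataT is built here as a plain list with X, Y, and clsLayer components, and makeFormula is a hypothetical helper whose poly() and exp() terms are a guess at what a "polynomial formula" or "exp() formula" might look like, not the formula the function actually constructs.

```r
set.seed(42)
n <- 200

# Feature matrix: X[i, d] is the value of feature d for individual i.
X <- matrix(rnorm(n * 2), ncol = 2, dimnames = list(NULL, c("V1", "V2")))

# Layer 1 puts everyone in one cluster; layer 2 splits them into two clusters.
clsLayer <- cbind(rep(1L, n), rep(1:2, each = n / 2))

# Target: quadratic in V1, with the sign depending on the layer-2 cluster.
Y <- ifelse(clsLayer[, 2] == 1, 1, -1) * X[, 1]^2 + rnorm(n, sd = 0.1)

# Assumption: DataT is a list holding these three components.
DataT <- list(X = X, Y = Y, clsLayer = clsLayer)

# Hypothetical helper: choose an lm() formula from the two switches.
makeFormula <- function(featureNames, polyDegree = 1, expFlag = FALSE) {
  if (expFlag) {
    terms <- paste0("exp(", featureNames, ")")
  } else if (polyDegree > 1) {
    terms <- paste0("poly(", featureNames, ", ", polyDegree, ", raw = TRUE)")
  } else {
    return(Y ~ .)  # plain linear fit on all features
  }
  as.formula(paste("Y ~", paste(terms, collapse = " + ")))
}

# Fit only the individuals of cluster 1 at layer 2.
inCluster <- DataT$clsLayer[, 2] == 1
dat <- data.frame(Y = DataT$Y, DataT$X)[inCluster, ]

fitLin  <- lm(makeFormula(colnames(DataT$X)), data = dat)
fitPoly <- lm(makeFormula(colnames(DataT$X), polyDegree = 2), data = dat)

r2Lin  <- summary(fitLin)$r.squared
r2Poly <- summary(fitPoly)$r.squared
```

Because the simulated target is quadratic in V1, the degree-2 fit should explain far more variance than the plain linear one, which is the kind of situation where a polyDegree greater than 1 would be worth trying.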

Examples

# Run FindMaxHomoOptimalPartitions on simulated data
library(MRReg)  # assumption: the package that provides both functions below
DataT <- SimpleSimulation(100, type = 1)
obj <- FindMaxHomoOptimalPartitions(DataT, gamma = 0.05)
