MRReg (version 0.1.5)

FindMaxHomoOptimalPartitions: FindMaxHomoOptimalPartitions

Description

FindMaxHomoOptimalPartitions is the main function for inferring optimal homogeneous clusters from a multiresolution dataset DataT.

Usage

FindMaxHomoOptimalPartitions(
  DataT,
  gamma = 0.05,
  insigThs = 1e-08,
  alpha = 0.05,
  minInvs = 99,
  polyDegree = 1,
  expFlag = FALSE,
  messageFlag = FALSE
)

Value

This function returns Copt, models, nNodes, invOptCls, and minR2cv.

Copt

Copt[p,1] = k means that the pth member of the maximal homogeneous partition is a cluster at the kth layer, and Copt[p,2] is the name of that cluster within the kth layer. Copt[p,3] is the "Model Information Reduction Ratio" I({C},H0,Hlin) of the pth member of the maximal homogeneous partition: a positive value means that the linear model is better than the null model. Lastly, Copt[p,4] is the squared correlation between the predicted and actual Y in the cross-validation (CV) step, eta(C)cv, of the pth member of the maximal homogeneous partition. The greater Copt[p,4], the more homogeneous the cluster.
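As a rough illustration of how the four columns of Copt might be read, the sketch below uses a small hand-made matrix. The values and the column labels are invented for this example; in practice the matrix would presumably come from the returned object (for instance obj$Copt, assuming the function returns a list).

```r
# Hypothetical Copt with made-up values (not output of the package).
# Columns: layer, cluster name, Model Information Reduction Ratio, eta(C)cv
Copt <- matrix(
  c(2, 1,  0.42, 0.81,
    2, 2,  0.37, 0.76,
    3, 5, -0.05, 0.12),
  ncol = 4, byrow = TRUE
)
colnames(Copt) <- c("layer", "cluster", "modelInfoRatio", "eta_cv")  # labels are ours

# Members whose linear model is better than the null model (column 3 > 0)
better_than_null <- Copt[Copt[, 3] > 0, , drop = FALSE]

# Members ordered from most to least homogeneous (column 4)
ranked <- Copt[order(Copt[, 4], decreasing = TRUE), , drop = FALSE]
print(ranked)
```

Using drop = FALSE keeps the result a matrix even when only one member passes the filter.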

clustInfoRecRatio

models[[k]][[j]]$clustInfoRecRatio is the "Cluster Information Reduction Ratio" I(Cj,Cjchildren,H) between the jth cluster in the kth layer and its child clusters in the (k+1)th layer: a positive value means that the current cluster is better than its child clusters. Hence, this cluster, rather than its children, should be kept as a member of the maximal homogeneous partition.

models

models[[j]][[k]] is the linear model of the cluster with ID k at layer j. models[[j]][[k]]$selFeatureSet is the set of selected feature indices of the model, where feature index 1 is the intercept and feature index d is the (d-1)th variable DataT$X[,d-1].

invOptCls

invOptCls[i,1] is the layer of the optimal cluster of individual i, and invOptCls[i,2] is the ID of that optimal cluster.
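A minimal sketch of how invOptCls could be used to group individuals by their optimal cluster. The matrix below is invented for illustration and is not output of the package.

```r
# Hypothetical invOptCls for six individuals (made-up values):
# column 1 = layer of the optimal cluster, column 2 = cluster ID in that layer
invOptCls <- cbind(layer = c(2, 2, 2, 3, 3, 2),
                   cluster = c(1, 1, 2, 5, 5, 2))

# Label each individual by its (layer, cluster) pair and count cluster sizes
labels <- paste0("L", invOptCls[, 1], "_C", invOptCls[, 2])
sizes <- table(labels)
print(sizes)

# Row indices of the individuals whose optimal cluster is cluster 5 at layer 3
members <- which(invOptCls[, 1] == 3 & invOptCls[, 2] == 5)
print(members)
```

Both columns are needed to identify a cluster, since the same cluster ID may be reused in different layers.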

minR2cv

is the eta(C)cv value of the cluster that has the lowest eta(C)cv.

DataT

is an updated DataT with the helper variables for plotting and printing results.

Arguments

DataT

contains a multiresolution dataset such that DataT$X[i,d] is the value of feature d of individual i, DataT$Y[i] is the value of the target variable of individual i, which is fitted by the linear model DataT$Y ~ DataT$X, and clsLayer[i,j] is the cluster ID of individual i at layer j. clsLayer[i,1] is the first layer, in which everyone typically belongs to a single cluster.
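To make the layout concrete, here is a small hand-built dataset with the three components described above. It is only a sketch: it assumes that clsLayer is stored in the list as DataT$clsLayer and that cluster IDs are plain integers, neither of which is confirmed here. For real use, the SimpleSimulation() call in the Examples section is the documented way to obtain a valid DataT.

```r
set.seed(1)
n <- 8                                          # number of individuals

X <- matrix(rnorm(n * 2), nrow = n, ncol = 2)   # X[i, d]: feature d of individual i
Y <- 1 + 2 * X[, 1] + rnorm(n, sd = 0.1)        # Y[i]: target variable of individual i

# clsLayer[i, j]: cluster ID of individual i at layer j.
# Layer 1 puts everyone in a single cluster; layer 2 splits them into two clusters.
clsLayer <- cbind(rep(1, n), rep(c(1, 2), each = n / 2))

# Assumed container layout (field name clsLayer inside DataT is an assumption)
DataT <- list(X = X, Y = Y, clsLayer = clsLayer)
str(DataT)
```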

gamma

is a threshold to ...

insigThs

is a threshold that determines whether the magnitude of a feature coefficient is large enough for the feature to be designated as a selected feature.

alpha

is a significance level used to determine whether the magnitude of a feature coefficient is large enough for the feature to be designated as a selected feature.
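One plausible way a magnitude threshold and a significance level can work together in feature selection is sketched below with plain lm(). This is an illustration only: requiring both conditions at once is an assumption, and the package may combine insigThs and alpha differently.

```r
set.seed(42)
n <- 200
X <- matrix(rnorm(n * 3), ncol = 3)
Y <- 2 + 3 * X[, 1] + rnorm(n)            # only the first variable affects Y

fit <- lm(Y ~ ., data = data.frame(Y = Y, X))
coefs <- summary(fit)$coefficients        # rows: intercept, X1, X2, X3

insigThs <- 1e-08
alpha <- 0.05

# Keep a feature if its coefficient is both large enough and significant.
# Index 1 is the intercept; index d is the (d-1)th variable.
selected <- which(abs(coefs[, "Estimate"]) > insigThs &
                  coefs[, "Pr(>|t|)"] < alpha)
print(selected)
```

The resulting indices follow the same convention as selFeatureSet, where index 1 refers to the intercept.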

minInvs

is the minimum number of individuals that a cluster must have to be considered for inferring eta(C)cv; otherwise, eta(C)cv=0.
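The idea of eta(C)cv, the squared correlation between cross-validated predictions and the actual Y, with a minimum cluster size, can be sketched in base R as follows. The helper name eta_cv, the number of folds (k = 10), and the strict comparison against minInvs are assumptions made for this example, not details taken from the package.

```r
# Sketch: squared correlation between k-fold CV predictions and observed Y.
# eta_cv and k are hypothetical; the package's CV scheme may differ.
eta_cv <- function(X, Y, minInvs = 99, k = 10) {
  n <- length(Y)
  if (n < minInvs) return(0)                     # too few individuals: eta(C)cv = 0
  dat <- data.frame(Y = Y, X)
  folds <- sample(rep(seq_len(k), length.out = n))  # random fold assignment
  pred <- numeric(n)
  for (f in seq_len(k)) {
    test <- folds == f
    fit <- lm(Y ~ ., data = dat[!test, , drop = FALSE])       # train on other folds
    pred[test] <- predict(fit, newdata = dat[test, , drop = FALSE])
  }
  cor(pred, Y)^2
}

set.seed(7)
X <- matrix(rnorm(300 * 2), ncol = 2)
Y <- 1 + 2 * X[, 1] - X[, 2] + rnorm(300, sd = 0.5)

r_full  <- eta_cv(X, Y)                 # high for strongly linear data
r_small <- eta_cv(X[1:50, ], Y[1:50])   # 0, because 50 < minInvs
print(c(full = r_full, small = r_small))
```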

polyDegree

is the degree of the polynomial function used to fit the data. If it is greater than 1, a polynomial formula is used in lm() instead of "y~.".

expFlag

is an exponential flag that controls the formula for data fitting. If it is true, an exp() formula is used in lm() instead of "y~.".
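For readers unfamiliar with lm() formulas, the sketch below contrasts a plain linear formula with a polynomial and an exponential one on simulated data. These are generic lm() formulas chosen for illustration; the exact formulas that the package builds for polyDegree and expFlag are not shown here and may differ.

```r
set.seed(3)
d <- data.frame(x1 = runif(100, 0, 2))
d$y <- 1 + d$x1^2 + rnorm(100, sd = 0.1)    # quadratic relationship

fit_lin  <- lm(y ~ ., data = d)                         # linear in every feature
fit_poly <- lm(y ~ poly(x1, 2, raw = TRUE), data = d)   # polynomial of degree 2
fit_exp  <- lm(y ~ exp(x1), data = d)                   # exponential transform

# Compare in-sample R-squared across the three formulas
r2 <- sapply(list(linear = fit_lin, poly2 = fit_poly, exp = fit_exp),
             function(m) summary(m)$r.squared)
print(r2)
```

Because the degree-2 polynomial model contains the linear model as a special case, its in-sample R-squared cannot be lower than that of the linear fit.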

messageFlag

is a flag. If it is true, the function prints messages about the progress of the computation.

Examples

# Run FindMaxHomoOptimalPartitions on simulated data
library(MRReg)
DataT <- SimpleSimulation(100, type = 1)
obj <- FindMaxHomoOptimalPartitions(DataT, gamma = 0.05)
