
prclust (version 1.0)

GCV: Calculate the Generalized Cross-Validation Statistic (GCV)

Description

Calculate the generalized cross-validation statistic with generalized degrees of freedom.

Usage

GCV(data, lambda1, lambda2, tau, sigma, B = 100,
    loss.method = c("quadratic", "lasso"),
    group.method = c("gtlp", "lasso", "SCAD", "MCP"),
    algorithm = c("ADMM", "Quadratic"), epsilon = 0.001)

Arguments

data
Numeric data matrix. In the example below, each column holds one observation.
lambda1
Tuning parameter or step size: lambda1. It is typically set to 1 for the quadratic penalty based algorithm and to 0.4 for the revised ADMM.
lambda2
Tuning parameter: lambda2, the magnitude of grouping penalty.
tau
Tuning parameter: tau, related to grouping penalty.
sigma
The perturbation size.
B
The number of Monte Carlo replications. The default value is 100.
loss.method
character: may be abbreviated. "lasso" stands for the $L_1$ loss function, while "quadratic" stands for the quadratic loss function.
group.method
character: may be abbreviated. "gtlp" means generalized group lasso is used as the grouping penalty. "lasso" means lasso is used as the grouping penalty. "SCAD" and "MCP" are two other non-convex penalties.
algorithm
character: may be abbreviated. The algorithm used to find the solution. The default algorithm is "ADMM", which stands for the DC-ADMM.
epsilon
The stopping criterion parameter. The default is 0.001.

Value

  • Return value: the generalized cross-validation statistic (GCV).

Details

A bonus with the regression approach to clustering is the potential application of many existing model selection methods for regression or supervised learning to clustering. We propose using generalized cross-validation (GCV). GCV can be regarded as an approximation to leave-one-out cross-validation (CV). Hence, GCV provides an approximately unbiased estimate of the prediction error.

We use the generalized degrees of freedom (GDF) to account for the data-adaptive nature of estimating the centroids of the observations.
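
The idea behind the generalized degrees of freedom can be illustrated with a generic Monte Carlo perturbation scheme: perturb the data B times with Gaussian noise of standard deviation sigma, refit each time, and sum the sensitivities of each fitted value to its own perturbation. The sketch below is an illustration under stated assumptions, not the internal code of prclust. The names estimate_gdf and shrink_half are hypothetical, the data are a plain vector rather than a matrix, and the form RSS / (n - GDF)^2 for the GCV statistic is assumed here.

```r
# Hypothetical helper (not part of prclust): Monte Carlo estimate of the
# generalized degrees of freedom of a fitting function `fit_fun`, which
# takes a numeric vector and returns fitted values of the same length.
estimate_gdf <- function(y, fit_fun, sigma = 0.25, B = 100) {
  n <- length(y)
  # One row of Gaussian perturbations per Monte Carlo replication.
  delta <- matrix(rnorm(n * B, mean = 0, sd = sigma), nrow = B, ncol = n)
  fitted <- matrix(NA_real_, nrow = B, ncol = n)
  for (b in seq_len(B)) {
    fitted[b, ] <- fit_fun(y + delta[b, ])
  }
  # Sensitivity of the i-th fitted value to the i-th perturbation:
  # the least-squares slope of fitted[, i] regressed on delta[, i].
  slopes <- vapply(
    seq_len(n),
    function(i) cov(delta[, i], fitted[, i]) / var(delta[, i]),
    numeric(1)
  )
  sum(slopes)
}

# Toy estimator for illustration only: shrink every value halfway to zero.
shrink_half <- function(y) 0.5 * y

set.seed(1)
y <- rnorm(20)
gdf <- estimate_gdf(y, shrink_half, sigma = 0.25, B = 50)

# Assumed GCV form: residual sum of squares over (n - GDF)^2.
rss <- sum((y - shrink_half(y))^2)
gcv <- rss / (length(y) - gdf)^2
```

For this linear toy estimator every slope equals 0.5, so the estimated GDF is half the number of observations. For a data-adaptive procedure such as penalized clustering the slopes differ across observations, which is why a perturbation-based estimate is needed.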

The chosen tuning parameters are the ones giving the smallest GCV error.
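
Choosing the tuning parameters by smallest GCV amounts to a grid search. The sketch below shows one way to organize it. The helper select_by_gcv and the grid values are hypothetical, and the code assumes that GCV() returns a single number, as the Value section states. The call to prclust is guarded so the block also runs where the package is not installed.

```r
# Hypothetical helper (not part of prclust): score every (lambda2, tau)
# pair with `score_fun` and return the grid sorted by increasing score.
select_by_gcv <- function(data, lambda2_grid, tau_grid, score_fun) {
  grid <- expand.grid(lambda2 = lambda2_grid, tau = tau_grid)
  grid$gcv <- vapply(
    seq_len(nrow(grid)),
    function(i) as.numeric(score_fun(data, grid$lambda2[i], grid$tau[i])),
    numeric(1)  # fails loudly if the score is not a single number
  )
  grid[order(grid$gcv), , drop = FALSE]
}

if (requireNamespace("prclust", quietly = TRUE)) {
  set.seed(1)
  # Two clusters of 25 two-dimensional observations, one per column.
  sim <- cbind(matrix(rnorm(50, 0, 0.33), nrow = 2),
               matrix(rnorm(50, 1, 0.33), nrow = 2))
  # Wrapper fixing lambda1, sigma and B as in the Examples section.
  gcv_score <- function(d, lambda2, tau) {
    prclust::GCV(d, lambda1 = 1, lambda2 = lambda2, tau = tau,
                 sigma = 0.25, B = 10)
  }
  ranked <- select_by_gcv(sim, c(0.7, 1), c(0.3, 0.5), gcv_score)
  print(ranked[1, ])  # the pair with the smallest GCV on this grid
}
```

Because GCV relies on Monte Carlo perturbations, the ranking can vary between runs for small B; fixing the seed and using a larger B makes the comparison more stable.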

References

Pan, Wei, Xiaotong Shen, and Binghui Liu. "Cluster Analysis: Unsupervised Learning via Supervised Learning with a Non-convex Penalty." The Journal of Machine Learning Research 14.1 (2013): 1865-1889.

Examples

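set.seed(1)
library("prclust")
# Simulate 50 two-dimensional observations, one per column:
# columns 1-25 are centered at (0, 0), columns 26-50 at (1, 1).
data = matrix(NA,2,50)
data[1,1:25] = rnorm(25,0,0.33)
data[2,1:25] = rnorm(25,0,0.33)
data[1,26:50] = rnorm(25,1,0.33)
data[2,26:50] = rnorm(25,1,0.33)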

#case 1
gcv1 = GCV(data,lambda1=1,lambda2=1,tau=0.5,sigma=0.25,B =10)
gcv1

#case 2
gcv2 = GCV(data,lambda1=1,lambda2=0.7,tau=0.3,sigma=0.25,B = 10)
gcv2

# Note that the combination of tuning parameters in case 1 is better than
# the combination in case 2, since the value of GCV in case 1 is
# less than the value in case 2.
