
DWDLargeR (version 0.2-0)

genDWD: Solve the generalized distance weighted discrimination (DWD) model.

Description

Solve the generalized DWD model using a symmetric Gauss-Seidel based alternating direction method of multipliers (sGS-ADMM).

Usage

genDWD(X, y, C, expon, tol = 1e-5, maxIter = 2000, method = 1, printDetails = 0,
       rmzeroFea = 1, scaleFea = 1)

Value

A list containing the results of the algorithm, with the following components:

w

The unit normal vector of the hyperplane that separates the two classes.

beta

The distance of the hyperplane to the origin (\(\beta\) in the formulation given under Details).

xi

A slack vector of length \(n\) that allows for the possibility that the two classes cannot be cleanly separated by the hyperplane (\(\xi\) in the formulation given under Details).

r

The residual \(r:= Z^T w + \beta y + \xi\).

alpha

Dual variable of the generalized DWD model.

info

A list containing information about the run of the algorithm.

runhist

A list containing the run history over the iterations.
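Writing the residual componentwise gives \(r_i = y_i (x_i^T w + \beta) + \xi_i\), where \(x_i\) is the \(i\)-th column of X, so a sample lies on the positive side of the hyperplane when \(x_i^T w + \beta > 0\). The R sketch below recomputes r from w, beta and xi and labels new samples by the sign of \(x^T w + \beta\). The helper names dwd_residual and dwd_predict are hypothetical (they are not functions exported by DWDLargeR), ties at zero are assigned to class 1 by choice, and the sketch assumes w is expressed in the same feature space as the new data; if rmzeroFea or scaleFea changed the features, the new data may need the same preprocessing.

```r
# Hypothetical helpers, not functions exported by DWDLargeR.

# Residual r := Z^T w + beta * y + xi with Z = X diag(y); X is d x n (columns = samples).
dwd_residual <- function(X, y, w, beta, xi) {
  Z <- X %*% diag(y, nrow = length(y))  # column i of Z is y_i * x_i
  as.vector(crossprod(Z, w)) + beta * y + xi
}

# Label the columns of Xnew (d x m) by the sign of x^T w + beta; ties at 0 go to +1.
dwd_predict <- function(Xnew, w, beta) {
  scores <- as.vector(crossprod(Xnew, w)) + beta
  ifelse(scores >= 0, 1, -1)
}

# Toy data: d = 2 features, n = 3 samples (one sample per column).
X    <- matrix(c(2, 1,  -1, -2,  3, 0), nrow = 2)
y    <- c(1, -1, 1)
w    <- c(1, 0)        # unit normal, ||w|| = 1
beta <- 0.5
xi   <- c(0, 0.25, 0)

r    <- dwd_residual(X, y, w, beta, xi)  # residuals 2.5, 0.75, 3.5
pred <- dwd_predict(X, w, beta)          # labels 1, -1, 1
```

With a fitted model, the same helper would be called as dwd_predict(Xnew, result$w, result$beta), using the w and beta components listed above.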

Arguments

X

A \(d \times n\) matrix of \(n\) training samples with \(d\) features.

y

A vector of length \(n\) of training labels. Each element of y is either -1 or 1.

C

A number giving the penalty parameter \(C\) of the generalized DWD model.

expon

A positive number representing the exponent \(q\) of the residual \(r_i\) in the generalized DWD model. Common choices are expon = 1, 2, or 4.

tol

The stopping tolerance for the algorithm. (Default = 1e-5)

maxIter

Maximum number of iterations allowed for the algorithm. (Default = 2000)

method

Method for solving the generalized DWD model. The default, method = 1, uses the efficient sGS-ADMM algorithm; set method = 2 to use the directly extended ADMM solver instead.

printDetails

Switch for printing details of the algorithm. Default is 0 (no details printed).

rmzeroFea

Switch for removing zero features from the data matrix. Default is 1 (zero features are removed).

scaleFea

Switch for scaling the features in the data matrix so that they have roughly similar magnitudes. Default is 1 (features are scaled).
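The two preprocessing switches rmzeroFea and scaleFea can be pictured with the hypothetical helper below. It is an illustrative sketch, not DWDLargeR's own code: it treats a feature as zero when its row of X is zero in every sample and, as an assumption made here for concreteness, scales each remaining feature row to unit Euclidean norm. The package's actual scaling rule may differ.

```r
# Hypothetical sketch of the rmzeroFea / scaleFea switches; not DWDLargeR's own code.
# X is d x n: rows are features, columns are samples.
preprocess_features <- function(X, rmzeroFea = TRUE, scaleFea = TRUE) {
  keep <- rep(TRUE, nrow(X))
  if (rmzeroFea) {
    keep <- rowSums(X != 0) > 0        # drop features that are 0 in every sample
  }
  X <- X[keep, , drop = FALSE]
  scale <- rep(1, nrow(X))
  if (scaleFea) {
    # Assumption for illustration: scale each feature row to unit Euclidean norm.
    scale <- sqrt(rowSums(X^2))
    scale[scale == 0] <- 1             # leave all-zero rows alone if they were kept
    X <- X / scale                     # recycling divides row i by scale[i]
  }
  list(X = X, keep = keep, scale = scale)
}

# Toy data: 3 features (rows), 2 samples (columns); feature 2 is identically zero.
X <- rbind(c(3, 4),
           c(0, 0),
           c(1, 0))
out  <- preprocess_features(X)                    # feature 2 removed, rows scaled by 5 and 1
out2 <- preprocess_features(X, rmzeroFea = FALSE) # zero feature kept, left unscaled
```

Because scaling divides feature \(i\) by \(s_i\), a normal \(w_s\) computed on the scaled features satisfies \(w_s^T (x / s) = (w_s / s)^T x\), so w_s / out$scale gives the equivalent (no longer unit-norm) normal on the kept original features.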

Author

Xin-Yee Lam, J.S. Marron, Defeng Sun, and Kim-Chuan Toh

Details

This is a symmetric Gauss-Seidel based alternating direction method of multipliers (sGS-ADMM) algorithm for solving the generalized DWD model with the following formulation: $$\min_{w,\beta,\xi,r} \sum_i \theta_q(r_i) + C e^T \xi$$ subject to the constraints $$Z^T w + \beta y + \xi - r = 0, \quad \|w\| \le 1, \quad \xi \ge 0,$$
where \(Z = X \,\mathrm{diag}(y)\), \(e\) is a given positive vector such that \(\|e\|_\infty = 1\), \(q\) is the exponent expon, and \(\theta_q\) is the function defined by \(\theta_q(t) = 1/t^q\) if \(t > 0\) and \(\theta_q(t) = \infty\) if \(t \le 0\).
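To make the formulation concrete, the R sketch below evaluates the objective \(\sum_i \theta_q(r_i) + C e^T \xi\) and checks the constraints \(\|w\| \le 1\), \(\xi \ge 0\) at a given candidate \((w, \beta, \xi)\), with \(r\) fixed by the equality constraint. It does not solve the model (that is what genDWD does). The names theta_q and dwd_objective are hypothetical, and e defaults to the all-ones vector, which is one choice satisfying \(\|e\|_\infty = 1\).

```r
# theta_q(t) = 1 / t^q for t > 0 and Inf for t <= 0.
theta_q <- function(t, q) {
  ifelse(t > 0, 1 / t^q, Inf)
}

# Hypothetical helper: evaluate sum_i theta_q(r_i) + C * e^T xi at a candidate (w, beta, xi),
# where r is determined by the equality constraint r = Z^T w + beta y + xi.
dwd_objective <- function(X, y, w, beta, xi, C, q, e = rep(1, length(y)), tol = 1e-10) {
  Z <- X %*% diag(y, nrow = length(y))                      # Z = X diag(y)
  r <- as.vector(crossprod(Z, w)) + beta * y + xi           # r = Z^T w + beta y + xi
  feasible <- sqrt(sum(w^2)) <= 1 + tol && all(xi >= -tol)  # ||w|| <= 1 and xi >= 0
  list(value = sum(theta_q(r, q)) + C * sum(e * xi), r = r, feasible = feasible)
}

X <- matrix(c(2, 1,  -1, -2,  3, 0), nrow = 2)  # d = 2 features, n = 3 samples
y <- c(1, -1, 1)
w <- c(1, 0)

# Same candidate, two values of the exponent q (expon); here r = (2, 1, 3).
obj_q1 <- dwd_objective(X, y, w, beta = 0, xi = c(0, 0, 0), C = 1, q = 1)  # 1/2 + 1 + 1/3
obj_q2 <- dwd_objective(X, y, w, beta = 0, xi = c(0, 0, 0), C = 1, q = 2)  # 1/4 + 1 + 1/9
# Nonzero slack raises r_2 to 1.5 but is charged linearly through C * e^T xi.
obj_xi <- dwd_objective(X, y, w, beta = 0, xi = c(0, 0.5, 0), C = 2, q = 1)
```

Any candidate with some \(r_i \le 0\) yields an infinite objective, which is how \(\theta_q\) encodes the requirement that every residual be positive.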

References

Lam, X.Y., Marron, J.S., Sun, D.F., and Toh, K.C. (2018) "Fast algorithms for large scale generalized distance weighted discrimination", Journal of Computational and Graphical Statistics, forthcoming.
https://arxiv.org/abs/1604.05473

Examples

# load the package and the data
library(DWDLargeR)
data("mushrooms")
# calculate the best penalty parameter
C <- penaltyParameter(mushrooms$X, mushrooms$y, expon = 1)
# solve the generalized DWD model
result <- genDWD(mushrooms$X, mushrooms$y, C, expon = 1)
