kerndwd (version 1.0.1)

kerndwd: Solve Linear DWD and Kernel DWD

Description

Fit the linear distance weighted discrimination (DWD) model and the DWD on a reproducing kernel Hilbert space (RKHS). The solution path is computed at a grid of values of the tuning parameter lambda.

Usage

kerndwd(x, y, kern, qval=1, lambda, wt=NULL, eps=1e-05, maxit=1e+05)

Arguments

x
A numerical matrix with $N$ rows and $p$ columns for predictors.
y
A vector of length $N$ for binary responses. Each element of y is either -1 or 1.
kern
A kernel function.
qval
The exponent index $q$ of the generalized DWD. Default value is 1.
lambda
A user-supplied lambda sequence.
wt
A vector of length $N$ for weight factors. When wt=NULL, an unweighted DWD is fitted.
eps
The convergence threshold: the algorithm stops when $\sum_j(\beta_j^{new}-\beta_j^{old})^2$ is less than eps, where $j=0,\ldots,p$. Default value is 1e-5.
maxit
The maximum number of iterations allowed. Default is 1e5.

Value

  • An object with S3 class kerndwd, containing the components below.
  • alpha: A matrix of DWD coefficients at each lambda value. The dimension is (p+1)*length(lambda) in the linear case and (N+1)*length(lambda) in the kernel case.
  • lambda: The lambda sequence.
  • npass: Total number of MM iterations for all lambda values.
  • jerr: Warnings and errors; 0 if none.
  • info: A list including qval, eps, maxit, kern, and wt if a weight vector was used.
  • call: The call that produced this object.

Details

The generalized DWD loss is $$V_q(u) = 1-u \textrm{ if } u \le q/(q+1), \quad V_q(u) = \frac{1}{u^q}\frac{q^q}{(q+1)^{q+1}} \textrm{ if } u > q/(q+1).$$ The value of $\lambda$, i.e., lambda, is user-specified. In the linear case (kern is the inner product and N > p), kerndwd fits a linear DWD by minimizing the L2-penalized DWD loss function, $$\frac{1}{N}\sum_{i=1}^N V_q(y_i(\beta_0 + X_i'\beta)) + \lambda \beta' \beta.$$ In the kernel case, kerndwd fits a kernel DWD by minimizing $$\frac{1}{N}\sum_{i=1}^N V_q(y_i(\beta_0 + K_i' \alpha)) + \lambda \alpha' K \alpha,$$ where $K$ is the kernel matrix and $K_i$ is its ith row. The weighted linear DWD and the weighted kernel DWD are formulated as follows, $$\frac{1}{N}\sum_{i=1}^N w_i \cdot V_q(y_i(\beta_0 + X_i'\beta)) + \lambda \beta' \beta,$$ $$\frac{1}{N}\sum_{i=1}^N w_i \cdot V_q(y_i(\beta_0 + K_i' \alpha)) + \lambda \alpha' K \alpha,$$ where $w_i$ is the ith element of wt. Choices of the weight factors are discussed in the references below.
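To make the loss and the kernel objective concrete, here is a minimal base-R sketch of $V_q$ and the unweighted kernel DWD objective as defined above. The names Vq and dwd_objective are illustrative only; they are not functions exported by kerndwd.

```r
# Generalized DWD loss V_q(u): linear for u <= q/(q+1), decaying tail otherwise.
Vq <- function(u, q = 1) {
  thresh <- q / (q + 1)
  ifelse(u <= thresh,
         1 - u,
         (1 / u^q) * q^q / (q + 1)^(q + 1))
}

# Unweighted kernel DWD objective: mean loss plus the RKHS penalty.
# K is the N x N kernel matrix, alpha the coefficient vector, beta0 the intercept.
dwd_objective <- function(K, y, alpha, beta0, lambda, q = 1) {
  margins <- y * (beta0 + drop(K %*% alpha))
  mean(Vq(margins, q)) + lambda * drop(t(alpha) %*% K %*% alpha)
}
```

Note that the two branches of $V_q$ meet continuously at $u = q/(q+1)$, where both equal $1/(q+1)$; for the default q = 1 the loss is $1-u$ up to $u = 1/2$ and $1/(4u)$ beyond it.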

References

Wang, B. and Zou, H. (2015) "Another Look at DWD: Thrifty Algorithm and Bayes Risk Consistency in RKHS". http://arxiv.org/abs/1508.05913v1.pdf

Karatzoglou, A., Smola, A., Hornik, K., and Zeileis, A. (2004) "kernlab -- An S4 Package for Kernel Methods in R", Journal of Statistical Software, 11(9), 1--20. http://www.jstatsoft.org/v11/i09/paper

Friedman, J., Hastie, T., and Tibshirani, R. (2010) "Regularization Paths for Generalized Linear Models via Coordinate Descent", Journal of Statistical Software, 33(1), 1--22. http://www.jstatsoft.org/v33/i01/paper

Marron, J.S., Todd, M.J., and Ahn, J. (2007) "Distance-Weighted Discrimination", Journal of the American Statistical Association, 102(408), 1267--1271. https://faculty.franklin.uga.edu/jyahn/sites/faculty.franklin.uga.edu.jyahn/files/DWD3.pdf

Qiao, X., Zhang, H., Liu, Y., Todd, M., and Marron, J.S. (2010) "Weighted Distance Weighted Discrimination and Its Asymptotic Properties", Journal of the American Statistical Association, 105(489), 401--414. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2996856/

See Also

predict.kerndwd, plot.kerndwd, and cv.kerndwd.

Examples

# load the package and the example data
library(kerndwd)
data(Haberman)
# check dimensions
dim(Haberman$X); length(Haberman$y)
# standardize the predictors
Haberman$X = scale(Haberman$X, center=TRUE, scale=TRUE)
# a grid of tuning parameters
lambda = 10^(seq(-3, 3, length.out=10))

# fit a linear DWD
kern = vanilladot()
DWD_linear = kerndwd(Haberman$X, Haberman$y, kern, 
  qval=1, lambda=lambda, eps=1e-5, maxit=1e5)

# fit a DWD using Gaussian kernel
kern = rbfdot(sigma=1)
DWD_Gaussian = kerndwd(Haberman$X, Haberman$y, kern, 
  qval=1, lambda=lambda, eps=1e-5, maxit=1e5)
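The fitted objects above can be passed to the companion methods listed in See Also. The sketch below continues the example, assuming the predict.kerndwd signature predict(object, kern, x, newx, type) with type="class" returning predicted labels; consult that method's own help page for the exact arguments.

```r
# predict class labels for the first five observations (in-sample,
# purely for illustration) from the linear fit
kern = vanilladot()
pred = predict(DWD_linear, kern, Haberman$X, Haberman$X[1:5, ], type="class")

# inspect the returned coefficient matrix and the lambda path
dim(DWD_linear$alpha); DWD_linear$lambda
```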
