npsdr: A unified principal sufficient dimension reduction method via the kernel trick

Description

This function extends principal SDR to nonlinear relationships between predictors and the response using a kernel feature map. The kernel basis is constructed internally using a data-driven number of basis functions, and the working matrix is formed analogously to linear principal SDR but in the transformed feature space.
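The idea of a kernel basis can be illustrated with a minimal base-R sketch. It is not the package's get.psi(); the Gaussian kernel, the median-heuristic bandwidth, and the names K, Kc, and psi are assumptions chosen for illustration. Only the rule b = floor(n/3) is taken from the Usage section.

```r
# Illustrative only: a Gaussian-kernel basis built with base R.
# NOT the package's get.psi(); kernel choice and bandwidth rule are assumptions.
set.seed(1)
n <- 60
p <- 3
x <- matrix(rnorm(n * p), n, p)
b <- floor(n / 3)                     # same default rule as the `b` argument

d2 <- as.matrix(dist(x))^2            # squared Euclidean distances
sigma2 <- median(d2[upper.tri(d2)])   # median-heuristic bandwidth (assumption)
K <- exp(-d2 / (2 * sigma2))          # n x n Gaussian Gram matrix

H <- diag(n) - matrix(1 / n, n, n)    # centering matrix
Kc <- H %*% K %*% H                   # centered Gram matrix
Kc <- (Kc + t(Kc)) / 2                # enforce exact symmetry before eigen()

eig <- eigen(Kc, symmetric = TRUE)
psi <- eig$vectors[, seq_len(b), drop = FALSE]  # n x b basis evaluated at the data
dim(psi)
```

In this sketch the columns of psi play the role of transformed predictors: a linear principal SDR fit on psi corresponds to a nonlinear fit in the original x.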

Users may choose from built-in loss functions or provide a custom loss through the same interface as psdr(). The method supports both continuous and binary responses and can visualize the nonlinear sufficient predictors.

The output contains the kernel basis object, the working matrix M, eigenvalues and eigenvectors, and detailed fitting metadata.

Usage

npsdr(
  x,
  y,
  loss = "svm",
  h = 10,
  lambda = 1,
  b = floor(length(y)/3),
  eps = 1e-05,
  max.iter = 100,
  eta = 0.1,
  mtype = "m",
  plot = TRUE
)

Value

An object of class "npsdr" containing:

x, y: input data
M: working matrix
evalues, evectors: eigen-decomposition of M
obj.psi: kernel basis object from get.psi()
fit: metadata (loss, h, lambda, eps, max.iter, eta, b, response.type, cutpoints, weight_cutpoints)

Arguments

x: data matrix
y: response vector, either continuous or binary coded as +1/-1
loss: name of a built-in loss function, one of "svm", "logit", "l2svm", "wsvm", "qr", "asls", "wlogit", "wl2svm", "lssvm", or "wlssvm". A user-defined loss function can also be used by giving its name in double (or single) quotation marks. Default is "svm".
h: unified control for slicing or weighting; accepts either an integer or a numeric vector.
lambda: hyperparameter for the loss function. Default is 1.
b: number of basis functions for the kernel trick. Default is floor(length(y)/3).
eps: stopping threshold on the magnitude of the derivative. Default is 1e-05.
max.iter: maximum number of iterations for the optimization process. Default is 100.
eta: learning rate for the gradient descent method. Default is 0.1.
mtype: type of loss argument, either "m" (margin) or "r" (residual); see Table 1 in the package manuscript. This argument should be specified when a user-defined loss function is used. Default is "m".
plot: if TRUE, produces scatter plots of Y versus the first sufficient predictor. Default is TRUE.
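The distinction behind mtype can be sketched in base R. The two functions below are hypothetical illustrations, not package code: a margin-type loss is evaluated at y * f, while a residual-type loss is evaluated at y - f. The exact signature that npsdr() expects from a user-defined loss is not stated on this page, so the commented call at the end is an assumption based on the loss and mtype descriptions above.

```r
# Hypothetical helper losses; names are illustrative, not part of the package.
hinge_m <- function(u) pmax(0, 1 - u)                   # margin-type: u = y * f
check_r <- function(u, tau = 0.5) u * (tau - (u < 0))   # residual-type: u = y - f

f_hat <- c(2, 0.5, -0.3)          # made-up fitted values

y_bin <- c(1, -1, 1)              # binary response coded +1/-1
loss_m <- hinge_m(y_bin * f_hat)  # 0.0 1.5 1.3

y_con <- c(1, 2, 3)               # continuous response
loss_r <- check_r(y_con - f_hat)  # 0.50 0.75 1.65

# Assumed usage, following the argument descriptions (not run):
# npsdr(x, y, loss = "hinge_m", mtype = "m", plot = FALSE)
```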

Author

Jungmin Shin (c16267@gmail.com), Seung Jun Shin (sjshin@korea.ac.kr), Andreas Artemiou (artemiou@uol.ac.cy)

References

Artemiou, A. and Dong, Y. (2016) Sufficient dimension reduction via principal Lq support vector machine, Electronic Journal of Statistics 10: 783–805.
Artemiou, A., Dong, Y. and Shin, S. J. (2021) Real-time sufficient dimension reduction through principal least squares support vector machines, Pattern Recognition 112: 107768.
Kim, B. and Shin, S. J. (2019) Principal weighted logistic regression for sufficient dimension reduction in binary classification, Journal of the Korean Statistical Society 48(2): 194–206.
Li, B., Artemiou, A. and Li, L. (2011) Principal support vector machines for linear and nonlinear sufficient dimension reduction, Annals of Statistics 39(6): 3182–3210.
Soale, A.-N. and Dong, Y. (2022) On sufficient dimension reduction via principal asymmetric least squares, Journal of Nonparametric Statistics 34(1): 77–94.
Wang, C., Shin, S. J. and Wu, Y. (2018) Principal quantile regression for sufficient dimension reduction with heteroscedasticity, Electronic Journal of Statistics 12(2): 2114–2140.
Shin, S. J., Wu, Y., Zhang, H. H. and Liu, Y. (2017) Principal weighted support vector machines for sufficient dimension reduction in binary classification, Biometrika 104(1): 67–81.
Li, L. (2007) Sparse sufficient dimension reduction, Biometrika 94(3): 603–613.

Examples

# \donttest{
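set.seed(1)
n <- 200   # sample size
p <- 5     # number of predictors
x <- matrix(rnorm(n * p, 0, 2), n, p)
# response depends on x only through the radius sqrt(x1^2 + x2^2)
r2 <- x[, 1]^2 + x[, 2]^2
y <- 0.5 * sqrt(r2) * log(r2) + 0.2 * rnorm(n)
# fit with the default "svm" loss; suppress the automatic plot
obj_kernel <- npsdr(x, y, plot = FALSE)
print(obj_kernel)
summary(obj_kernel)
plot(obj_kernel)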
# }