
psvmSDR (version 3.0.1)

psdr: Unified linear principal sufficient dimension reduction methods

Description

This function implements a unified framework for linear principal SDR methods. It provides a single interface that covers many existing principal-machine approaches, such as principal SVM, weighted SVM, logistic, quantile, and asymmetric least squares SDR. The method estimates the central subspace by constructing a working matrix M derived from user-specified loss functions, slicing or weighting schemes, and regularization.

The function is designed for both continuous responses and binary classification (with any two-level coding). Users may choose among several built-in loss functions or supply a custom loss function. Two examples of the usage of user-defined losses are presented below (u represents a margin):

mylogit <- function(u, ...) log(1 + exp(-u))

myls <- function(u, ...) u^2

The argument u is the function variable (any name may be used), and the mtype argument of psdr() determines the margin type, either margin (mtype="m") or residual (mtype="r"). The default is mtype="m"; users must set mtype="r" when applying a residual-type loss. Any additional parameters of the loss can be passed through the ... argument.
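As a minimal, self-contained sketch, the two user-defined losses above can be written and checked in base R (the names mylogit and myls are illustrative; psdr() itself is not called here):

```r
# Margin-type logistic loss: pair with psdr(..., loss = "mylogit", mtype = "m")
mylogit <- function(u, ...) log(1 + exp(-u))

# Residual-type squared-error loss: pair with psdr(..., loss = "myls", mtype = "r")
myls <- function(u, ...) u^2

mylogit(0)  # log(2), approximately 0.6931
myls(3)     # 9
```

The ... argument is what lets extra loss parameters (for example, a quantile level) pass through from psdr() to the loss.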

The output includes the estimated eigenvalues and eigenvectors of M, which form the basis of the estimated central subspace, as well as detailed metadata used to summarize model fitting and diagnostics.

Usage

psdr(
  x,
  y,
  loss = "svm",
  h = 10,
  lambda = 1,
  eps = 1e-05,
  max.iter = 100,
  eta = 0.1,
  mtype = "m",
  plot = FALSE
)

Value

An object of S3 class "psdr" containing

  • M: working matrix

  • evalues, evectors: eigen decomposition of M

  • fit: metadata (n, p, ytype, hyperparameters, per-slice iteration/convergence info)
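To illustrate how the basis is read off these components, the sketch below eigen-decomposes a toy symmetric matrix standing in for the working matrix M (psdr() is not called; with a fitted object one would index obj$evectors the same way):

```r
set.seed(1)
p <- 4
A <- matrix(rnorm(p * p), p, p)
M_toy <- crossprod(A)                 # symmetric stand-in for the working matrix M
ed <- eigen(M_toy, symmetric = TRUE)  # analogous to evalues / evectors in the output
d <- 2
basis_hat <- ed$vectors[, 1:d]        # leading d eigenvectors span the estimated subspace
crossprod(basis_hat)                  # orthonormal columns: approximately the 2 x 2 identity
```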

Arguments

x

input matrix, of dimension nobs x nvars; each row is an observation vector.

y

response variable, either continuous or binary (any 2-level coding; e.g., -1/1, 0/1, 1/2, TRUE/FALSE, factor/character).

loss

one of the pre-specified loss functions "svm", "logit", "l2svm", "wsvm", "qr", "asls", "wlogit", "wl2svm", "lssvm", and "wlssvm"; alternatively, the name of a user-defined loss function, given inside double (or single) quotation marks. Default is "svm".

h

unified control for slicing or weighting; accepts either an integer or a numeric vector.

lambda

regularization parameter (default 1).

eps

convergence threshold on parameter change (default 1e-5).

max.iter

maximum number of iterations (default 100).

eta

learning rate for gradient descent (default 0.1).

mtype

the margin type, either margin ("m") or residual ("r"); see Table 1 in the manuscript. Only needed when a user-defined loss is used. Default is "m".

plot

logical; if TRUE, produces a diagnostic plot.

Author

Jungmin Shin (c16267@gmail.com), Seung Jun Shin (sjshin@korea.ac.kr), Andreas Artemiou (artemiou@uol.ac.cy)

References

Artemiou, A. and Dong, Y. (2016) Sufficient dimension reduction via principal Lq support vector machine, Electronic Journal of Statistics 10: 783–805.
Artemiou, A., Dong, Y. and Shin, S. J. (2021) Real-time sufficient dimension reduction through principal least squares support vector machines, Pattern Recognition 112: 107768.
Kim, B. and Shin, S. J. (2019) Principal weighted logistic regression for sufficient dimension reduction in binary classification, Journal of the Korean Statistical Society 48(2): 194–206.
Li, B., Artemiou, A. and Li, L. (2011) Principal support vector machines for linear and nonlinear sufficient dimension reduction, Annals of Statistics 39(6): 3182–3210.
Soale, A.-N. and Dong, Y. (2022) On sufficient dimension reduction via principal asymmetric least squares, Journal of Nonparametric Statistics 34(1): 77–94.
Wang, C., Shin, S. J. and Wu, Y. (2018) Principal quantile regression for sufficient dimension reduction with heteroscedasticity, Electronic Journal of Statistics 12(2): 2114–2140.
Shin, S. J., Wu, Y., Zhang, H. H. and Liu, Y. (2017) Principal weighted support vector machines for sufficient dimension reduction in binary classification, Biometrika 104(1): 67–81.
Li, L. (2007) Sparse sufficient dimension reduction, Biometrika 94(3): 603–613.

See Also

psdr_bic, rtpsdr

Examples

# \donttest{
## ----------------------------
## Linear PM
## ----------------------------
set.seed(1)
n <- 200; p <- 5;
x <- matrix(rnorm(n*p, 0, 2), n, p)
y <-  x[,1]/(0.5 + (x[,2] + 1)^2) + 0.2*rnorm(n)
y.tilde <- sign(y)
obj <- psdr(x, y)
print(obj)
plot(obj, d=2)

## --------------------------
## User defined cutoff points
## --------------------------
obj_cut <- psdr(x, y, h = c(0.1, 0.3, 0.5, 0.7))
print(obj_cut)

## --------------------------------
## Linear PM (Binary classification)
## --------------------------------
obj_wsvm <- psdr(x, y.tilde, loss="wsvm")
plot(obj_wsvm)

## ----------------------------
## User-defined loss function
## ----------------------------
mylogistic <- function(u) log(1+exp(-u))
psdr(x, y, loss="mylogistic")

## ----------------------------
## Real-data example: iris (binary subset)
## ----------------------------
iris_binary <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))
psdr(x = as.matrix(iris_binary[, 1:4]), y = iris_binary$Species, plot = TRUE)
# }
