Apply a single-index \(SIR\) on \((X,Y)\) with \(H\) slices, with soft or hard thresholding
of the interest matrix \(\widehat{\Sigma}_n^{-1}\widehat{\Gamma}_n\) by an optimal
parameter \(\lambda_{opt}\). The value \(\lambda_{opt}\) is found automatically among a grid
of n_lambda values of \(\lambda\), ranging from 0 to the maximum value of
\(\widehat{\Sigma}_n^{-1}\widehat{\Gamma}_n\). For each feature of \(X\), the number of
\(\lambda\) values for which this feature is selected is stored in a vector of size \(p\).
This vector is sorted in increasing order. Then, a breakpoint is found in this sorted vector
with strucchange::breakpoints. The variables to the left of the breakpoint have coefficients
that tend to be set to 0 by the thresholding based on \(\lambda_{opt}\), so they should be
removed (useless variables). Finally, \(\lambda_{opt}\) is the first \(\lambda\) such that
the associated \(\hat{b}\) has as many zeros as the breakpoint's value.
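The procedure above can be sketched in base R as follows. This is a hypothetical reconstruction, not the package's code: the helper names (`sir_interest_matrix`, `threshold`, `one_breakpoint`, `select_lambda`), the rank-based slicing, the grid upper bound `max(abs(M))`, the tolerance used to decide that a coefficient is zero, and the single-break least-squares search standing in for `strucchange::breakpoints` are all assumptions.

```r
# Hypothetical base-R sketch of the lambda_opt search. Assumptions (not from
# the package source): rank-based slicing, elementwise thresholding of
# M = Sigma^-1 Gamma, a lambda grid up to max(abs(M)), a 1e-10 tolerance for
# "zero", and a one-break least-squares mean shift instead of strucchange.

# SIR interest matrix Sigma_n^{-1} Gamma_n with H slices of equal size
sir_interest_matrix <- function(X, Y, H = 10) {
  n <- nrow(X)
  Xc <- scale(X, center = TRUE, scale = FALSE)
  Sigma <- crossprod(Xc) / n
  slice <- cut(rank(Y, ties.method = "first"), breaks = H, labels = FALSE)
  Gamma <- matrix(0, ncol(X), ncol(X))
  for (h in seq_len(H)) {
    in_h <- slice == h
    m_h <- colMeans(Xc[in_h, , drop = FALSE])  # slice mean of centered X
    Gamma <- Gamma + mean(in_h) * tcrossprod(m_h)
  }
  solve(Sigma, Gamma)
}

# Elementwise hard or soft thresholding at level lambda
threshold <- function(M, lambda, type = c("hard", "soft")) {
  type <- match.arg(type)
  if (type == "hard") M * (abs(M) > lambda) else sign(M) * pmax(abs(M) - lambda, 0)
}

# Split index k minimizing the residual sum of squares of a one-break mean shift
one_breakpoint <- function(y) {
  rss <- sapply(seq_len(length(y) - 1), function(k) {
    left <- y[seq_len(k)]
    right <- y[-seq_len(k)]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  })
  which.min(rss)
}

select_lambda <- function(X, Y, H = 10, n_lambda = 100, type = "hard", tol = 1e-10) {
  M <- sir_interest_matrix(X, Y, H)
  p <- ncol(M)
  lambdas <- seq(0, max(abs(M)), length.out = n_lambda)
  B <- matrix(0, p, n_lambda)  # column k: estimate of b for lambdas[k]
  for (k in seq_len(n_lambda)) {
    Mt <- threshold(M, lambdas[k], type)
    # principal eigenvector of the thresholded matrix (b = 0 if Mt is all zeros)
    if (any(Mt != 0)) B[, k] <- Re(eigen(Mt)$vectors[, 1])
  }
  B[abs(B) < tol] <- 0
  nb_zeros <- colSums(B == 0)  # number of zeros in b for each lambda
  counts <- setNames(rowSums(B != 0), paste0("X", seq_len(p)))
  counts <- sort(counts)       # increasing order, as in the example table
  bp <- one_breakpoint(counts)
  # first lambda whose b has exactly bp zeros; the ">= bp" fallback is an
  # addition of this sketch for grids where no lambda hits bp exactly
  k_opt <- which(nb_zeros == bp)[1]
  if (is.na(k_opt)) k_opt <- which(nb_zeros >= bp)[1]
  list(lambdas = lambdas, lambda_opt = lambdas[k_opt], b = B[, k_opt],
       nb_zeros = nb_zeros, counts = counts, breakpoint = bp)
}
```

On data simulated as in the example section, `select_lambda(X, Y, H = 10, n_lambda = 100)` returns a list whose `counts` element plays the role of the sorted vector described above; the last grid value zeroes the whole matrix, so the fallback always finds a lambda.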
For example, for \(X \in \mathbb{R}^{10}\) and n_lambda = 100, this sorted vector can look like this:
Variable   | X10 | X3 | X8 | X5 | X7 | X9 | X4 | X6 | X2 | X1
Selections | 2   | 3  | 3  | 4  | 4  | 4  | 6  | 10 | 95 | 100
Here, the breakpoint is 8: the eight variables from X10 to X6 are removed, and only X2 and X1 are kept.
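The breakpoint in this example can be reproduced with a single mean-shift split chosen to minimize the residual sum of squares. The base-R helper `rss_split` below is hypothetical: it is a stand-in for `strucchange::breakpoints`, whose exact settings inside `SIR_threshold_opt` are not given here.

```r
# Selection counts from the example table (X in R^10, n_lambda = 100)
counts <- c(X10 = 2, X3 = 3, X8 = 3, X5 = 4, X7 = 4, X9 = 4,
            X4 = 6, X6 = 10, X2 = 95, X1 = 100)
sorted <- sort(counts)  # increasing order, as in the table

# Residual sum of squares when the sorted vector is split after position k
rss_split <- function(y, k) {
  left <- y[seq_len(k)]
  right <- y[(k + 1):length(y)]
  sum((left - mean(left))^2) + sum((right - mean(right))^2)
}

bp <- which.min(sapply(seq_len(length(sorted) - 1), rss_split, y = sorted))
removed <- names(sorted)[seq_len(bp)]   # useless variables
kept <- names(sorted)[-seq_len(bp)]     # selected variables
bp    # 8
kept  # "X2" "X1"
```

The split after position 8 gives a residual sum of squares of 44 + 12.5 = 56.5, far below any other split, because the two large counts (95 and 100) form their own segment.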
SIR_threshold_opt(
Y,
X,
H = 10,
n_lambda = 100,
thresholding = "hard",
graph = TRUE,
output = TRUE,
choice = ""
)
An object of class SIR_threshold_opt, with attributes:
The optimal estimated EDR direction, i.e. the principal eigenvector of the interest matrix.
A vector that contains the tested lambdas.
The optimal lambda.
A p × n_lambda matrix whose columns contain the estimate of beta for each lambda.
The number of tested lambdas.
The number of zeros in b for each lambda.
A list that contains the variables selected by the model.
An object of class breakpoints from the strucchange package, containing information about the breakpoint from which the optimal lambda is deduced.
A vector of length p giving, for each variable, the number of lambdas that select it.
A vector giving, for each lambda, the squared cosine between the vanilla SIR and thresholded SIR directions.
The response vector.
Sample size.
The number of variables in X.
The chosen number of slices.
The interest matrix thresholded with the optimal lambda.
The thresholding method used.
Unevaluated call to the function.
The X data restricted to the variables selected by the model. It can be used to estimate a new SIR model on the relevant variables to improve the estimation of b.
The index Xb' estimated by SIR.
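The squared cosine reported for each lambda measures how close the thresholded direction stays to the vanilla SIR direction. A minimal sketch with a hypothetical `cos2` helper, assuming the usual definition \(\cos^2(b_1,b_2) = (b_1^\top b_2)^2 / (\|b_1\|^2\,\|b_2\|^2)\), which ignores both sign and scale:

```r
# Hypothetical helper: squared cosine between two directions
# (insensitive to sign and scale of either vector)
cos2 <- function(b1, b2) {
  sum(b1 * b2)^2 / (sum(b1^2) * sum(b2^2))
}

cos2(c(1, 1, 0), c(-2, -2, 0))  # 1: same direction up to sign and scale
cos2(c(1, 0, 0), c(1, 1, 0))    # 0.5
```

A value close to 1 means that thresholding at that lambda barely changes the estimated EDR direction.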
A numeric vector representing the dependent variable (a response vector).
A matrix representing the quantitative explanatory variables, bound by column.
The chosen number of slices (default is 10).
The number of lambda values to test. The n_lambda tested values are uniformly spaced between 0 and the maximum value of the interest matrix (default is 100).
The thresholding method, either "hard" or "soft" (default is "hard").
A boolean, set to TRUE to plot graphs (default is TRUE).
A boolean, set to TRUE to print information (default is TRUE).
The graph to plot:
"estim_ind" Plot the index estimated by the SIR model versus Y.
"opt_lambda" Plot the choice of the optimal lambda.
"cos2_selec" Plot the evolution of cos^2 and of variable selection as a function of lambda.
"regul_path" Plot the regularization path of b.
"" Plot all graphs (default).
# Generate Data
set.seed(2)
n <- 200
beta <- c(1, 1, rep(0, 8))  # only the first two variables are relevant
X <- mvtnorm::rmvnorm(n, sigma = diag(1, 10))
eps <- rnorm(n)
Y <- (X %*% beta)^3 + eps
# Apply SIR with soft thresholding
SIR_threshold_opt(Y, X, H = 10, n_lambda = 300, thresholding = "soft")