smoothic: Variable Selection Using a Smooth Information Criterion (SIC)

Description

Implements the SIC \(\epsilon\)-telescope method using either single or multiparameter regression. Returns the estimated coefficients, their estimated standard errors and the value of the penalized likelihood function. Note that the function scales the predictors to have unit variance; however, the final estimates are converted back to their original scale.
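The back-transformation described above can be sketched in base R. This is an illustration of the general idea under stated assumptions, not the package's internal code: `lm()` stands in for the penalized SIC fit, the predictors are divided by their standard deviations without centering, and the data are simulated purely for illustration. Because the model is linear in the predictors, dividing each slope estimated on the scaled data by that predictor's standard deviation recovers the slope on the original scale.

```r
# Sketch: fit on predictors scaled to unit variance, then map the
# estimates back to the original scale. lm() stands in for the
# penalized SIC fit (assumption); the data are simulated for illustration.
set.seed(1)
n <- 200
x <- data.frame(x1 = rnorm(n, sd = 5), x2 = runif(n, 0, 100))
y <- 2 + 0.5 * x$x1 - 0.03 * x$x2 + rnorm(n)

# Scale each predictor to unit variance (no centering)
sds <- vapply(x, sd, numeric(1))
x_scaled <- as.data.frame(sweep(as.matrix(x), 2, sds, "/"))

fit_scaled <- lm(y ~ ., data = cbind(y = y, x_scaled))
b_scaled <- coef(fit_scaled)

# Convert back: slopes are divided by the predictor SDs; the intercept
# is unchanged because the predictors were not centered
b_original <- b_scaled
b_original[names(sds)] <- b_scaled[names(sds)] / sds

# Compare with a fit on the unscaled predictors
fit_raw <- lm(y ~ ., data = cbind(y = y, x))
all.equal(b_original, coef(fit_raw))
```

With an unpenalized fit the two sets of estimates agree up to floating-point error; the point of scaling in a penalized fit is that the penalty then treats all predictors on a comparable footing.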

Usage

smoothic(
  formula,
  data,
  family = "sgnd",
  model = "mpr",
  lambda = "log(n)",
  epsilon_1 = 10,
  epsilon_T = 1e-04,
  steps_T = 100,
  zero_tol = 1e-05,
  max_it = 10000,
  kappa,
  tau,
  max_it_vec,
  stepmax_nlm
)

Value

A list with the following components:

coefficients - vector of coefficients.
see - vector of estimated standard errors.
model - the type of model that was fitted (the matched value of the model argument).
plike - value of the penalized likelihood function.
kappa - the estimated (or user-fixed) value of the shape parameter kappa, if family = "sgnd".

Arguments

formula: An object of class "formula": a two-sided object with the response on the left-hand side and the model variables on the right-hand side.
data: A data frame containing the variables in the model; the data frame should be unstandardized.
family: The family of the model. The default is family = "sgnd", the "Smooth Generalized Normal Distribution", in which the shape parameter kappa is also estimated. Classical regression with normally distributed errors is performed when family = "normal". If family = "laplace", a robust regression is performed with errors from a Laplace-like distribution; in this case the default value is tau = 0.15, which is used to approximate the absolute value in the Laplace density function.
model: The type of regression to be implemented, either model = "mpr" for multiparameter regression (i.e., location and scale), or model = "spr" for single parameter regression (i.e., location only). Defaults to model="mpr".
lambda: Value of the penalty tuning parameter. Suggested values are "log(n)" and "2" for the BIC and AIC, respectively, where n is the sample size. Defaults to lambda = "log(n)" for the BIC case. The value is evaluated as an R expression, so it may be a number or some function of n.
epsilon_1: Starting value for \(\epsilon\)-telescope. Defaults to 10.
epsilon_T: Final value for \(\epsilon\)-telescope. Defaults to 1e-04.
steps_T: Number of steps in \(\epsilon\)-telescope. Defaults to 100, must be greater than or equal to 10.
zero_tol: Coefficients smaller in magnitude than this value are treated as zero. Defaults to 1e-05.
max_it: Maximum number of iterations to be performed before the optimization is terminated. Defaults to 1e+04.
kappa: Optional user-supplied positive kappa value (> 0.2 to avoid computational issues) if family = "sgnd". If supplied, the shape parameter kappa will be fixed to this value in the optimization. If not supplied, kappa is estimated from the data.
tau: Optional user-supplied positive smoothing parameter value in the "Smooth Generalized Normal Distribution" if family = "sgnd" or family = "laplace". If not supplied, tau = 0.15 is used; if family = "normal", tau = 0 is used. Smaller values of tau bring the approximation closer to the absolute value function, but this can make the optimization unstable. Smaller values of tau can also cause issues with the standard error calculation when using the Laplace distribution in the robust regression setting.
max_it_vec: Optional vector of length steps_T containing the maximum number of iterations to be performed in each \(\epsilon\)-telescope step. If not supplied, up to max_it iterations are performed in each of the first 10 steps, after which the limit drops to 10 iterations for each remaining step of the telescope.
stepmax_nlm: Optional maximum allowable scaled step length (positive scalar) to be passed to nlm. If not supplied, default values in nlm are used.
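The base R sketch below shows how some of these tuning arguments might translate into concrete quantities. It rests on assumptions not confirmed by this page: that the \(\epsilon\)-telescope values are spaced evenly on the log scale from epsilon_1 down to epsilon_T, and that the smooth penalty has the form \(\phi_\epsilon(x) = x^2/(x^2 + \epsilon^2)\), a smooth approximation to the indicator of a nonzero coefficient. The helper names `telescope_eps`, `default_max_it_vec`, `phi_eps` and `eval_lambda` are hypothetical and are not part of the smoothic API. The default iteration schedule follows the max_it_vec description above, and lambda is evaluated from a string in which n is the sample size.

```r
# Hypothetical helpers illustrating the epsilon-telescope arguments.
# Assumption: epsilon values are log-spaced from epsilon_1 down to epsilon_T.
telescope_eps <- function(epsilon_1 = 10, epsilon_T = 1e-04, steps_T = 100) {
  stopifnot(steps_T >= 10)  # steps_T must be at least 10
  exp(seq(log(epsilon_1), log(epsilon_T), length.out = steps_T))
}

# Default iteration limits when max_it_vec is not supplied:
# max_it for the first 10 steps, then 10 for each remaining step.
default_max_it_vec <- function(max_it = 10000, steps_T = 100) {
  c(rep(max_it, 10), rep(10, steps_T - 10))
}

# Assumed smooth penalty: near 0 when |x| is much smaller than eps,
# near 1 when |x| is much larger than eps.
phi_eps <- function(x, eps) x^2 / (x^2 + eps^2)

# lambda is supplied as a string and evaluated as an R expression in n.
eval_lambda <- function(lambda, n) eval(parse(text = lambda), list(n = n))

eps <- telescope_eps()
max_it_vec <- default_max_it_vec()
length(max_it_vec)  # 100, one limit per telescope step

# As epsilon shrinks, the penalty on a coefficient of 0.01 moves towards 1,
# so small nonzero coefficients are penalized almost like a full L0 count.
phi_eps(0.01, eps[c(1, 50, 100)])

eval_lambda("log(n)", n = 50)  # BIC-type penalty
eval_lambda("2", n = 50)       # AIC-type penalty
```

Starting with a large epsilon gives a nearly quadratic, easy-to-optimize penalty; shrinking it step by step moves the objective towards the information criterion while each step is warm-started from the previous one, which is why later steps need far fewer iterations.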

Author

Meadhbh O'Neill

References

O'Neill, M. and Burke, K. (2023) Variable selection using a smooth information criterion for distributional regression models. <doi:10.1007/s11222-023-10204-8>

O'Neill, M. and Burke, K. (2022) Robust Distributional Regression with Automatic Variable Selection. <arXiv:2212.07317>

Examples


library(smoothic)

# Sniffer Data --------------------
# MPR Model ----
results <- smoothic(
  formula = y ~ .,
  data = sniffer,
  family = "normal",
  model = "mpr"
)
summary(results)
