MGLMtune: Choose the tuning parameter value in sparse regression

Description

Finds the tuning parameter value that yields the smallest BIC.
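
As a rough sketch of the selection rule, assume the conventional definition BIC = -2 logLik + df log(n), where logLik is the maximized log-likelihood, df the degrees of freedom and n the sample size. Whether MGLMtune computes BIC in exactly this form is an assumption here. Choosing the tuning parameter then reduces to taking the grid point with the smallest BIC. The log-likelihoods and degrees of freedom below are hypothetical numbers, not MGLMtune output.

```r
# Minimal sketch: pick the grid point with the smallest BIC.
# Assumes BIC = -2 * logLik + df * log(n); all inputs are hypothetical.
bic <- function(loglik, df, n) -2 * loglik + df * log(n)

n      <- 50
lambda <- c(10, 5, 2, 1, 0.5)             # decreasing tuning grid
loglik <- c(-420, -380, -352, -348, -347) # fit improves as lambda shrinks
df     <- c(0, 6, 15, 30, 45)             # model grows as lambda shrinks

bics <- bic(loglik, df, n)
best <- which.min(bics)
lambda[best]  # 2
```

With these made-up values the third grid point wins: smaller lambdas improve the log-likelihood only slightly, while the df log(n) term grows quickly.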

Usage

MGLMtune(
  formula,
  data,
  dist,
  penalty,
  lambdas,
  ngridpt,
  warm.start = TRUE,
  keep.path = FALSE,
  display = FALSE,
  init,
  weight,
  penidx,
  ridgedelta,
  maxiters = 150,
  epsilon = 1e-05,
  regBeta = FALSE,
  overdisp
)

Arguments

formula

an object of class formula (or one that can be coerced to that class): a symbolic description of the model to be fitted. The response has to be on the left-hand side of ~.

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data when using function MGLMtune, the variables are taken from environment(formula), typically the environment from which MGLMtune is called.

dist

a description of the distribution to fit. See dist for the details.

penalty

penalty type for the regularization term. Can be chosen from "sweep", "group", or "nuclear". See MGLMsparsereg for the description of each penalty type.
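
To make the three penalty types concrete, the sketch below evaluates each one on a coefficient matrix B with one row per predictor. The forms used (elementwise absolute values for "sweep", row-wise Euclidean norms for "group", the sum of singular values for "nuclear") are assumptions based on the usual definitions of these penalties; the sketch also ignores penidx and the scaling by lambda, so consult MGLMsparsereg for the exact objective. penalty_value is a hypothetical helper, not part of MGLM.

```r
# Hypothetical helper: value of each penalty type on a p x d matrix B.
# Assumed forms:
#   sweep   - elementwise lasso: sum of |B[j, k]|
#   group   - group lasso over rows: sum of the Euclidean norms of the rows
#   nuclear - nuclear norm: sum of the singular values of B
penalty_value <- function(B, penalty = c("sweep", "group", "nuclear")) {
  penalty <- match.arg(penalty)
  switch(penalty,
    sweep   = sum(abs(B)),
    group   = sum(sqrt(rowSums(B^2))),
    nuclear = sum(svd(B, nu = 0, nv = 0)$d)
  )
}

# One nonzero predictor row (3, 4); the other two rows are zero.
B <- matrix(c(3, 4, 0, 0, 0, 0), nrow = 3, byrow = TRUE)
sapply(c("sweep", "group", "nuclear"), function(pen) penalty_value(B, pen))
```

For this rank-one B, "sweep" gives 7 while "group" and "nuclear" both give 5, which shows how the same coefficients are charged differently by each penalty.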

lambdas

an optional vector of penalty values to tune over. If missing, the vector of penalty values is set inside the function, and ngridpt must then be provided.

ngridpt

an optional numeric value specifying the number of grid points to tune over. If lambdas is given, ngridpt is ignored. Otherwise, the maximum \(\lambda\) is determined from the data, and the smallest \(\lambda\) is set to \(1/n\), where \(n\) is the sample size.
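
The sketch below shows one plausible way to build such a grid when lambdas is missing. Two details are assumptions not stated above: the largest value maxlambda is simply passed in (MGLMtune derives it from the data), and the points are spaced evenly on the log scale. make_lambda_grid is a hypothetical helper.

```r
# Hypothetical grid builder: ngridpt points from maxlambda down to 1/n,
# evenly spaced on the log scale (the spacing is an assumption).
make_lambda_grid <- function(maxlambda, n, ngridpt) {
  stopifnot(maxlambda > 1 / n, ngridpt >= 2)
  exp(seq(log(maxlambda), log(1 / n), length.out = ngridpt))
}

lambdas <- make_lambda_grid(maxlambda = 100, n = 50, ngridpt = 10)
signif(lambdas, 3)
```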

warm.start

an optional logical variable specifying whether to use warm starts along the tuning grid. If warm.start=TRUE, the sparse regression coefficients fitted at one grid point are used as the initial value when fitting the sparse regression at the next grid point.
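
The idea behind warm starts can be illustrated outside MGLM. The sketch below fits a toy lasso (least squares plus an L1 penalty) by plain proximal gradient descent along a decreasing lambda grid, starting each fit either from the previous solution or from zero. The solver, the simulated data and all helper names (soft_threshold, lasso_pg, fit_path) are hypothetical stand-ins chosen for brevity; MGLMtune fits the chosen multivariate model instead.

```r
# Toy illustration of warm starts; not MGLMtune's algorithm.
soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# Proximal gradient for 0.5 * ||y - X b||^2 + lambda * ||b||_1.
lasso_pg <- function(X, y, lambda, beta0, maxiters = 500, epsilon = 1e-8) {
  step <- 1 / max(eigen(crossprod(X), only.values = TRUE)$values)
  beta <- beta0
  for (iter in seq_len(maxiters)) {
    grad <- crossprod(X, X %*% beta - y)
    beta_new <- soft_threshold(beta - step * grad, step * lambda)
    done <- sqrt(sum((beta_new - beta)^2)) < epsilon
    beta <- beta_new
    if (done) break
  }
  list(beta = beta, iters = iter)
}

# Walk the grid, optionally reusing the previous solution as the start.
fit_path <- function(X, y, lambdas, warm.start = TRUE) {
  beta <- matrix(0, ncol(X), 1)
  iters <- integer(length(lambdas))
  for (i in seq_along(lambdas)) {
    init <- if (warm.start) beta else matrix(0, ncol(X), 1)
    fit <- lasso_pg(X, y, lambdas[i], init)
    beta <- fit$beta
    iters[i] <- fit$iters
  }
  list(beta = beta, iters = iters)
}

set.seed(1)
X <- matrix(rnorm(100 * 5), 100, 5)
y <- X %*% c(2, 0, -1, 0, 0) + rnorm(100)
lambdas <- exp(seq(log(50), log(0.5), length.out = 10))

warm <- fit_path(X, y, lambdas, warm.start = TRUE)
cold <- fit_path(X, y, lambdas, warm.start = FALSE)
c(warm = sum(warm$iters), cold = sum(cold$iters))  # total iterations
```

Both runs reach essentially the same final coefficients; the warm-started run typically needs fewer iterations in total because each starting point is already close to the next solution.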

keep.path

an optional logical variable controlling whether to output the whole solution path. The default is keep.path=FALSE. If keep.path=TRUE, the sparse regression result at each grid point is kept and saved in the output object select.list.

display

an optional logical variable to specify whether to show each tuning step.

init

an optional matrix of initial values for the parameter estimates. Its dimensions must be compatible with the data. See dist for the dimensions used by each distribution.

weight

an optional vector of weights assigned to each row of the data. Should be NULL or a numeric vector. It can be a variable from data, or a variable from environment(formula) whose length equals the number of rows of the data. If weight=NULL, every row is assigned a weight of one.
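
The documented default can be expressed as a small check. resolve_weight is a hypothetical helper that mirrors the rule above (NULL becomes a vector of ones; otherwise the weights must be numeric with one entry per row); it is not a function exported by MGLM.

```r
# Hypothetical helper mirroring the documented weight rule.
resolve_weight <- function(weight, nrow_data) {
  if (is.null(weight)) return(rep(1, nrow_data))  # equal weights of one
  stopifnot(is.numeric(weight), length(weight) == nrow_data)
  weight
}

w <- resolve_weight(NULL, 50)
```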

penidx

a logical vector indicating which variables are penalized. The default value is rep(TRUE, p), which means all predictors are subject to regularization. If X contains an intercept as its first column, use penidx=c(FALSE,rep(TRUE,p-1)) to leave the intercept unpenalized.
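
For example, with a design matrix whose first column is an intercept, penidx can be built from the number of columns. The simulated X here is for illustration only.

```r
# Design matrix: an intercept column followed by 3 predictors.
set.seed(118)
n <- 50
X <- cbind(1, matrix(rnorm(n * 3), n, 3))
p <- ncol(X)

# Leave the intercept unpenalized; penalize the remaining p - 1 columns.
penidx <- c(FALSE, rep(TRUE, p - 1))
penidx
# [1] FALSE  TRUE  TRUE  TRUE
```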

ridgedelta

an optional numeric controlling the behavior of the Nesterov's accelerated proximal gradient method. The default value is \(\frac{1}{pd}\).
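
The default is simple to compute. This sketch assumes p is the number of predictors and d the number of columns of the parameter matrix (see dist for the dimensions in each distribution); the values p = 10 and d = 5 match those used in the Examples section.

```r
# Default ridgedelta as documented: 1 / (p * d).
# Assumption: p = number of predictors, d = columns of the parameter matrix.
p <- 10
d <- 5
ridgedelta <- 1 / (p * d)
ridgedelta  # 0.02
```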

maxiters

an optional numeric controlling the maximum number of iterations. The default value is maxiters=150.

epsilon

an optional numeric controlling the stopping criterion. The algorithm terminates when the relative change in the objective values of two successive iterates is less than epsilon. The default value is epsilon=1e-5.
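
A relative-change stopping rule of this kind can be sketched as follows. The exact denominator MGLM uses is an assumption; here abs(obj_old) + 1 guards against division by zero when the objective is near zero. has_converged is a hypothetical helper.

```r
# Hypothetical stopping rule: relative change in the objective below epsilon.
has_converged <- function(obj_old, obj_new, epsilon = 1e-5) {
  abs(obj_new - obj_old) / (abs(obj_old) + 1) < epsilon
}

has_converged(1000, 1000.001)  # tiny relative change: stop
has_converged(1000, 1010)      # about 1% change: keep iterating
```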

regBeta

an optional logical variable used when running negative multinomial regression (dist="NegMN"). regBeta controls whether to run regression on the over-dispersion parameter. The default is regBeta=FALSE.

overdisp

an optional numeric value used only when fitting a sparse negative multinomial model with regBeta=FALSE. overdisp gives the over-dispersion value for all observations. By default it is estimated using negative multinomial regression. When dist="MN", "DM", "GDM" or regBeta=TRUE, the value of overdisp is ignored.
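
The rule for when overdisp takes effect can be written as a one-line predicate. uses_overdisp is a hypothetical helper encoding the documented conditions, not part of MGLM.

```r
# Hypothetical predicate: overdisp matters only for dist = "NegMN"
# fitted with regBeta = FALSE; it is ignored in every other case.
uses_overdisp <- function(dist, regBeta = FALSE) {
  identical(dist, "NegMN") && !isTRUE(regBeta)
}

sapply(c("MN", "DM", "GDM", "NegMN"), uses_overdisp)
```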

Value

select

the final sparse regression result, fitted at the optimal tuning parameter.

path

a data frame with the degrees of freedom and BIC at each lambda.

Examples


# NOT RUN {
library(MGLM)  # provides MGLMtune() and rdirmn()
set.seed(118)
n <- 50
p <- 10
d <- 5
m <- rbinom(n, 100, 0.8)
X <- matrix(rnorm(n * p), n, p)
alpha <- matrix(0, p, d)
alpha[c(1, 3, 5), ] <- 1
Alpha <- exp(X %*% alpha)
Y <- rdirmn(size=m, alpha=Alpha)
sweep <- MGLMtune(Y ~ 0 + X, dist="DM", penalty="sweep", ngridpt=10)
show(sweep)


# }

