grcv: General Refitted Cross-Validation Estimator

Description

grcv computes an estimate of the dispersion parameter of a fitted dglars model using the general refitted cross-validation method.

Usage

grcv(object, type = c("BIC", "AIC"), nit = 10L, trace = FALSE,
     control = list(), ...)

Value

grcv returns the estimate of the dispersion parameter.

Arguments

object: a fitted dglars object.
type: the information criterion (goodness-of-fit measure) used in Step 2 to select the two sets of variables (see section Details). Default is type = "BIC".
nit: integer specifying how many times the general refitted cross-validation procedure is repeated (see section Details). Default is nit = 10L.
trace: flag used to print out information about the algorithm. Default is trace = FALSE.
control: a list of control parameters passed to the function dglars.
...: further arguments passed to the functions AIC.dglars or BIC.dglars.

Author

Luigi Augugliaro and Hassan Pazira
Maintainer: Luigi Augugliaro luigi.augugliaro@unipa.it

Details

The general refitted cross-validation (grcv) estimator (Pazira et al., 2018) estimates the dispersion parameter of an exponential family distribution through the following four-step procedure:

Step	Description
1.	randomly split the data set \(D = (y, X)\) into two halves of (approximately) equal size, denoted by \(D_1\) and \(D_2\).
2.	fit a dglars model to the dataset \(D_1\) to select a set of variables \(A_1\).
	fit a dglars model to the dataset \(D_2\) to select a set of variables \(A_2\).
3.	fit a glm to the dataset \(D_1\) using the variables in \(A_2\); then estimate the
	dispersion parameter using the Pearson method. Denote the resulting estimate by \(\hat{\phi}_1(A_2)\).
	fit a glm to the dataset \(D_2\) using the variables in \(A_1\); then estimate the
	dispersion parameter using the Pearson method. Denote the resulting estimate by \(\hat{\phi}_2(A_1)\).
4.	estimate \(\phi\) by \(\hat{\phi}_{grcv} = (\hat{\phi}_1(A_2) + \hat{\phi}_2(A_1)) / 2\).
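
The four steps can be sketched in base R as follows. This is a hedged illustration, not the code used by grcv: the helper names pearson_phi and grcv_once are hypothetical, and the dglars selection of Step 2 is replaced by a user-supplied function select_fun(y, X) (also hypothetical) that returns the indices of the selected columns of X. On a half with \(n_k\) observations and \(p\) estimated coefficients, the Pearson estimate is \(\hat{\phi} = \frac{1}{n_k - p} \sum_i (y_i - \hat{\mu}_i)^2 / V(\hat{\mu}_i)\), i.e. the sum of squared Pearson residuals divided by the residual degrees of freedom.

```r
# Pearson estimate of the dispersion parameter from a fitted glm:
# sum of squared Pearson residuals divided by the residual degrees of freedom
pearson_phi <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}

# One pass of Steps 1-4 (hypothetical helper, not part of the dglars package).
# `select_fun(y, X)` is a hypothetical stand-in for the dglars selection of
# Step 2; it must return the column indices of the selected variables.
grcv_once <- function(y, X, family, select_fun) {
  n <- length(y)
  # Step 1: random split into two halves of (almost) equal size
  idx1 <- sample(n, floor(n / 2))
  idx2 <- setdiff(seq_len(n), idx1)
  # Step 2: select a set of variables separately on each half
  A1 <- select_fun(y[idx1], X[idx1, , drop = FALSE])
  A2 <- select_fun(y[idx2], X[idx2, , drop = FALSE])
  # Step 3: refit a glm on one half using the variables selected on the other
  refit <- function(idx, A) {
    yy <- y[idx]
    fit <- if (length(A) == 0L) {
      glm(yy ~ 1, family = family)  # no variable selected: intercept only
    } else {
      XA <- X[idx, A, drop = FALSE]
      glm(yy ~ XA, family = family)
    }
    # grcv falls back to dglars for the ML fit here; this sketch only warns
    if (!fit$converged) warning("glm did not converge")
    fit
  }
  phi1 <- pearson_phi(refit(idx1, A2))
  phi2 <- pearson_phi(refit(idx2, A1))
  # Step 4: average the two Pearson estimates
  (phi1 + phi2) / 2
}
```

Passing an oracle selector such as function(y, X) 1L makes the sketch easy to check on simulated data; with real data, select_fun would wrap an actual selection procedure.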

To reduce the random variability due to the splitting of the dataset (Step 1), the procedure is repeated nit times and the median of the resulting estimates is used as the final estimate of the dispersion parameter. In Step 2, the two sets of variables are selected using AIC.dglars or BIC.dglars, according to type; in this step, the Pearson method is used to obtain a first estimate of the dispersion parameter, which the criteria require. Furthermore, if the function glm does not converge in Step 3, the function dglars is used to compute the maximum likelihood estimates.
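
The two remaining ingredients of this paragraph, choosing a variable set by an information criterion and taking the median over nit random splits, can be sketched in base R as below. Both helpers are hypothetical. In particular, select_by_ic does not compute the dgLARS solution path: it swaps in a much cruder path, obtained by ordering the columns of X by absolute marginal correlation with y, and then compares nested glm fits with stats::AIC or stats::BIC, so it only mimics the role of the type argument.

```r
# Step 2 analogue (hypothetical, NOT the dgLARS path): order the columns of X by
# absolute marginal correlation with y, fit the nested glms that use the first
# k columns (k = 0, ..., kmax) and keep the prefix with the smallest AIC or BIC.
select_by_ic <- function(y, X, family = gaussian(), type = c("BIC", "AIC"),
                         max_vars = 10L) {
  type <- match.arg(type)
  ord <- order(abs(drop(cor(X, y))), decreasing = TRUE)
  kmax <- min(max_vars, ncol(X), length(y) - 2L)
  crit <- numeric(kmax + 1L)
  for (k in 0:kmax) {
    fit <- if (k == 0L) {
      glm(y ~ 1, family = family)
    } else {
      XA <- X[, ord[seq_len(k)], drop = FALSE]
      glm(y ~ XA, family = family)
    }
    crit[k + 1L] <- if (type == "AIC") AIC(fit) else BIC(fit)
  }
  # indices of the selected columns (possibly none)
  ord[seq_len(which.min(crit) - 1L)]
}

# Repeat a single-split estimator `nit` times and return the median of the
# estimates. `one_split` is any function with no arguments that performs one
# random split and returns one dispersion estimate.
median_of_splits <- function(one_split, nit = 10L) {
  stopifnot(nit >= 1L)
  estimates <- vapply(seq_len(nit), function(k) one_split(), numeric(1))
  median(estimates)
}
```

One reason to prefer the median over the mean is that a single unlucky split, for example one in which an important variable is missed in Step 2, cannot dominate the final estimate.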

References

Pazira H., Augugliaro L. and Wit E.C. (2018) <doi:10.1007/s11222-017-9761-7> Extended differential-geometric LARS for high-dimensional GLMs with general dispersion parameter, Statistics and Computing, Vol 28(4), 753-774.

Examples

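############################
# y ~ Gamma
library(dglars)

set.seed(321)
n <- 100
p <- 50
# n x p matrix of positive covariates; only the first one enters the linear predictor
X <- matrix(abs(rnorm(n * p)), n, p)
eta <- 1 + 2 * X[, 1]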
mu <- drop(Gamma("log")$linkinv(eta))  # mean under the log link used by the fit below
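shape <- 0.5
phi <- 1 / shape  # true dispersion parameter: Var(y) = phi * mu^2
y <- rgamma(n, scale = mu / shape, shape = shape)
fit <- dglars(y ~ X, Gamma("log"))

phi                      # true value of the dispersion parameter
grcv(fit, type = "AIC")  # grcv estimate with Step 2 based on AIC
grcv(fit, type = "BIC")  # grcv estimate with Step 2 based on BIC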