This generic function fits a linear regression model via generalized cross entropy. Initial support spaces can be provided or computed.
lmgce(
formula,
data,
subset,
na.action,
offset,
contrasts = NULL,
model = TRUE,
x = FALSE,
y = FALSE,
cv = TRUE,
cv.nfolds = 5,
errormeasure = c("RMSE", "MSE", "MAE", "MAPE", "sMAPE", "MASE"),
errormeasure.which = {
if (isTRUE(cv))
c("1se", "min", "elbow")
else c("min", "elbow")
},
support.method = c("standardized", "ridge"),
support.method.penalize.intercept = TRUE,
support.signal = NULL,
support.signal.vector = NULL,
support.signal.vector.min = 0.3,
support.signal.vector.max = 20,
support.signal.vector.n = 20,
support.signal.points = c(1/5, 1/5, 1/5, 1/5, 1/5),
support.noise = NULL,
support.noise.points = c(1/3, 1/3, 1/3),
weight = 0.5,
twosteps.n = 1,
method = c("dual.BFGS", "dual.lbfgsb3c", "dual", "primal.solnl", "primal.solnp",
"dual.CG", "dual.L-BFGS-B", "dual.Rcgmin", "dual.bobyqa", "dual.newuoa",
"dual.nlminb", "dual.nlm", "dual.lbfgs", "dual.optimParallel"),
caseGLM = c("D", "M", "NM"),
boot.B = 0,
boot.method = c("residuals", "cases", "wild"),
seed = 230676,
OLS = TRUE,
verbose = 0
)
lmgce returns an object of class
lmgce.
The function summary.lmgce is used to obtain and print a
summary of the results. The generic accessor functions
coef.lmgce, fitted.values.lmgce,
residuals.lmgce and df.residual.lmgce extract
various useful features of the value returned by
lmgce.
An object of class
lmgce is a list containing at
least the following components:
a named vector of coefficients.
the residuals, that is response minus fitted values.
the fitted mean values.
the residual degrees of freedom.
the matched call.
the terms object used.
(only where relevant) the contrasts used.
(only where relevant) a record of the levels of the factors used in fitting.
the offset used (missing if none were used).
if requested, the response used.
if requested, the model matrix used.
if requested (the default), the model frame used.
(where relevant) information returned by
model.frame on the special handling of NAs.
number of bootstrap replicates used.
method used for bootstrapping.
case of the generic general linear model used.
an integer code. 0 indicates successful
optimization completion. Other numbers indicate different errors. See
optim, optimx,
solnl, solnp,
lbfgs and lbfgsb3c.
loss function (error) used for the selection of the support spaces.
in sample error for the selected support space.
cross-validation mean error for the selected support space.
standard deviation of the cross-validation error for the selected support space.
which criterion/standardized/factor support was used
upper limit of the standardized support space or factor that produced the error within one standard error from the minimum error.
upper limit of the standardized support space or factor that produced the error correspondent to the elbow of the error curve.
upper limit of the standardized support space or factor that produced the minimum error.
vector of prior weights used for the signal.
estimated probabilities associated with the signal.
vector of prior weights used for the noise.
estimated probabilities associated with the noise.
estimated Lagrange multipliers.
normalized entropy of the signal of the model.
cross-validation normalized entropy of the signal of the model.
standard deviation of the cross-validation normalized entropy of the signal of the model.
normalized entropy of the signal of each coefficient.
results from the different support spaces with or without
cross-validation, and from bootstrap replicates, namely number of attempts
(if the number of attempts is greater than three times the
number of bootstrap replicates the bootstrapping process stops), coefficients
and normalized entropies (nep - model, and nepk - coefficients), when
applicable; results from OLS estimation if OLS = TRUE; results from
GCE reestimation if twosteps.n is greater than 0.
vector of given positive upper limits for the
support spaces on standardized data or factors, when
support.signal = NULL or support.signal = L, or
"interval" otherwise.
matrix with the support spaces used for estimation on original data.
method chosen for the support's limits
vector of successful positive upper limits for the
support spaces on standardized data (support.method = "standardized")
or factors (support.method = "ridge"), when support.signal = NULL
or support.signal = L, or "interval" otherwise.
when applicable, the upper limit of the standardized
support chosen, when support.method = "standardized" or the factor used
when support.method = "ridge".
variance-covariance matrix of the coefficients.
An object of class formula (or one that
can be coerced to that class): a symbolic description of the model to be
fitted.
A data frame (or object coercible by
as.data.frame to a data frame) containing the variables
in the model.
an optional vector specifying a subset of observations to be used in the fitting process.
a function which indicates what should happen when the data
contain NAs. The default is set by the na.action setting of
options, and is na.fail if that is
unset. The ‘factory-fresh’ default is na.omit. Another
possible value is NULL, no action. Value
na.exclude can be useful.
this can be used to specify an a priori known component to be
included in the linear predictor during fitting. This should be NULL
or a numeric vector or matrix of extents matching those of the response. One
or more offset terms can be included in the formula
instead or as well, and if more than one are specified their sum is used.
See model.offset.
An optional list. See the contrasts.arg of
model.matrix.default.
Boolean value. If TRUE, the model frame used is returned.
The default is model = TRUE.
Boolean value. If TRUE, the model matrix used is returned.
The default is x = FALSE.
Boolean value. If TRUE, the response used is returned.
The default is y = FALSE.
Boolean value. If TRUE the error, errormeasure,
will be computed using cross-validation. If FALSE the error will be
computed in sample. The default is cv = TRUE.
number of folds used for cross-validation when
cv = TRUE. The default is cv.nfolds = 5 and the smallest value
allowable is cv.nfolds = 3.
Loss function (error) to be used for the selection
of the support spaces. One of c("RMSE", "MSE", "MAE", "MAPE", "sMAPE", "MASE").
The default is errormeasure = "RMSE".
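As a rough illustration of the loss functions named above, the sketch below uses common textbook definitions in base R. The helper names (rmse, mse, mae, mape, smape) are hypothetical, and the package's exact formulas, in particular for sMAPE and MASE, may differ.

```r
# Textbook definitions (assumed; not taken from the package source)
rmse  <- function(actual, pred) sqrt(mean((actual - pred)^2))
mse   <- function(actual, pred) mean((actual - pred)^2)
mae   <- function(actual, pred) mean(abs(actual - pred))
mape  <- function(actual, pred) 100 * mean(abs((actual - pred) / actual))
smape <- function(actual, pred) {
  100 * mean(2 * abs(actual - pred) / (abs(actual) + abs(pred)))
}

# Small illustrative vectors (made-up values)
actual <- c(1, 2, 4)
pred   <- c(1, 3, 2)

errors <- c(
  RMSE  = rmse(actual, pred),
  MSE   = mse(actual, pred),
  MAE   = mae(actual, pred),
  MAPE  = mape(actual, pred),
  sMAPE = smape(actual, pred)
)
errors
```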
Which value of errormeasure
to use for selecting a support space upper limit from support.signal.vector.
One of c("min", "1se", "elbow"), where "min" corresponds to the
support spaces that produced the lowest error, "1se" corresponds to
the support spaces whose error is within 1 standard error of the CV error
for "min", and "elbow" corresponds to the elbow point of the error
curve: the point that maximizes the distance between each observation (i.e.,
the pair composed of the upper limit of the support space and the error) and
the line between the first and last observations (i.e., the lowest and the
highest upper limits of the support space, respectively). See
find_curve_elbow. The default is
errormeasure.which = "1se" when cv = TRUE and "min" otherwise.
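The three selection rules can be sketched in base R on a made-up error curve. This is only an illustration under assumptions: the values are invented, the "1se" rule is taken here to mean the smallest upper limit whose error is within one standard error of the minimum, and the elbow is computed on the raw scale, which may not match find_curve_elbow.

```r
# Hypothetical grid of support upper limits with CV mean errors and SDs
upper   <- c(0.3, 0.5, 1, 2, 5, 10, 20)
cv_mean <- c(2.00, 1.40, 1.05, 1.00, 0.98, 1.01, 1.05)
cv_sd   <- c(0.10, 0.08, 0.06, 0.05, 0.05, 0.05, 0.05)

# "min": index of the lowest mean CV error
i_min <- which.min(cv_mean)

# "1se" (assumed reading): smallest limit with error <= min error + its SD
i_1se <- min(which(cv_mean <= cv_mean[i_min] + cv_sd[i_min]))

# "elbow": point farthest from the line joining the first and last points
n  <- length(upper)
dx <- upper[n] - upper[1]
dy <- cv_mean[n] - cv_mean[1]
dist_to_line <- abs(dy * upper - dx * cv_mean +
                      upper[n] * cv_mean[1] - cv_mean[n] * upper[1]) /
  sqrt(dx^2 + dy^2)
i_elbow <- which.max(dist_to_line)

c(min = upper[i_min], `1se` = upper[i_1se], elbow = upper[i_elbow])
```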
One of c("standardized", "ridge"). If
support.method = "standardized", the default, standardized coefficients
are used to define the signal support spaces. If
support.method = "ridge", the signal support spaces are defined by the
ridge trace.
Boolean value. If TRUE,
the default, the intercept will be penalized. To be used when
support.method = "ridge".
NULL or fixed positive upper limit (L) for the
support spaces (-L,L) on standardized data (when
support.method = "standardized"); NULL or fixed positive factor
to be multiplied by the maximum absolute value of the ridge trace for each
coefficient (when support.method = "ridge"); a pair (LL,UL) or a
matrix ((k+1) x 2) for the support spaces on original data. The default is
support.signal = NULL.
NULL or a vector of positive values when
support.signal = NULL. If support.signal.vector = NULL,
the default, a vector
c(support.signal.vector.min,...,support.signal.vector.max) of dimension
support.signal.vector.n and logarithmically equally spaced will be
generated. Each value represents the upper limits for the standardized support
spaces, when support.method = "standardized" or the factor to be
multiplied by the maximum absolute value of the ridge trace for each
coefficient, when support.method = "ridge".
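A logarithmically equally spaced grid like the one described above can be generated in base R as follows. This is a minimal sketch using the documented defaults; the variable names are illustrative and the package may build its grid differently.

```r
# Documented defaults for the grid of candidate upper limits
support_min <- 0.3
support_max <- 20
support_n   <- 20

# Equally spaced on the log scale, then mapped back to the original scale
grid <- exp(seq(log(support_min), log(support_max), length.out = support_n))
round(grid, 3)
```

Consecutive values in such a grid share a constant ratio, so candidate limits are denser near the lower end of the range.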
A positive value for the lowest limit of the
support.signal.vector when support.signal = NULL and
support.signal.vector = NULL. The default is
support.signal.vector.min = 0.3.
A positive value for the highest limit of the
support.signal.vector when support.signal = NULL and
support.signal.vector = NULL. The default is
support.signal.vector.max = 20.
A positive integer for the number of support
spaces to be used when support.signal = NULL and
support.signal.vector = NULL. The default is
support.signal.vector.n = 20.
A positive integer, a vector or a matrix. Prior
weights for the signal. If not a positive integer then the sum of weights by
row must be equal to 1. The default is
support.signal.points = c(1 / 5, 1 / 5, 1 / 5, 1 / 5, 1 / 5).
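To make the role of the support points and prior weights concrete, the sketch below follows the usual maximum entropy reparameterization (Golan, Judge and Miller, 1996), in which a coefficient is the probability-weighted sum of its support points. The limit L, the number of points, and the probabilities p are made-up illustrative values, not package output.

```r
# Symmetric support (-L, L) with 5 equally spaced points (illustrative L)
L <- 2
n_points <- 5
z <- seq(-L, L, length.out = n_points)

# Uniform prior weights, as in the default c(1/5, 1/5, 1/5, 1/5, 1/5)
prior <- rep(1 / n_points, n_points)
beta_prior <- sum(z * prior)  # the prior mean of a symmetric support is 0

# Hypothetical estimated probabilities for one coefficient
p <- c(0.05, 0.10, 0.20, 0.30, 0.35)
beta_hat <- sum(z * p)

# Normalized entropy: 1 for uniform probabilities, lower as p concentrates
norm_entropy <- function(prob) -sum(prob * log(prob)) / log(length(prob))
nep_prior <- norm_entropy(prior)
nep_hat   <- norm_entropy(p)

c(beta_prior = beta_prior, beta_hat = beta_hat,
  nep_prior = nep_prior, nep_hat = nep_hat)
```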
An interval, preferably centered around zero, given in
the form c(LL,UL). If support.noise = NULL, the default, then a
vector c(-L,L) is computed using the empirical three-sigma rule
(Pukelsheim, 1994).
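A minimal sketch of the three-sigma rule, assuming L is taken as three times the sample standard deviation of the response (the package's exact computation may differ; the data here are simulated):

```r
set.seed(1)

# Simulated response (illustrative only)
y <- rnorm(100, mean = 10, sd = 2)

# Assumed rule: noise support is (-3 * sd(y), 3 * sd(y))
L <- 3 * sd(y)
noise_support <- c(-L, L)
noise_support
```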
A positive integer, a vector or a matrix. Prior
weights for the noise. If not a positive integer then the sum of weights by
row must be equal to 1. The default is
support.noise.points = c(1 / 3, 1 / 3, 1 / 3).
a value between zero and one representing the
prediction-precision loss trade-off. If weight = 0.5, the default,
equal weight is placed on the signal and noise entropies. A higher than 0.5
value places more weight on the noise entropy whereas a lower than 0.5 value
places more weight on the signal entropy.
Number of GCE reestimations using a previously estimated vector of signal probabilities.
Use "primal.solnl" (GCE using Sequential Quadratic
Programming (SQP) method; see solnl) or
"primal.solnp" (GCE using the augmented Lagrange multiplier method
with an SQP interior algorithm; see solnp) for primal
form of the optimization problem and "dual" (GME), "dual.CG"
(GCE using a conjugate gradients method; see optim),
"dual.BFGS" (GCE using Broyden-Fletcher-Goldfarb-Shanno quasi-Newton
method; see optim), "dual.L-BFGS-B" (GCE using a
box-constrained optimization with limited-memory modification of the BFGS
quasi-Newton method; see optim), dual.Rcgmin
(GCE using an update of the conjugate gradient algorithm; see
optimx),
dual.bobyqa (GCE using a derivative-free optimization by quadratic
approximation; see optimx and
bobyqa), dual.newuoa (GCE using a
derivative-free optimization by quadratic approximation; see
optimx and newuoa),
dual.nlminb (GCE; see optimx and
nlminb), dual.nlm (GCE; see
optimx and nlm),
dual.lbfgs (GCE using the Limited-memory
Broyden-Fletcher-Goldfarb-Shanno; see lbfgs),
dual.lbfgsb3c (GCE using L-BFGS-B implemented in Fortran code and with
an Rcpp interface; see lbfgsb3c) or
dual.optimParallel (GCE using parallel version of the L-BFGS-B; see
optimParallel) for dual form. The
default is method = "dual.BFGS".
special cases of the generic general linear model. One of
c("D", "M", "NM"), where "D" stands for data, "M" for moment and
"NM" for normed-moment. The default is caseGLM = "D".
A single positive integer greater than or equal to 10 for the number
of bootstrap replicates to be used for the computation of the bootstrap
confidence interval(s). A value of zero generates no replicates. The default
is boot.B = 0.
Method to be used for bootstrapping. One of
c("residuals", "cases", "wild"), which correspond to resampling on
residuals, on individual cases, or on residuals multiplied by a N(0,1) variable,
respectively. The default is boot.method = "residuals".
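The three resampling schemes can be sketched with base R, using lm as a stand-in estimator on simulated data. This shows one replicate of each scheme for illustration only; it is not the package's bootstrap code, and all object names are hypothetical.

```r
set.seed(230676)

# Simulated data and a stand-in OLS fit (lmgce is not used here)
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
fit <- lm(y ~ x)
res <- residuals(fit)
fit_vals <- fitted(fit)

# "residuals": add residuals resampled with replacement to the fitted values
y_resid <- fit_vals + sample(res, n, replace = TRUE)

# "cases": resample whole observations (rows) with replacement
idx <- sample.int(n, n, replace = TRUE)
d_cases <- data.frame(x = x[idx], y = y[idx])

# "wild": multiply each residual by an independent N(0,1) draw
y_wild <- fit_vals + res * rnorm(n)

# Coefficients from one replicate of each scheme
boot_coef <- rbind(
  residuals = coef(lm(y_resid ~ x)),
  cases     = coef(lm(y ~ x, data = d_cases)),
  wild      = coef(lm(y_wild ~ x))
)
boot_coef
```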
A single value, interpreted as an integer, for reproducibility
or NULL for randomness. The default is seed = 230676.
Boolean value. If TRUE, the default, OLS estimation is
performed.
An integer to control how verbose the output is. For a value
of 0 no messages or output are shown and for a value of 3 all messages
are shown. The default is verbose = 0.
Jorge Cabral, jorgecabral@ua.pt
The lmgce function fits a linear regression model via generalized cross
entropy. Models for lmgce are specified symbolically. A typical model has the
form response ~ terms where response is the (numeric) response vector and
terms is a series of terms which specifies a linear predictor for response.
lmgce calls the lower level functions lmgce.validate,
lmgce.assign.ci, lmgce.assign.noci, lmgce.sscv,
lmgce.ss, lmgce.cv and lmgce.fit.
Golan, A., Judge, G. G. and Miller, D. (1996)
Maximum entropy econometrics: robust estimation with limited data.
Wiley.
Golan, A. (2008)
Information and Entropy Econometrics — A Review and Synthesis.
Foundations and Trends® in Econometrics, 2(1–2), 1–145.
doi:10.1561/0800000004
Golan, A. (2017)
Foundations of Info-Metrics: Modeling, Inference, and Imperfect Information (Vol. 1).
Oxford University Press.
doi:10.1093/oso/9780199349524.001.0001
Pukelsheim, F. (1994)
The Three Sigma Rule.
The American Statistician, 48(2), 88–91.
doi:10.2307/2684253
Macedo, P., Cabral, J., Afreixo, V., Macedo, F., Angelelli, M. (2025)
RidGME estimation and inference in ill-conditioned models.
In: Gervasi O, Murgante B, Garau C, et al., eds. Computational Science and
Its Applications – ICCSA 2025 Workshops. Springer Nature Switzerland; 2025:300-313.
doi:10.1007/978-3-031-97589-9_21
summary.lmgce for more detailed summaries.
The generic functions plot.lmgce, print.lmgce,
coef.lmgce and confint.lmgce.
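# Assumes lmgce() and dataGCE are provided by the GCEstim package
library(GCEstim)

# Fit the model with 50 bootstrap replicates and a fixed seed
res_gce_package <-
  lmgce(y ~ .,
        data = dataGCE,
        boot.B = 50,
        seed = 230676)

res_gce_package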
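A further hedged sketch shows the accessor functions mentioned earlier applied to a fit. It assumes that lmgce and dataGCE come from the GCEstim package (the package name is not stated in this page), and it is guarded so that it does nothing when that package is not installed.

```r
# Runs only if the assumed package GCEstim is available
if (requireNamespace("GCEstim", quietly = TRUE)) {
  library(GCEstim)

  # Same model as above, selecting the support with the lowest CV error
  fit <- lmgce(y ~ .,
               data = dataGCE,
               cv = TRUE,
               cv.nfolds = 5,
               errormeasure.which = "min",
               seed = 230676)

  print(coef(fit))        # named vector of coefficients
  print(df.residual(fit)) # residual degrees of freedom
  print(summary(fit))     # more detailed summary via summary.lmgce
}
```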