pmse: Prediction Mean Squared Error in a Beta Regression on a Bayesian Model

Description

Splits the data set into two parts: one is used to fit a beta regression model and the other to test the fitted model. The function returns the PMSE (prediction mean squared error), which measures the quality of the predictions on the held-out part. The result also contains everything that the bayesbr function returns, so all the usual analyses can be carried out on the fitted model.

Usage

pmse(formula = NULL, data = NULL, test.set = 0.3,
     na.action = c("exclude", "replace"), mean_betas = NULL,
     variance_betas = NULL, mean_gammas = NULL, variance_gammas = NULL,
     iter = 10000, warmup = iter/2, chains = 1, pars = NULL,
     a = NULL, b = NULL,
     resid.type = c("quantile", "sweighted", "pearson", "ordinary"), ...)

Arguments

formula

symbolic description of the model (of type y ~ x or y ~ x | z). See formula for details.

data

data frame or list containing the variables named in formula. If data = NULL, the function uses the variables found in the global environment.

test.set

Proportion of the data set held out to test the fitted model and compute the PMSE; the remainder is used to fit the model. test.set must be less than 0.5, so that more than 50% of the data is used for fitting. The default is 0.3.
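
As a rough illustration of what the hold-out split and the PMSE amount to, the sketch below uses base R only. It is not the package's internal code: the toy response, the seed, the random choice of test rows, and the stand-in predictor (the training mean) are all assumptions made for illustration.

```r
# Illustrative sketch only; not the internals of pmse().
set.seed(1)                               # assumed seed, for reproducibility
y <- c(0.12, 0.25, 0.31, 0.18, 0.40, 0.22, 0.35, 0.28, 0.15, 0.33)
test.set <- 0.3                           # proportion held out for testing

n       <- length(y)
n_test  <- round(test.set * n)            # 3 of the 10 observations
test_id <- sample(seq_len(n), n_test)     # assumed: test rows drawn at random

y_train <- y[-test_id]
y_test  <- y[test_id]

# Stand-in for model predictions: the training mean. A fitted beta
# regression would supply predicted means for the test rows instead.
y_hat <- rep(mean(y_train), n_test)

# Prediction mean squared error over the held-out observations
pmse_value <- mean((y_test - y_hat)^2)
```

A smaller pmse_value indicates predictions closer to the held-out responses.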

na.action

Character string specifying how NA values are handled. If na.action is "exclude" (the default), every row containing an NA in any variable of the model is removed. If na.action is "replace", each NA is replaced by the mean of its variable, for all variables of the model.
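
The two treatments can be pictured with a small base R sketch. The toy data frame and the helper code are assumptions for illustration; the package may implement these steps differently.

```r
# Illustrative sketch only; toy data, not the package's internal code.
df <- data.frame(y = c(0.2, 0.5, NA, 0.7),
                 x = c(1.0, NA, 3.0, 4.0))

# "exclude": drop every row that has an NA in any model variable
df_exclude <- df[complete.cases(df), ]

# "replace": fill each NA with the mean of the observed values in its column
df_replace <- as.data.frame(lapply(df, function(v) {
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
}))
```

Here "exclude" keeps only the first and last rows, while "replace" keeps all four rows with the gaps filled in.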

mean_betas, variance_betas

vectors with the prior means and variances, respectively, of the betas. Beta is the name given to the coefficient of each covariate that influences theta. Note: each vector must have length p + 1, where p is the number of covariates for theta.

mean_gammas, variance_gammas

vectors with the prior means and variances, respectively, of the gammas. Gamma is the name given to the coefficient of each covariate that influences zeta. Note: each vector must have length q + 1, where q is the number of covariates for zeta.
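
A minimal sketch of how prior vectors of the stated lengths could be built for a model of the form y ~ x1 + x2 | z1 + z2. The hyperparameter values (0 and 10) are arbitrary placeholders, not recommendations.

```r
# Sketch: prior vectors sized for y ~ x1 + x2 | z1 + z2.
# The values 0 and 10 are arbitrary placeholders.
p <- 2   # number of covariates for theta
q <- 2   # number of covariates for zeta

mean_betas      <- rep(0,  p + 1)   # p + 1 entries, as the length rule requires
variance_betas  <- rep(10, p + 1)
mean_gammas     <- rep(0,  q + 1)   # q + 1 entries, as the length rule requires
variance_gammas <- rep(10, q + 1)
```

These vectors would then be supplied through the arguments of the same names.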

iter

A positive integer specifying the number of iterations for each chain (including warmup). The default is 10000.

warmup

A positive integer specifying the number of iterations in the warm-up period. These iterations are discarded before estimates and inferences are made. warmup must be less than iter; its default value is iter/2.

chains

A positive integer specifying the number of Markov chains. The default is 1.
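
The arithmetic relating iter, warmup and chains can be sketched as follows, using the default values from Usage. The sketch assumes no thinning is applied to the chains.

```r
# Sketch: number of posterior draws kept under the defaults,
# assuming no thinning of the chains.
iter   <- 10000
warmup <- iter / 2
chains <- 1

stopifnot(warmup < iter)             # warmup must be less than iter

kept_per_chain <- iter - warmup      # draws remaining after warm-up
kept_total     <- kept_per_chain * chains
```

With the defaults, 5000 draws per chain remain for estimation.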

pars

A vector of character strings specifying parameters of interest. The default is NULL indicating all parameters in the model.

a, b

Positive integers specifying the prior parameters of the gamma distribution for zeta. If covariates are used to explain zeta, a and b are not used.

resid.type

A character string giving the type of residual returned by the model. The type can be quantile, sweighted, pearson or ordinary. The default is quantile.

...

Other optional parameters passed on to RStan.

Value

pmse returns an object of class pmse_bayesbr containing the value of the prediction mean squared error and an object of class bayesbr with the following items:

call: the original function call,
formula: the original formula,
y: the response proportion vector,
stancode: lines of code containing the .STAN file used to estimate the model,
info: a list containing model information such as the pars argument, the names of the variables, and the numbers of iterations, warmup iterations, chains, covariates for theta, covariates for zeta, and observations in the sample. It also has an element called samples, with the posterior distribution of the parameters of interest,
fitted.values: a vector containing the estimate of theta for each observation of the response variable; the estimate is the mean of the posterior distribution of theta,
model: the full model frame,
residuals: a vector of residuals,
residuals.type: the type of returned residual,
loglik: log-likelihood of the fitted model (using the mean of the parameters in the posterior distribution),
BIC: a value containing the Bayesian Information Criterion (BIC) of the fitted model,
pseudo.r.squared: pseudo R-squared value (squared correlation between the linear predictor and the posterior means of theta).
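
For orientation, the textbook definition of the BIC can be written in a few lines of R. The numbers below are hypothetical, and how the package counts parameters when filling the BIC item is not stated here, so treat this as a sketch of the general formula only.

```r
# Sketch of the standard BIC formula; all inputs are made-up numbers.
loglik <- 120.5   # hypothetical log-likelihood of a fitted model
k      <- 6       # hypothetical number of estimated parameters
n      <- 180     # hypothetical number of observations

bic <- -2 * loglik + k * log(n)
```

Lower values of bic indicate a better trade-off between fit and complexity when comparing models fitted to the same data.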

References

Ferrari, S. L. P., and Cribari-Neto, F. (2004). Beta Regression for Modeling Rates and Proportions. Journal of Applied Statistics, 31(7), 799-815. doi:10.1080/0266476042000214501

Clark, T. E., and West, K. D. (2006). Using out-of-sample mean squared prediction errors to test the martingale difference hypothesis. Journal of Econometrics, 135(1-2), 155-186. doi:10.1016/j.jeconom.2005.07.014

Examples

# NOT RUN {
data("bodyfat",package="bayesbr")
# }
# NOT RUN {
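# Hold out 25% of the rows for testing; iter is far below the default of 10000
bbr = pmse(siri ~ age + weight | biceps + forearm, data = bodyfat,
           test.set = 0.25, iter = 100)

# Stored as pmse1 so the result does not share a name with the pmse function
pmse1 = bbr$PMSE
model = bbr$model
summary(model)
residuals(model, type = "sweighted")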
# }
# NOT RUN {
bbr2 = pmse(siri ~ age + weight + height +
              wrist | biceps + forearm, data = bodyfat,
              test.set = 0.4, iter = 1000,
              mean_betas = 3,variance_betas = 10)
pmse2 = bbr2$PMSE
model2 = bbr2$model
residuals(model2,type="sweighted")
# }
