This is a set of function useful for selecting appropriate models.
The functions gamlssVGD
, VGD
, getTGD
, TGD
can be used when a subset of the data is used for validation or testing.
The function stepVGD()
is a stepwise procedure for selecting an appropriate model for any of the parameters of the model minimising the test global deviance. The function stepVGDAll.A()
can select a model using strategy A for all the parameters.
The functions gamlssCV
, CV
can be used for a k-fold cross validation.
gamlssVGD(formula = NULL, sigma.formula = ~1, nu.formula = ~1,
tau.formula = ~1, data = NULL, family = NO,
control = gamlss.control(trace = FALSE),
rand = NULL, newdata = NULL, ...)
VGD(object, ...) getTGD(object, newdata = NULL, ...)
TGD(object, ...)
gamlssCV(formula = NULL, sigma.formula = ~1, nu.formula = ~1,
tau.formula = ~1, data = NULL, family = NO,
control = gamlss.control(trace = FALSE),
K.fold = 10, set.seed = 123, rand = NULL,
parallel = c("no", "multicore", "snow"),
ncpus = 1L, cl = NULL, ...)
CV(object, ...)
drop1TGD(object, scope, newdata, parameter = c("mu", "sigma", "nu", "tau"),
sorted = FALSE, trace = FALSE,
parallel = c("no", "multicore", "snow"),
ncpus = 1L, cl = NULL, ...)
add1TGD(object, scope, newdata, parameter = c("mu", "sigma", "nu", "tau"),
sorted = FALSE, trace = FALSE,
parallel = c("no", "multicore", "snow"),
ncpus = 1L, cl = NULL, ...)
stepTGD(object, scope, newdata,
direction = c("both", "backward", "forward"),
trace = TRUE, keep = NULL, steps = 1000,
parameter = c("mu", "sigma", "nu", "tau"),
parallel = c("no", "multicore", "snow"),
ncpus = 1L, cl = NULL, ...)
stepTGDAll.A(object, scope = NULL, newdata = NULL,
steps = 1000, sigma.scope = NULL, nu.scope = NULL,
tau.scope = NULL, mu.try = TRUE, sigma.try = TRUE,
nu.try = TRUE, tau.try = TRUE,
parallel = c("no", "multicore", "snow"),
ncpus = 1L, cl = NULL, ...)
A gamlss
mu
formula.
Formula for sigma
.
Formula for nu
.
Formula for tau
.
The data frame required for the fit.
The gamlss.family
distribution.
The control for fitting the gamlss model.
For gamlssVGD
a variable with values 1 (for fitting) and 2 (for predicting). For gamlssCV
a variable with k values indicating the cross validation sets.
The new data set (validation or test) for prediction.
A relevant R object.
defines the range of models examined in the stepwise selection similar to stepGAIC()
where you can see examples
defines the range of models examined in the stepwise selection for sigma
defines the range of models examined in the stepwise selection for nu
defines the range of models examined in the stepwise selection for tau
whether should try fitting models for mu
whether should try fitting models for sigma
whether should try fitting models for nu
whether should try fitting models for tau
which distribution parameter is required, default what="mu"
should the results be sorted on the value of TGD
f TRUE
additional information may be given on the fits as they are tried.
The mode of stepwise search, can be one of both
, backward
, or forward
, with a default of both
. If the scope argument is missing the default for direction is backward
see stepGAIC()
for explanation
the maximum number of steps to be considered. The default is 1000.
the number of subsets of the data used
the seed to be used in creating rand
The type of parallel operation to be used (if any). If missing, the default is "no".
integer: number of processes to be used in parallel operation: typically one would chose this to the number of available CPUs.
An optional parallel or snow cluster for use if parallel = "snow"
. If not supplied, a cluster on the local machine is created for the duration of the call.
further arguments to be pass in the gamlss fit
A fitted models of a set of global deviances.
The function gamlssVGD()
fits a gamlss model to the training data set determined by the arguments rand
or newdata
. The results is a gamlssVGD
objects which contains the gamlss fit to the training data plus three extra components: i) VGD
the global deviance applied to the validation data sets. ii) predictError
which is VGD
divided with the number of observations in the validation data set and iii) residVal
the residuals for the validation data set.
The function VGD()
extract the validated global deviance from one or more fitted gamlssVGD
objects and can be used foe model comparison.
The function getTGD()
operates different from the function gamlssVGD()
. It assumes that the users already have fitted models using gamlss()
and now he/she wants to evaluate the global deviance at a new (validation or test) data set.
The function TGD()
extract the validated/test global deviance from one or more fitted gamlssTGD
objects and can be use to compare models.
The gamlssCV()
performs a k-fold cross validation on a gamlss models.
The function CV()
extract the cross validated global deviance from one or more fitted gamlssCV
objects and can be use to compare models.
The functions add1TGD()
, drop1TGD()
and stepTGD
behave similar to add1()
, drop1()
and stepGAIC()
functions respectively but they used validation or test deviance as the selection criterion rather than the GAIC.
Chambers, J. M. and Hastie, T. J. (1991). Statistical Models in S, Chapman and Hall, London.
Rigby, R. A. and Stasinopoulos D. M. (2005). Generalized additive models for location, scale and shape,(with discussion), Appl. Statist., 54, part 3, pp 507-554.
Rigby, R. A., Stasinopoulos, D. M., Heller, G. Z., and De Bastiani, F. (2019) Distributions for modeling location, scale, and shape: Using GAMLSS in R, Chapman and Hall/CRC. An older version can be found in https://www.gamlss.com/.
Stasinopoulos D. M. Rigby R.A. (2007) Generalized additive models for location scale and shape (GAMLSS) in R. Journal of Statistical Software, Vol. 23, Issue 7, Dec 2007, https://www.jstatsoft.org/v23/i07/.
Stasinopoulos D. M., Rigby R.A., Heller G., Voudouris V., and De Bastiani F., (2017) Flexible Regression and Smoothing: Using GAMLSS in R, Chapman and Hall/CRC.
(see also https://www.gamlss.com/).
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.
# NOT RUN {
data(abdom)
# generate the random split of the data
rand <- sample(2, 610, replace=TRUE, prob=c(0.6,0.4))
# the proportions in the sample
table(rand)/610
olddata<-abdom[rand==1,] # training data
newdata<-abdom[rand==2,] # validation data
#------------------------------------------------------------------------------
# gamlssVGD
#-------------------------------------------------------------------------------
# Using rand
v1 <- gamlssVGD(y~pb(x,df=2),sigma.formula=~pb(x,df=1), data=abdom, family=NO,
rand=rand)
v2 <- gamlssVGD(y~pb(x,df=2),sigma.formula=~pb(x,df=1), data=abdom, family=LO,
rand=rand)
v3 <- gamlssVGD(y~pb(x,df=2),sigma.formula=~pb(x,df=1), data=abdom, family=TF,
rand=rand)
VGD(v1,v2,v3)
#-------------------------------------------------------------------------------
# }
# NOT RUN {
#-------------------------------------------------------------------------------
# using two data set
v11 <- gamlssVGD(y~pb(x,df=2),sigma.formula=~pb(x,df=1), data=olddata,
family=NO, newdata=newdata)
v12 <- gamlssVGD(y~pb(x,df=2),sigma.formula=~pb(x,df=1), data=olddata,
family=LO, newdata=newdata)
v13 <- gamlssVGD(y~pb(x,df=2),sigma.formula=~pb(x,df=1), data=olddata,
family=TF, newdata=newdata)
VGD(v11,v12,v13)
#-------------------------------------------------------------------------------
# function getTGD
#-------------------------------------------------------------------------------
# fit gamlss models first
g1 <- gamlss(y~pb(x,df=2),sigma.formula=~pb(x,df=1), data=olddata, family=NO)
g2 <- gamlss(y~pb(x,df=2),sigma.formula=~pb(x,df=1), data=olddata, family=LO)
g3 <- gamlss(y~pb(x,df=2),sigma.formula=~pb(x,df=1), data=olddata, family=TF)
# and then use
gg1 <-getTGD(g1, newdata=newdata)
gg2 <-getTGD(g2, newdata=newdata)
gg3 <-getTGD(g3, newdata=newdata)
TGD(gg1,gg2,gg3)
#-------------------------------------------------------------------------------
#-------------------------------------------------------------------------------
# function gamlssCV
#-------------------------------------------------------------------------------
set.seed(123)
rand1 <- sample (10 , 610, replace=TRUE)
g1 <- gamlssCV(y~pb(x,df=2),sigma.formula=~pb(x,df=1), data=abdom, family=NO,
rand=rand1)
g2 <- gamlssCV(y~pb(x,df=2),sigma.formula=~pb(x,df=1), data=abdom, family=LO,
rand=rand1)
g3 <- gamlssCV(y~pb(x,df=2),sigma.formula=~pb(x,df=1), data=abdom, family=TF,
rand=rand1)
CV(g1,g2,g3)
CV(g1)
# using parallel
set.seed(123)
rand1 <- sample (10 , 610, replace=TRUE)
nC <- detectCores()
system.time(g21 <- gamlssCV(y~pb(x,df=2), sigma.formula=~pb(x,df=1), data=abdom,
family=NO, rand=rand1,parallel = "no", ncpus = nC ))
system.time(g22 <- gamlssCV(y~pb(x,df=2), sigma.formula=~pb(x,df=1), data=abdom,
family=LO, rand=rand1,parallel = "multicore", ncpus = nC ))
system.time(g23 <- gamlssCV(y~pb(x,df=2), sigma.formula=~pb(x,df=1), data=abdom,
family=TF, rand=rand1,parallel = "snow", ncpus = nC ))
CV(g21,g22,g23)
#-------------------------------------------------------------------------------
# functions add1TGD() drop1TGD() and stepTGD()
#-------------------------------------------------------------------------------
# the data
data(rent)
rand <- sample(2, dim(rent)[1], replace=TRUE, prob=c(0.6,0.4))
# the proportions in the sample
table(rand)/dim(rent)[1]
oldrent<-rent[rand==1,] # training set
newrent<-rent[rand==2,] # validation set
# null model
v0 <- gamlss(R~1, data=oldrent, family=GA)
# complete model
v1 <- gamlss(R~pb(Fl)+pb(A)+H+loc, sigma.fo=~pb(Fl)+pb(A)+H+loc,
data=oldrent, family=GA)
# drop1TGDP
system.time(v3<- drop1TGD(v1, newdata=newrent, parallel="no"))
system.time(v4<- drop1TGD(v1, newdata=newrent, parallel="multicore",
ncpus=nC) )
system.time(v5<- drop1TGD(v1, newdata=newrent, parallel="snow", ncpus=nC))
cbind(v3,v4,v5)
# add1TGDP
system.time(d3<- add1TGD(v0,scope=~pb(Fl)+pb(A)+H+loc, newdata=newrent,
parallel="no"))
system.time(d4<- add1TGD(v0,scope=~pb(Fl)+pb(A)+H+loc, newdata=newrent,
parallel="multicore", ncpus=nC) )
system.time(d5<- add1TGD(v0, scope=~pb(Fl)+pb(A)+H+loc,newdata=newrent,
parallel="snow", ncpus=nC))
# stepTGD
system.time(d6<- stepTGD(v0, scope=~pb(Fl)+pb(A)+H+loc,newdata=newrent))
system.time(d7<- stepTGD(v0, scope=~pb(Fl)+pb(A)+H+loc,newdata=newrent,
parallel="multicore", ncpus=nC))
system.time(d8<- stepTGD(v0, scope=~pb(Fl)+pb(A)+H+loc,newdata=newrent,
parallel="snow", ncpus=nC))
# }
Run the code above in your browser using DataLab