ssym.nl: Fitting Semi-parametric Symmetric Regression Models

Description

ssym.nl is used to fit a semi-parametric regression model suitable for data set analysis in which the conditional distribution of the response is symmetric and continuous. In this setup, both location and dispersion parameters of the response variable distribution are explicitly modeled, the location using a nonlinear function and the dispersion using a semi-parametric function, which may be approximated by a natural cubic spline or a P-spline.

Usage

ssym.nl(formula, start, family, xi, data, epsilon, maxiter, subset, local.influence)

Arguments

formula

a symbolic description of the systematic component of the model to be fitted. See details for further information.

start

a named numeric vector of starting estimates for the parameters in the specified nonlinear function.

family

a description of the error distribution to be used in the model. Supported families include Normal, Student, Contnormal, Powerexp, Hyperbolic, Slash, Sinh-normal and Sinh-<

a numeric value or numeric vector that represents the extra parameter of the specified error distribution.

data

an optional data frame, list or environment containing the variables in the model.

epsilon

an optional positive value, which represents the convergence criterion. Default value is 1e-07.

maxiter

an optional positive integer giving the maximal number of iterations for the estimating process. Default value is 1e03.

subset

an optional expression specifying a subset of individuals to be used in the fitting process.

local.influence

logical. If TRUE, local influence measures under two perturbation schemes are calculated.

Value

coefs.mua vector of parameter estimates associated with the nonlinear function fitted to the location of the response (or log-response) variable.
coefs.phia vector of parameter estimates associated with the semiparametric function fitted to the dispersion of the response (or log-response) variable.
se.muapproximated standard errors asociated with coefs.mu.
se.phiapproximated standard errors asociated with coefs.phi.
weightsfinal weights of the iterative process.
lambda.muestimate for the smoothing parameter associated to the location parameter.
dfe.mudegrees of freedom associated with the nonparametric part of the location submodel.
lambda.phiestimate for the smoothing parameter associated to the dispersion parameter.
dfe.phidegrees of freedom associated with the nonparametric part of the dispersion submodel.
deviance.mua vector of deviances associated with the location of the response (or log-response) variable.
deviance.phia vector of deviances associated with the dispersion of the response (or log-response) variable.
mu.fitteda vector of fitted values for the location of the response (or log-response) variable.
phi.fitteda vector of fitted values for the dispersion of the response (or log-response) variable.
lpdfa vector of individual contributions to the log-likelihood function.
cdfza vector of the cumulative distribution function of each individual.
cwif local.influence=TRUE, a matrix of local influence and total local influence measures (under the case-weight perturbation scheme) associated with coefs.mu and based on the conformal normal curvature.
prif local.influence=TRUE, a matrix of local influence and total local influence measures (under the response perturbation scheme) associated with coefs.mu and based on the conformal normal curvature.
cw.thetaif local.influence=TRUE, a matrix of local influence and total local influence measures (under the case-weight perturbation scheme) asociated with the (entire) vector of parameters and based on the conformal normal curvature.
pr.thetaif local.influence=TRUE, a matrix of local influence and total local influence measures (under the response perturbation scheme) asociated with the (entire) vector of parameters and based on the conformal normal curvature.

Details

The argument formula comprises three parts (separated by the symbols "~" and "|"), namely: observed response variable, expression of the nonlinear function representing the location submodel, and predictor of the dispersion submodel (having logarithmic link). A non-parametric effect may be specified in the predictor of the dispersion submodel, either approximated by a natural cubic spline or a P-spline through the functions ncs() or psp(), respectively. The iterative estimation process is based on the Fisher scoring and backfitting algorithms. Because some distributions such as Student-t, contaminated normal, power exponential, slash and symmetric hyperbolic may be obtained as a scale mixture of normal distributions, the Expectation-Maximization algorithm is applied in those cases to obtain a more efficient iterative process for the parameter estimation. Furthermore, because the Sinh-t distribution can be obtained as a scale mixture of Sinh-normal distributions, the Expectation-Maximization algorithm is also applied in that case to obtain a more efficient iterative process for the parameter estimation. The smoothing parameter(s) is(are) chosen using the unweighted cross-validation score. The function ssym.nl() calculates deviance-type residuals for both submodels as well as local influence measures under case-weight and response perturbation schemes.

References

Vanegas, L.H. and Paula, G.A. (2014a) A Semiparametric Approach for Joint Modeling of Median and Skewness. TEST (accepted) Vanegas, L.H. and Paula, G.A. (2014b) Log-symmetric distributions: statistical properties and parameter estimation. Brazilian Journal of Probability and Statistics (submitted)

Examples

Run this code

###################################################################################
################## European rabbits Data - a log-Student-t model   #############
###################################################################################

data("Erabbits", package="ssym")
fit <- ssym.nl(log(wlens) ~ b1 - b2/(b3 + age) | ncs(age), start=c(b1=5.6,
               b2=128, b3=36.4), data=Erabbits, family='Student', xi=11,
			   local.influence=TRUE)
summary(fit)

################## Plot of the fitted model ##################

par(mfrow=c(1,2))                                                      
plot(Erabbits$age, Erabbits$wlens, xlim=range(Erabbits$age),
     ylim=range(Erabbits$wlens), ylab="", cex=0.3, lwd=3, xlab="age")                      
par(new=TRUE)
plot(Erabbits$age,exp(fitted(fit)$mu), xlim=range(Erabbits$age),
     ylim=range(Erabbits$wlens), ylab="", type="l", main="Median", xlab="")
np.graph(fit, which=2, exp=TRUE, xlab="age", main="Skewness")

################## Plot of deviance-type residuals ##################

plot(fit)

################## Plot of local influence measures ##################

ilm <- influence.ssym(fit)

###################################################################################
################## Biaxial Fatigue Data - a Birnbaum-Saunders model   #############
###################################################################################

data("Biaxial", package="ssym")
fit <- ssym.nl(log(Life) ~ b1*Work^b2, start=c(b1=16,b2=-0.2), data=Biaxial,
               family='Sinh-normal', xi=0.41, local.influence=TRUE)
summary(fit)

################## Plot of the fitted model ##################

attach(Biaxial)
xl <- "Work per cycle"
rx <-range(Biaxial$Work)
ry <- range(Biaxial$Life)
plot(Biaxial$Work, Biaxial$Life, xlim=rx, ylim=ry, type="p", cex=0.3, lwd=3,
     ylab="", xlab="")
par(new=TRUE)
ids <- sort(Biaxial$Work,index=TRUE)$ix
plot(Biaxial$Work[ids], exp(fitted(fit)$mu[ids]), xlim=rx, ylim=ry, type="l",
     ylab="Life", xlab=xl)

################## Plot of deviance-type residuals ##################

plot(fit)

################## Plot of local influence measures ##################

ilm <- influence.ssym(fit)

###################################################################################
######### Gross Domestic Product per capita Data - a Birnbaum-Saunders model ######
###################################################################################

data("gdp", package="ssym")
fit <- ssym.nl(log(gdp2010) ~ b1, start=c(b1=mean(log(gdp$gdp2010))), data=gdp, 
               family='Sinh-normal', xi=2.2)
summary(fit)

################## Plot of the fitted model ##################

xl <- "GDP per capita 2010"
par(mfrow=c(1,2))
hist(gdp$gdp2010, xlim=range(gdp$gdp2010), ylim=c(0,0.00015), prob=TRUE, breaks=55,
     col="light gray", border="dark gray",xlab="",main="",ylab="")
par(new=TRUE)
plot(gdp$gdp2010, exp(fit$lpdf)/gdp$gdp2010, xlim=range(gdp$gdp2010), ylim=c(0,0.00015),
     type="l", xlab=xl, ylab="", main="Histogram")

plot(gdp$gdp2010, fit$cdfz, xlim=range(gdp$gdp2010), ylim=c(0,1), type="l", xlab="",
     ylab="")
par(new=TRUE)
plot(ecdf(gdp$gdp2010), xlim=range(gdp$gdp2010), ylim=c(0,1), verticals=TRUE, do.points=FALSE,
     col="dark gray", xlab=xl, main="Empirical Cumulative Distribution Function")

################## Plot of deviance-type residuals ##################

plot(fit)

Run the code above in your browser using DataLab