ssym.l: Fitting Semi-parametric Log-symmetric Regression Models

Description

ssym.l is used to fit a semi-parametric regression model suitable for analysis of data sets in which the response variable is continuous, strictly positive, and asymmetric. In this setup, both median and skewness of the response variable distribution are explicitly modeled through semi-parametric functions, whose nonparametric components may be approximated by natural cubic splines or P-splines.
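As a minimal illustration of what "median" and "skewness" mean in this setup, the sketch below simulates a log-normal response, the simplest log-symmetric case, using base R only. It assumes the parameterization log(y) = log(eta) + sqrt(phi) * e with e standard normal, where eta plays the role of the median and phi the role of the skewness parameter. The names eta, phi and y are illustrative and are not part of ssym.

```r
set.seed(123)

n   <- 1e5
eta <- 2.5   # assumed median of the response (illustrative value)
phi <- 0.4   # assumed skewness parameter (illustrative value)

# Log-normal response: the errors are symmetric on the log scale
y <- exp(log(eta) + sqrt(phi) * rnorm(n))

# The sample median should sit close to eta
sample_median <- median(y)

# Right skew on the original scale pulls the mean above the median
sample_mean <- mean(y)

print(c(median = sample_median, mean = sample_mean))
```

Increasing phi in this sketch leaves the median essentially unchanged while widening the gap between mean and median, which is why the two quantities can be modeled by separate submodels.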

Usage

ssym.l(formula, family, xi, data, epsilon, maxiter, subset, local.influence)

Arguments

formula

a symbolic description of the systematic component of the model to be fitted. See details for further information.

family

a description of the (log) error distribution to be used in the model. Supported families include Normal, Student, Contnormal, Powerexp, Hyperbolic, Slash, Sinh-normal and Sinh-t.

xi

a numeric value or numeric vector that represents the extra parameter of the specified error distribution.

data

an optional data frame, list or environment containing the variables in the model.

epsilon

an optional positive value, which represents the convergence criterion. Default value is 1e-07.

maxiter

an optional positive integer giving the maximal number of iterations for the estimating process. Default value is 1e03.

subset

an optional expression specifying a subset of individuals to be used in the fitting process.

local.influence

logical. If TRUE, local influence measures under two perturbation schemes are calculated.

Value

coefs.mu

a vector of parameter estimates associated with the median submodel.

coefs.phi

a vector of parameter estimates associated with the skewness submodel.

vcov.mu

approximate variance-covariance matrix associated with the median submodel.

vcov.phi

approximate variance-covariance matrix associated with the skewness submodel.

weights

final weights of the iterative process.

lambda.mu

estimate for the smoothing parameter associated with the nonparametric part of the median submodel.

dfe.mu

degrees of freedom associated with the nonparametric part of the median submodel.

lambda.phi

estimate for the smoothing parameter associated with the nonparametric part of the skewness submodel.

dfe.phi

degrees of freedom associated with the nonparametric part of the skewness submodel.

deviance.mu

a vector of deviances associated with the median submodel.

deviance.phi

a vector of deviances associated with the skewness submodel.

mu.fitted

a vector of fitted values for the (log) median submodel.

phi.fitted

a vector of fitted values for the skewness submodel.

lpdf

a vector of individual contributions to the log-likelihood function.

cw

if local.influence=TRUE, a matrix of local influence and total local influence measures (under the case-weight perturbation scheme) associated with the median submodel.

pr

if local.influence=TRUE, a matrix of local influence and total local influence measures (under the response perturbation scheme) associated with the median submodel.

cw.theta

if local.influence=TRUE, a matrix of local influence and total local influence measures (under the case-weight perturbation scheme).

pr.theta

if local.influence=TRUE, a matrix of local influence and total local influence measures (under the response perturbation scheme).
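The returned components can be combined with base R to obtain common summaries. The sketch below is only an illustration: fit_stub is a hypothetical stand-in list with made-up numbers, built so that the code runs without fitting a model, and the Wald statistics rely on the usual large-sample normal approximation, which is an assumption rather than something stated on this page.

```r
# Hypothetical stand-in for a fitted object; all numbers are made up
fit_stub <- list(
  coefs.mu = c(1.20, -0.35),
  vcov.mu  = matrix(c(0.0400, 0.0010,
                      0.0010, 0.0100), nrow = 2),
  lpdf     = c(-1.1, -0.8, -1.4, -0.9)
)

# Standard errors from the approximate variance-covariance matrix
se_mu <- sqrt(diag(fit_stub$vcov.mu))

# Wald z statistics and two-sided p-values (normal approximation assumed)
z_mu <- as.vector(fit_stub$coefs.mu) / se_mu
p_mu <- 2 * pnorm(-abs(z_mu))

# Log-likelihood as the sum of the individual contributions
loglik <- sum(fit_stub$lpdf)

print(cbind(estimate = fit_stub$coefs.mu, se = se_mu, z = z_mu, p = p_mu))
print(loglik)
```

With a real fit, replacing fit_stub by the object returned by ssym.l() should give the corresponding quantities for the median submodel; the same pattern applies to coefs.phi and vcov.phi.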

Details

The argument formula comprises three parts (separated by the symbols "~" and "|"), namely: the observed response variable on the log scale, the predictor of the median submodel (having logarithmic link) and the predictor of the skewness submodel (having logarithmic link). A nonparametric effect may be specified in the predictors, approximated either by a natural cubic spline or by a P-spline, using the functions ncs() or psp(), respectively.

The iterative estimation process is based on the Fisher scoring and backfitting algorithms. Because some distributions, such as log-Student-t, log-contaminated-normal, log-power-exponential, log-slash and log-hyperbolic, may be obtained as a power mixture of the log-normal distribution, the expectation-maximization (EM) algorithm is applied in those cases to obtain a more efficient iterative process for the parameter estimation. Likewise, because the Birnbaum-Saunders-t distribution can be obtained as a scale mixture of the Birnbaum-Saunders distribution, the EM algorithm is also applied in that case.

The smoothing parameters are chosen using the unweighted cross-validation score. The function ssym.l() calculates deviance-type residuals for both submodels, as well as local influence measures under the case-weight and response perturbation schemes.
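The three-part structure of formula can be inspected with base R alone. The sketch below only shows how R parses such an expression ("~" binds more loosely than "|"); it is not a description of how ssym.l() processes the formula internally, and the variable names response, median_part and skew_part are illustrative.

```r
# Building a formula does not evaluate psp(), so ssym need not be loaded
f <- log(fraction) ~ type + psp(time) | type + psp(time)

response <- f[[2]]   # left-hand side: response on the log scale
rhs      <- f[[3]]   # right-hand side: a call to "|"

# Check that the right-hand side really is split by "|"
stopifnot(identical(rhs[[1]], as.name("|")))

median_part <- rhs[[2]]   # predictor of the median submodel
skew_part   <- rhs[[3]]   # predictor of the skewness submodel

cat("response:", deparse(response), "\n")
cat("median  :", deparse(median_part), "\n")
cat("skewness:", deparse(skew_part), "\n")
```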

References

Vanegas, L.H. and Paula, G.A. (2015a) A Semiparametric Approach for Joint Modeling of Median and Skewness. TEST (to appear).

Vanegas, L.H. and Paula, G.A. (2015b) Log-symmetric distributions: statistical properties and parameter estimation. Brazilian Journal of Probability and Statistics (to appear).

Examples

###################################################################################
######### Fraction of Cell Volume Data - a log-power-exponential model  ###########
###################################################################################

library(ssym)  # provides ssym.l(), psp(), ncs() and the plotting helpers used below

data("Ovocytes", package="ssym")
fit <- ssym.l(log(fraction) ~ type + psp(time) | type + psp(time), data=Ovocytes,
              family='Powerexp', xi=-0.55, maxiter=5000, local.influence=TRUE)
summary(fit)

################## Graph of the nonparametric effects ##################

par(mfrow=c(1,2))
np.graph(fit, which=1, exp=TRUE)
np.graph(fit, which=2, exp=TRUE)

################## Graph of deviance-type residuals ##################

plot(fit)

################## Graph of local influence measures ##################

ilm <- influence.ssym(fit)

###################################################################################
############### Textures of snacks Data - a log-Student-t model  #################
###################################################################################

data("Snacks", package="ssym")
fit <- ssym.l(log(texture) ~ type + ncs(week) | type, data=Snacks,
              family='Student', xi=15)
summary(fit)

################## Graph of the nonparametric effect ##################

np.graph(fit, which=1, exp=TRUE)

################## Graph of deviance-type residuals ##################

plot(fit)

###################################################################################
####################### gam.data - a Power-exponential model   ####################
###################################################################################

data("gam.data", package="gam")

fit <- ssym.l(y~psp(x),data=gam.data,family="Powerexp",xi=-0.5)
summary(fit)

################## Graph of the nonparametric effect ##################

np.graph(fit, which=1)

################## Graph of deviance-type residuals ##################

plot(fit)

###################################################################################
######### Personal Injury Insurance Data - a Birnbaum-Saunders-t model   ##########
###################################################################################

data("Claims", package="ssym")
fit <- ssym.l(log(total) ~ op_time | op_time, data=Claims,
              family='Sinh-t', xi=c(0.1,4))
summary(fit)

################## Plot of deviance-type residuals ##################

plot(fit)

###################################################################################
######### Body Fat Percentage Data - a Birnbaum-Saunders-t model   ##########
###################################################################################

data("ais", package="sn")
fit <- ssym.l(log(Bfat)~1, data=ais, family='Sinh-t', xi=c(4.5,4))
summary(fit)

# Ordering of the observations by response value (full argument name, no partial matching)
id <- sort(ais$Bfat, index.return=TRUE)$ix
par(mfrow=c(1,2))
hist(ais$Bfat[id],xlim=range(ais$Bfat),ylim=c(0,0.1),prob=TRUE,breaks=15,
     col="light gray",border="dark gray",xlab="",ylab="",main="")
par(new=TRUE)
plot(ais$Bfat[id],exp(fit$lpdf[id])/ais$Bfat[id],xlim=range(ais$Bfat),
     ylim=c(0,0.1),type="l",xlab="",ylab="Density",main="Histogram")
	 
plot(ais$Bfat[id],fit$cdfz[id],xlim=range(ais$Bfat),ylim=c(0,1),type="l",
     xlab="",ylab="",main="")
par(new=TRUE)
plot(ecdf(ais$Bfat[id]),xlim=range(ais$Bfat),ylim=c(0,1),verticals=TRUE,
     do.points=FALSE,col="dark gray",ylab="Probability.",xlab="",main="ECDF")

###################################################################################
################### Boston Housing Data - a log-Slash model   ####################
###################################################################################
	 
#data("Boston", package="MASS")
#fit <- ssym.l(log(medv)~psp(lstat)|psp(lstat),data=Boston,family="Slash",xi=1.7)
#summary(fit)
#plot(fit)
