simplexreg: Simplex Generalized Linear Model Regression Function

Description

Regression Analysis of Proportional Data Using Various Types of Simplex Models

Usage

simplexreg(formula, data, subset, na.action,
  link = c("logit", "probit", "cloglog", "neglog"), corr = "Ind", id = NULL,
  control = simplexreg.control(...), model = TRUE, y = TRUE, x = TRUE, ...)

simplexreg.fit(y, x, z = NULL, t = NULL, link = "logit", corr = "Ind",
  id = NULL, control = simplexreg.control())

Arguments

formula

a symbolic description of the model to be fitted, of type y ~ x, y ~ x | z or y ~ x | z | t. See Details.

data

an optional data frame, list or environment containing variables in formula and id.

subset, na.action

arguments controlling formula processing via model.frame

link

type of link function for the mean. Currently "logit" (logit function), "probit" (probit function), "cloglog" (complementary log-log function) and "neglog" (negative log function) are supported.

corr

the covariance structure, chosen from "Ind" (independent structure), "Exc" (exchangeability) and "AR1" (AR(1)); see Details.

id

a factor identifying the clusters when a GEE model is fitted (corr = "Exc" or corr = "AR1"). The length of id should equal the number of observations. y, x, z and t are assumed to be sorted in accordance with the clusters specified by id.

control

a list of control arguments specified via simplexreg.control

model

a logical value indicating whether model frame should be included as a component of the return value

y, x

For simplexreg: logical values indicating whether the response vector and the covariates modelling the mean parameter should be returned as components of the returned value. For simplexreg.fit: x is the design matrix and y is the response vector.

z

regressor matrix modelling the dispersion parameter

t

time covariate in the correlation structure, see Details

...

arguments passed to simplexreg.control

Value

Details

Continuous proportions arise as outcomes in many applied areas, and such data can be modelled using simplex regression. See also simplex. The mean and dispersion parameters are each linked to a set of regressors. Regression analysis of the simplex model is implemented in simplexreg. If corr = "Ind", a simplex generalized linear model is fitted. Estimation is performed by maximum likelihood via Fisher scoring.
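As a rough illustration of the distribution behind these models, the following base R sketch writes out the simplex density of Barndorff-Nielsen and Jorgensen (1991), with mean $\mu \in (0, 1)$ and dispersion $\sigma^2 > 0$. The function names simplex_deviance and simplex_density are hypothetical helpers introduced here and are not part of the package.

```r
# Unit deviance of the simplex distribution
# (hypothetical helper, not a simplexreg function)
simplex_deviance <- function(y, mu) {
  (y - mu)^2 / (y * (1 - y) * mu^2 * (1 - mu)^2)
}

# Simplex density with mean mu in (0, 1) and dispersion sigma2 > 0
# (hypothetical helper, not a simplexreg function)
simplex_density <- function(y, mu, sigma2) {
  (2 * pi * sigma2 * (y * (1 - y))^3)^(-1 / 2) *
    exp(-simplex_deviance(y, mu) / (2 * sigma2))
}

# Sanity check: a density should integrate to about 1 over (0, 1).
# The values mu = 0.3 and sigma2 = 1 are arbitrary illustrative choices.
total <- integrate(simplex_density, lower = 0, upper = 1,
                   mu = 0.3, sigma2 = 1)$value
total
```

The numerical integral is only a plausibility check on the formula; it does not reproduce how the package evaluates the likelihood.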

In addition to generalized simplex regression models, this function provides generalized estimating equations (GEE) techniques for modelling longitudinal proportional responses. Exchangeability and AR(1) structures are available. Both parameter estimation and residual analysis are supported.

Model specification follows the approach used by the fitting function betareg for beta regression in the package betareg. For simplex regression models with homogeneous dispersion, the mean of the response is linked to a linear predictor, described by y ~ x1 + x2, through a link function. Four link functions are available for the mean. For the dispersion, however, the link function is restricted to the logarithm. To model the dispersion, the regressors for the dispersion parameter are specified in a formula of type y ~ x1 + x2 | z1 + z2, where z1 and z2 are linked to the dispersion parameter $\sigma^2$.
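The two linear predictors described above can be sketched in base R as follows. This is a minimal illustration, not the package's internal code: the list links, the design matrices X and Z, and the coefficient vectors beta and gamma are all made-up names and values, and the form of "neglog" as $-\log(\mu)$ is an assumption based on its description as the negative log function.

```r
# Mean link functions, keyed by the names accepted by the `link` argument.
# The neglog form -log(mu) is an assumption taken from its description.
links <- list(
  logit   = function(mu) log(mu / (1 - mu)),
  probit  = function(mu) qnorm(mu),
  cloglog = function(mu) log(-log(1 - mu)),
  neglog  = function(mu) -log(mu)
)

# Hypothetical design matrices for y ~ x1 + x2 | z1
set.seed(1)
X <- cbind(1, x1 = runif(5), x2 = runif(5))  # regressors for the mean
Z <- cbind(1, z1 = runif(5))                 # regressors for the dispersion

# Hypothetical coefficients, chosen only for illustration
beta  <- c(-0.5, 1, 0.3)
gamma <- c(0.2, -0.4)

# Logit link for the mean: mu = inverse-logit of X %*% beta
mu <- plogis(drop(X %*% beta))

# Log link for the dispersion: sigma2 = exp(Z %*% gamma)
sigma2 <- exp(drop(Z %*% gamma))
```

Applying links$logit to mu recovers the linear predictor X %*% beta, which is a quick way to confirm that a link and its inverse are paired correctly.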

Model specification is more involved for longitudinal proportional responses. Song et al. (2004) proposed a marginal simplex model consisting of three components: the population-average effects, the pattern of dispersion and the correlation structure. Let the percentage responses for the $i$th subject be $y_{ij}$, observed at time $t_{ij}$. If corr = "AR1", the working covariance matrix of $y_{ij}$, $j = 1, 2, \ldots, n_i$, is $$\{\exp(\alpha |t_{ik} - t_{ij}|)\}_{kj}$$ where $\alpha < 0$ and $\exp(\alpha)$ is the lag-1 autocorrelation. If corr = "Exc", the covariance matrix is $(1 - \exp(\alpha)) I + \exp(\alpha) \mathbf{1}$, where $I$ is the identity matrix and $\mathbf{1}$ is the matrix with all elements equal to one.
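The two working matrices above can be built directly in base R. The sketch below is illustrative only; ar1_corr and exc_corr are hypothetical helper names, and the observation times and the value of $\alpha$ are made up.

```r
# AR(1) structure: entry (k, j) is exp(alpha * |t_k - t_j|)
# (hypothetical helper, not a simplexreg function)
ar1_corr <- function(times, alpha) {
  exp(alpha * abs(outer(times, times, "-")))
}

# Exchangeable structure: (1 - exp(alpha)) * I + exp(alpha) * (all-ones matrix)
# (hypothetical helper, not a simplexreg function)
exc_corr <- function(n, alpha) {
  rho <- exp(alpha)
  (1 - rho) * diag(n) + rho * matrix(1, n, n)
}

# Made-up observation times for one subject; alpha < 0 so that exp(alpha) < 1
times <- c(1, 2, 4)
alpha <- log(0.5)

R_ar1 <- ar1_corr(times, alpha)
R_exc <- exc_corr(length(times), alpha)
```

With unequally spaced times, the AR(1) entries decay with the actual time gap rather than with the index difference, which is why the time covariate t is part of the model formula.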

For homogeneous dispersion, the formula should be of the form y ~ x1 + x2 | 1 | t, where $t$ is the time covariate. For heterogeneous dispersion, the formula should be of the form y ~ x1 + x2 | z1 + z2 | t.

References

Barndorff-Nielsen, O.E. and Jorgensen, B. (1991) Some parametric models on the simplex. Journal of Multivariate Analysis, 39: 106--116

Jorgensen, B. (1997) The Theory of Dispersion Models. London: Chapman and Hall

McCullagh, P. and Nelder, J. (1989) Generalized Linear Models. London: Chapman and Hall

Song, P. and Qiu, Z. and Tan, M. (2004) Modelling Heterogeneous Dispersion in Marginal Models for Longitudinal Proportional Data. Biometrical Journal, 46: 540--553

Zhang, P. and Qiu, Z. and Shi, C. (2016) simplexreg: An R Package for Regression Analysis of Proportional Data Using the Simplex Distribution. Journal of Statistical Software, 71: 1--21

Examples

# GLM models
data("sdac", package = "simplexreg")
# Homogeneous dispersion
sim.glm1 <- simplexreg(rcd ~ ageadj + chemo, link = "logit",
  data = sdac)
# Dispersion modelled by age
sim.glm2 <- simplexreg(rcd ~ ageadj + chemo | age, link = "logit",
  data = sdac)

# GEE models
data("retinal", package = "simplexreg")
# Homogeneous dispersion, exchangeable correlation
sim.gee1 <- simplexreg(Gas ~ LogT + LogT2 + Level | 1 | Time, link = "logit",
  corr = "Exc", id = ID, data = retinal)
# Heterogeneous dispersion, AR(1) correlation
sim.gee2 <- simplexreg(Gas ~ LogT + LogT2 + Level | LogT + Level | Time,
  link = "logit", corr = "AR1", id = ID, data = retinal)
