shapereg: Shape-Restricted Regression

Description

The regression model $y_i = f(t_i) + x_i'\beta + \epsilon_i, i = 1,\ldots,n$ is considered, where the only assumptions about $f$ concern its shape. The vector expression for the model is $y = \theta + X\beta + \epsilon$. $X$ represents a categorical covariate. The shapereg function allows eight shapes: increasing, decreasing, convex, concave, increasing-convex, increasing-concave, decreasing-convex, and decreasing-concave. This routine employs a single cone projection to find $\theta$ and $\beta$ simultaneously.

Usage

shapereg(y, t, shape, xmat = NULL, w = NULL, test = FALSE)

Arguments

A vector of length $n$.

A continuous and constrained predictor of length $n$.

shape

A value ranging from 1 to 8 indicating the shape of $f$: 1 = increasing; 2 = decreasing; 3 = convex; 4 = concave; 5 = increasing convex; 6 = decreasing convex; 7 = increasing concave; 8 = decreasing concave.

xmat

A $n$ by $k$ full colunm rank matrix. If xmat is given as a vector, it will be transformed into a $n$ by $1$ matrix. It represents a categorical covariate. The user can choose to put a constant vector in xmat or not. If xmat is not given, xmat is created

An optional nonnegative vector of weights of length $n$. If w is not given, all weights are taken to equal 1. Otherwise, the minimization of $(y - \phi)'w(y - \phi)$ over $C$ is returned. The default is w = NULL.

test

A logical scalar. If test == TRUE, then the p-value for the hypothesis test $H_0: \phi$ is in $V$ versus $H_1: \phi$ is in $C$ is returned. $C$ is the constraint cone, and $V$ is the linear space contained in the constraint cone.

Value

pvalThe p-value for the hypothesis test $H_0: \phi$ is in $V$ versus $H_1: \phi$ is in $C$. $C$ is the constraint cone of the form ${\phi: \phi = v + \sum b_i\delta_i, i = 1,\ldots,m, b_1,\ldots, b_m \ge 0 }$, $v$ is in $V$, and $V$ is the linear space contained in the constraint cone. If test == TRUE, a p-value is returned. Otherwise, the test is skipped and no p-value is returned.
coefsThe estimated coefficients for $X$, i.e., the estimation for the vector $\beta$. Note that even if the user does not provide a constant vector in $X$, the coefficient for the intercept will be returned.
constr.fitThe shape-restricted fit over the constraint cone $C$ of the form ${\phi: \phi = v + \sum b_i\delta_i, i = 1,\ldots,m, b_1,\ldots, b_m \ge 0 }$, $v$ is in $V$.
linear.fitThe least-squares regression of $y$ on $V$, i.e., the linear space contained in the constraint cone. If shape is 3 or shape is 4, $V$ is spanned by $X$ and $t$. Otherwise, it is spanned by $X$. $X$ must be full column rank, and the matrix formed by combining $X$ and $t$ must also be full column rank.
se.betaThe standard errors for the estimation of the vector $\beta$. The degree of freedom is returned by coneB and is multiplied by 1.5. Note that even if the user does not provide a constant vector in $X$, the standard error for the intercept will be returned.
pvals.betaThe approximate p-values for the estimation of the vector $\beta$. A t-distribution is used as the approximate distribution. Note that even if the user does not provide a constant vector in $X$, the approximate p-value for the intercept will be returned.
shapeThe shape given by the user.
testThe test parameter given by the user.
SSE0The sum of squared residuals for the linear part.
SSE1The sum of squared residuals for the full model.
callThe matched call

Details

This routine constrains $\theta$ in the equation $y = \theta + X\beta + \epsilon$ by a shape parameter. The constraint cone $C$ has the form ${\phi: \phi = v + \sum b_i\delta_i, i = 1,\ldots,m, b_1,\ldots, b_m \ge 0 }$, $v$ is in $V$. The column vectors of $X$ are in $V$, i.e., the linear space contained in the constraint cone. The hypothesis test $H_0: \phi$ is in $V$ versus $H_1: \phi$ is in $C$ is an exact one-sided test, and the test statistic is $E_{01} = (SSE_0 - SSE_1)/(SSE_0)$, which has a mixture-of-betas distribution when $H_0$ is true and $\epsilon$ is a vector following a standard multivariate normal distribution with mean 0. The mixing parameters are found through simulations. The number of simulations used to obtain the mixing distribution parameters for the test is 10,000. Such simulations usually take some time. For the "feet" data set used as an example in this section, whose sample size is 39, the time to get a p-value is roughly between 4 seconds. This routine calls coneB for the cone projection part.

References

Raubertas, R. F., C.-I. C. Lee, and E. V. Nordheim (1986) Hypothesis tests for normals means constrained by linear inequalities. Communications in Statistics - Theory and Methods 15 (9), 2809--2833. Robertson, T., F. Wright, and R. Dykstra (1988) Order Restricted Statistical Inference New York: John Wiley and Sons. Fraser, D. A. S. and H. Massam (1989) A mixed primal-dual bases algorithm for regression under inequality constraints application to concave regression. Scandinavian Journal of Statistics 16, 65--74. Meyer, M. C. (2003) A test for linear vs convex regression function using shape-restricted regression. Biometrika 90 (1), 223--232. Cheng, G.(2009) Semiparametric additive isotonic regression. Journal of Statistical Planning and Inference 139, 1980--1991. Meyer, M.C.(2013a) Semiparametric additive constrained regression. Journal of Nonparametric Statistics to appear.

Examples

Run this code

#load the feet data set
    data(feet)

#extract the continuous and constrained predictor
    l <- feet$length

#extract the continuous response
    w <- feet$width

#extract the categorical covariate
    s <- feet$sex

#create the dummy variable indicating sex: if sex = "G", x = 0; if sex = "B", x = 1 
    n <- length(s)
    x <- rep(0, n)
    for(i in 1:n){if(s[i] == "G") x[i] = 0 else x[i] = 1}

#create the xmat in the shapereg function
    xmat <- x

#make an increasing fit with test set as FALSE
    shape <- 1
    ans <- shapereg(w, l, shape, xmat)

#check the summary table 
    summary(ans)

#make an increasing fit with test set as TRUE
    ans <- shapereg(w, l, shape, xmat, test = TRUE)

#check the summary table 
    summary(ans)

#make a plot comparing the unconstrained fit and the constrained fit
    par(mar = c(4, 4, 1, 1))
    ord <- order(l)
    plot(sort(l), w[ord], type = "n", xlab = "foot length (cm)", ylab = "foot width (cm)")
    title("Shapereg Example Plot")

#sort l according to sex
    ord1 <- order(l[s == "G"])
    ord2 <- order(l[s == "B"])

#make the scatterplot of l vs w for boys and girls
    points(sort(l[s == "G"]), w[s == "G"][ord1], pch = 21, col = 1)
    points(sort(l[s == "B"]), w[s == "B"][ord2], pch = 24, col = 2)

#make an unconstrained fit to boys and girls
    fit <- lm(w ~ l + factor(s))

#plot the unconstrained fit 
    lines(sort(l), (coef(fit)[1] + coef(fit)[2] * l + coef(fit)[3])[ord], lty = 2)
    lines(sort(l), (coef(fit)[1] + coef(fit)[2] * l)[ord], lty = 2, col = 2)
    legend(21.5, 9.8, c("boy","girl"), pch = c(24, 21), col = c(2, 1)) 

#plot the constrained fit
    lines(sort(l), (ans$constr.fit - ans$linear.fit + ans$coefs[1])[ord], col = 1)
    lines(sort(l), (ans$constr.fit - ans$linear.fit + ans$coefs[1] + ans$coefs[2])[ord], col = 2)

Run the code above in your browser using DataLab