
scar (version 0.2-0)

scair: Maximizing the likelihood of the generalised additive index model with shape constraints

Description

This function searches for a maximum likelihood estimator (MLE) of the generalised additive index regression model with shape constraints. A stochastic search strategy is used. Each index is a linear combination of some (or all) of the covariates. Each additive component function of these index predictors is assumed to belong to one of nine possible shape restrictions.

The output is an object of class scair which contains all the information needed to plot the estimator using the plot method, or to evaluate it using the predict method.

Usage

scair(x,y,shape=rep("l",1), family=gaussian(), weights=rep(1,length(y)), 
  epsilon=1e-8, delta=0.1, indexgen=c("unif", "norm"), iter = 200, 
  allnonneg = FALSE)

Arguments

x
Observed covariates in $R^d$, in the form of an $n \times d$ numeric matrix.
y
Observed responses, in the form of a numeric vector of length $n$.
shape
A vector that specifies the shape restriction for the additive component function of each index, in the form of a string vector of length $m$. For the sake of identifiability, we require the number of indices $m \le d$. The shape constraints are specified using the same string codes as in scar; see scar for the list of the nine possible constraints.
family
A description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function itself, or the result of a call to a family function. Currently only the families supported by scar (with their canonical link functions) are allowed; see scar.
weights
An optional vector of prior weights to be used when maximising the likelihood. It is a numeric vector of length $n$. By default equal weights are used.
epsilon
Positive convergence tolerance used when performing the iteratively reweighted least squares (IRLS) method at each iteration of the active set algorithm in scar. See scar.
delta
A tuning parameter used to avoid the perfect-fit phenomenon and to ensure identifiability. It represents the lower bound of the minimum eigenvalue of all possible $A^T A$ subject to the identifiability conditions, where $A$ is an index matrix. It should be a small positive number; the default is 0.1.
indexgen
Determines how the index matrices are generated in the stochastic search. If its value is "unif", the entries of the index matrices are drawn from a uniform distribution; if its value is "norm", they are drawn from a normal distribution.
iter
Number of iterations of the stochastic search.
allnonneg
A logical variable that specifies whether all the entries of the index matrices are non-negative. If TRUE, then delta is no longer needed.
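For instance, a two-index fit can be requested by passing a shape vector of length two. The sketch below is illustrative only: the data-generating function is made up here, and the shape codes "cvx" (convex) and "ccv" (concave) are assumed to follow scar's conventions.

```r
## Assumes the scar package (which provides scair) is installed.
library(scar)

set.seed(1)
n <- 300; d <- 4
x <- matrix(rnorm(n * d), nrow = n, ncol = d)
## Response driven by two indices: one convex and one concave component
y <- (x[, 1] + x[, 2])^2 - (x[, 3] - x[, 4])^2 + rnorm(n)

## shape has length m = 2, so two indices are fitted (m <= d = 4)
fit <- scair(x, y, shape = c("cvx", "ccv"), family = gaussian(), iter = 20)
dim(fit$index)   # the estimated index matrix is d x m, here 4 x 2
```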

Value

An object of class scair, with the following components:

  • x: Covariates copied from input.
  • y: Response copied from input.
  • shape: Shape vector copied from input.
  • weights: Vector of weights copied from input.
  • family: The exponential family copied from input.
  • index: The estimated $d \times m$ index matrix $A$.
  • componentfit: Value of the fitted component function at each observed index (computed using the estimated index matrix), in the form of an $n \times m$ numeric matrix, where the element at the $i$-th row and the $j$-th column is the value of $f_j$ at the $j$-th coordinate of $A^T X_i$, with the identifiability condition satisfied (see details of scar).
  • constant: The estimated value of the constant $c$ in the additive function $f$ (see details of scar).
  • deviance: Up to a constant, minus twice the maximised log-likelihood. Where applicable, the constant is chosen so that the saturated model has zero deviance. See also glm.
  • nulldeviance: The deviance for the null model.
  • delta: Tuning parameter copied from input.
  • iter: Total number of iterations of the stochastic search algorithm, copied from input.
  • allnonneg: Specifies whether all entries of the index matrices are non-negative, copied from input.

Details

For $i = 1,\ldots,n$, let $X_i$ be the $d$-dimensional covariates, $Y_i$ the corresponding one-dimensional response and $w_i$ its weight. The generalised additive index model can be written as $$g(\mu) = f(A^T x),$$ where $x=(x_1,\ldots,x_d)^T$, $\mu$ is the conditional mean of the response, $g$ is a known link function, $A$ is a $d \times m$ index matrix, and $f$ is an additive function. Our task is to estimate both the index matrix and the additive function.

Assuming the canonical link function is used, the maximum likelihood estimator of the generalised additive index model based on observations $(X_1,Y_1), \ldots, (X_n,Y_n)$ is the function that maximises $$\frac{1}{n} \sum_{i=1}^n w_i \{Y_i f(A^T X_i) - B(f(A^T X_i))\}$$ subject to the restriction that for every $j = 1,\ldots,m$, the $j$-th additive component of $f$ satisfies the constraint indicated by the $j$-th element of shape. Here $B(\cdot)$ is the log-partition function of the specified exponential family distribution, and $w_i$ are the weights. For i.i.d. data, $w_i$ should be $1$ for each $i$.

For any given $A$, the optimisation problem can be solved using the active set algorithm implemented in scar, so the overall problem reduces to a finite-dimensional optimisation over the index matrix. Here a simple stochastic search strategy is applied, though other methods, such as downhill simplex, are also possible (and sometimes offer competitive performance).
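The reduction to a finite-dimensional problem can be seen directly: for a fixed index matrix $A$, the shape-constrained fit (and hence the profile deviance as a function of $A$) is obtained by a single call to scar on the projected covariates. A sketch, assuming the scar package is installed; the simulated data and candidate index matrices below are illustrative only.

```r
library(scar)

set.seed(10)
n <- 200; d <- 3
x <- matrix(runif(n * d) * 2 - 1, nrow = n, ncol = d)
y <- (0.5 * x[, 1] + 0.25 * x[, 2] - 0.25 * x[, 3])^2 + rnorm(n, sd = 0.5)

## Profile deviance: fix the index matrix A and fit f with scar
profile.dev <- function(A) {
  scar(x %*% A, y, shape = "cvx", family = gaussian())$deviance
}

A.true <- matrix(c(0.5, 0.25, -0.25), ncol = 1)   # the true index
A.off  <- matrix(c(1, 0, 0), ncol = 1)            # a mis-specified index

profile.dev(A.true)   # typically smaller than ...
profile.dev(A.off)    # ... the deviance at a mis-specified index
```

The stochastic search in scair amounts to evaluating this profile deviance at many randomly generated index matrices and keeping the best.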

References

See the references listed for scar.

See Also

plot.scair, predict.scair, scar, decathlon

Examples

## An example in the Gaussian additive index regression setting:
## Define the additive function f on the scale of the predictors
f<-function(x){
  return((0.5*x[,1]+0.25*x[,2]-0.25*x[,3])^2) 
}

## Simulate the covariates and the responses
## covariates are drawn uniformly from [-1,1]^3
set.seed(10)
d = 3
n = 500
x = matrix(runif(n*d)*2-1,nrow=n,ncol=d) 
y = f(x) + rnorm(n,sd=0.5)

## Single index model so no delta is required here
shape=c("cvx")
object = scair(x,y,shape=shape, family=gaussian(),iter = 100)

## Estimated index matrix
object$index

## Root squared error for the estimated index
sqrt(sum((object$index - c(0.5,0.25,-0.25))^2))

## Plot the estimated additive function for the single index
plot(object)

## Evaluate the estimated prediction function at 10^4 random points 
## drawn from the interior of the support
testx = matrix((runif(10000*d)*1.96-0.98),ncol=d)
testf = predict(object,testx)

## and calculate the (estimated) absolute prediction error
mean(abs(testf-f(testx))) 

## Here we can treat the obtained index matrix as a warm start and perform 
## further optimisation (on the second and third entries of the index)
## using e.g. the default R optimisation routine.
fn<-function(w){
    dev = Inf
    if (abs(w[1])+abs(w[2])>1) return(dev)
    else {
      wnew = matrix(c(1-abs(w[1])-abs(w[2]),w[1],w[2]),ncol=1)
      dev = scar(x %*% wnew, y, shape = "cvx")$deviance
      return (dev)
    } 
}
index23 = optim(object$index[2:3],fn)$par
newindex = matrix(c(1-sum(abs(index23)),index23),ncol=1); newindex

## Root squared error for the new estimated index
sqrt(sum((newindex - c(0.5,0.25,-0.25))^2))

## A further example is provided in decathlon dataset
