polyclass: Polyclass: polychotomous regression and multiple classification

Description

Fit a polychotomous regression and multiple classification model using linear splines and selected tensor products.

Usage

polyclass(data, cov, weight, penalty, maxdim, exclude, include,
additive = FALSE, linear, delete = 2, fit,  silent = TRUE, 
normweight = TRUE, tdata, tcov, tweight, cv, select, loss, seed)

Arguments

data

vector of classes: data should range over consecutive integers with 0 or 1 as the minimum value.

cov

covariates: matrix with as many rows as the length of data.

weight

optional vector of case-weights. Should have the same length as data.

penalty

the parameter to be used in the AIC criterion if the model selection is carried out by AIC. The program chooses the number of knots that minimizes -2 * loglikelihood + penalty * (dimension). The default is to use penalty = log(lengt

maxdim

maximum dimension (default is $\min(n, 4 \cdot n^{1/3} \cdot (cl-1))$), where $n$ is length(data) and $cl$ is the number of classes.
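The default stated above can be evaluated by hand. The following is a minimal sketch in base R; default_maxdim is a hypothetical helper written for illustration and is not part of the package, and whether the package rounds the value is not stated here.

```r
# Default maxdim as described above: min(n, 4 * n^(1/3) * (cl - 1)).
# default_maxdim is a hypothetical helper, not a function of the package.
default_maxdim <- function(n, cl) {
  min(n, 4 * n^(1/3) * (cl - 1))
}

default_maxdim(150, 3)  # about 42.5 (e.g. the iris data: 150 cases, 3 classes)
default_maxdim(20, 4)   # the bound n = 20 is the smaller term here
```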

exclude

combinations to be excluded - this should be a matrix with 2 columns - if for example exclude[1, 1] = 2 and exclude[1, 2] = 3 no interaction between covariate 2 and 3 is included. 0 represents time.

include

those combinations that can be included. Should have the same format as exclude. Only one of exclude and include can be specified.
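As a sketch of the two-column format described for exclude and include, the matrix below forbids two interactions when fitting the iris data. The object names excl and fit.excl are illustrative, and the fit is only attempted if the polspline package (assumed here to be the package providing polyclass) is installed.

```r
# Each row names one pair of covariates whose interaction is excluded:
# here covariates 2 and 3, and covariates 1 and 4.
excl <- rbind(c(2, 3),
              c(1, 4))

# Assumption: polyclass is provided by the polspline package.
if (requireNamespace("polspline", quietly = TRUE)) {
  fit.excl <- polspline::polyclass(iris[, 5], iris[, 1:4], exclude = excl)
}
```

Passing the same matrix as include instead would restrict the search to these two interactions; only one of the two arguments may be given.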

additive

should the model selection be restricted to additive models?

linear

vector indicating for which of the variables no knots should be entered. For example, if linear = c(2, 3) no knots for either covariate 2 or 3 are entered. 0 represents time.

delete

should complete basis functions be deleted at once (2), should only individual dimensions be deleted (1) or should only the addition stage of the model selection be carried out (0)?

fit

polyclass object. If fit is specified, polyclass adds basis functions starting with those in fit.

silent

suppresses the printing of diagnostic output about basis functions added or deleted, Rao-statistics, Wald-statistics and log-likelihoods.

normweight

should the weights be normalized so that they average to one? This option only has an effect if the model is selected using AIC.

tdata, tcov, tweight

test set. Should satisfy the same requirements as data, cov and weight. If all test set weights are one, tweight can be omitted. If tdata and tcov are specified, the model sel

cv

in how many subsets should the data be divided for cross-validation? If cv is specified and tdata is omitted, the model selection is carried out by cross-validation.

select

if a test set is provided, or if the model is selected using cross-validation, should the model be selected that minimizes (misclassification) loss (0), that maximizes test set log-likelihood (1) or that minimizes test set squared error loss (2)?
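The two alternatives to AIC selection described above might be used as follows. This is a sketch only: the split into 100 training cases, the object names and the choice of five cross-validation subsets are arbitrary illustrations, and the fits are only attempted if the polspline package (assumed to provide polyclass) is installed.

```r
set.seed(1)

# Classes coded as consecutive integers starting at 1, covariates as a matrix.
cls  <- as.integer(iris[, 5])
covs <- as.matrix(iris[, 1:4])

# Illustrative split: 100 training cases, the remaining 50 as test set.
train <- sample(nrow(iris), 100)

if (requireNamespace("polspline", quietly = TRUE)) {
  # Test set selection, choosing the model that minimizes misclassification loss.
  fit.test <- polspline::polyclass(cls[train], covs[train, ],
                                   tdata = cls[-train], tcov = covs[-train, ],
                                   select = 0)

  # Cross-validation on all cases, with the data divided into five subsets.
  fit.cv <- polspline::polyclass(cls, covs, cv = 5, select = 0)
}
```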

loss

a rectangular matrix specifying the loss function, whose size is the number of classes times number of actions. Used for cross-validation and test set model selection. loss[i, j] contains the loss for assigning action j to

seed

optional seed for the random number generator that determines the sequence of the cases for cross-validation. If the seed has length 12 or more, the first twelve elements are assumed to be .Random.seed, otherwise the function

Value

The output is an object of class polyclass, organized to serve as input for plot.polyclass, beta.polyclass, summary.polyclass, ppolyclass (fitted probabilities), cpolyclass (fitted classes) and rpolyclass (random classes). The function returns a list with the following members:

call

the command that was executed.

ncov

number of covariates.

ndim

number of dimensions of the fitted model.

nclass

number of classes.

nbas

number of basis functions.

naction

number of possible actions that are considered.

fcts

matrix of size nbas x (nclass + 4). Each row is a basis function.
First element: first covariate involved (NA = constant);
second element: which knot (NA means: constant or linear);
third element: second covariate involved (NA means: this is a function of one variable);
fourth element: knot involved (if the third element is NA, of no relevance);
fifth, sixth,... element: beta (coefficient) for class one, two, ...

knots

a matrix with ncov rows. Covariate i has row i+1, time has row 1. First column: number of knots in this dimension; other columns: the knots, appended with NAs to make it a matrix.

cv

in how many sets the data was divided for cross-validation. Only provided if method = 2.

loss

the loss matrix used in cross-validation and test set. Only provided if method = 1 or method = 2.

penalty

the parameter used in the AIC criterion. Only provided if method = 0.

method

0 = AIC, 1 = test set, 2 = cross-validation.

ranges

column i gives the range of the i-th covariate.

logl

matrix with eight or eleven columns that summarizes the fits. Column one indicates the dimension; column two the AIC or loss value, whichever was used during model selection; columns three, four and five give the training set log-likelihood, (misclassification) loss and squared error loss; columns six to eight give the same information for the test set; column nine (or column six if method = 0 or method = 2) indicates whether the model was fitted during the addition stage (1) or during the deletion stage (0); columns ten and eleven (or seven and eight) give the minimum and maximum penalty parameter for which AIC would have selected this model.

sample

sample size.

tsample

the sample size of the test set. Only provided if method = 1.

wgtsum

sum of the case weights.

covnames

names of the covariates.

classnames

(numerical) names of the classes.

cv.aic

the penalty value that was determined optimal by cross-validation. Only provided if method = 2.

cv.tab

table with three columns. Columns one and two indicate the penalty parameter range for which the cv-loss in column three would be realized. Only provided if method = 2.

seed

the random seed that was used to determine the order of the cases for cross-validation. Only provided if method = 2.
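A fitted object can be inspected through the list members described above or passed on to the companion functions. The sketch below assumes that polyclass, summary.polyclass and cpolyclass come from the polspline package and that cpolyclass takes the covariates first and the fit second; it only runs the fit if that package is installed.

```r
# Assumption: polspline provides polyclass, summary.polyclass and cpolyclass.
if (requireNamespace("polspline", quietly = TRUE)) {
  fit.iris <- polspline::polyclass(iris[, 5], iris[, 1:4])

  # Individual members of the returned list.
  fit.iris$nclass   # number of classes
  fit.iris$ndim     # number of dimensions of the fitted model
  fit.iris$method   # 0 = AIC, 1 = test set, 2 = cross-validation

  # Summary of the fit, then fitted classes compared with the observed ones.
  summary(fit.iris)
  class.iris <- polspline::cpolyclass(iris[, 1:4], fit.iris)
  table(observed = iris[, 5], fitted = class.iris)
}
```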

The list also contains the members delete, beta, select, anova and twgtsum (method = 1).

References

Charles Kooperberg, Smarajit Bose, and Charles J. Stone (1997). Polychotomous regression. Journal of the American Statistical Association, 92, 117--127.

Charles J. Stone, Mark Hansen, Charles Kooperberg, and Young K. Truong (1997). The use of polynomial splines and their tensor products in extended linear modeling (with discussion). Annals of Statistics, 25, 1371--1470.

Examples

library(polspline)
data(iris)
# Classes are the species (column 5); covariates are the four measurements.
fit.iris <- polyclass(iris[,5], iris[,1:4])