Constructs ordinal forests (OF) as presented in Hornung et al. (in prep).
The following tasks can be performed using OF: 1) predicting the values of an ordinal target variable for new observations based on covariate values (see predict.ordfor); 2) estimating the relative widths of the classes of the ordinal target variable; 3) ranking the importances of the covariates with respect to predicting the values of the ordinal target variable.
The default values of the following hyperparameters are appropriate for most applications: ndiv, ntreeperdiv, ntreefinal, npermtrial, and nbest. Thus, these arguments generally do NOT have to be specified by hand.
For details on OF see the 'Details' section below.
ordfor(depvar, data, ndiv = 1000, ntreeperdiv = 100, ntreefinal = 5000,
perfmeasure = c("equal", "irrespective", "onecateg", "custom"), categimp,
categweights, npermtrial = 500, nbest = 10, importance = "permutation")
depvar
character. Name of the dependent variable in data.
data
data.frame. Data frame containing the covariates and a factor-valued ordinal target variable.
ndiv
integer. Number of random partitions of [0,1] to consider in the optimization.
ntreeperdiv
integer. Number of trees in the smaller regression forests constructed for the ndiv different partitions.
ntreefinal
integer. Number of trees in the larger regression forest constructed using the optimized partition (i.e., the OF).
perfmeasure
character. Performance measure. The default is "equal". See 'Details' subsection 'Performance measures' below and perfm.
categimp
character. Class to prioritize if perfmeasure="onecateg".
categweights
numeric. Needed if perfmeasure="custom": vector of length equal to the number of classes. Classes with higher weights are to be predicted more precisely than those with smaller weights.
npermtrial
integer. Number of permutations to try for each partition considered during the optimization.
nbest
integer. Number of forests considered when averaging the partitions corresponding to the nbest forests with the highest levels of prediction performance.
importance
character. Variable importance mode, one of 'none', 'impurity', 'permutation'. The 'impurity' measure is the variance of the responses for regression. Passed to the function ranger from the package of the same name.
ordfor returns an object of class "ordfor", a list containing the following components:
forestfinal
object of class "ranger". Regression forest constructed using the optimized partition of [0,1] (i.e., the OF). Used by predict.ordfor.
bordersbest
vector of length J+1. Average over the nbest best partitions of [0,1]. Used by predict.ordfor.
forests
list of length ndiv. The smaller forests constructed during the optimization of the partition of [0,1].
perfmeasurevalues
vector of length ndiv. Performance measure values of the ndiv smaller forests stored in forests.
bordersb
matrix of dimension ndiv x (J+1). All ndiv partitions considered.
classes
character vector of length J. Classes of the target variable.
ndiv
integer. Number of random partitions considered.
ntreeperdiv
integer. Number of trees per partition.
ntreefinal
integer. Number of trees of the OF (constructed after optimizing the partition).
perfmeasure
character. Performance measure.
categimp
character. If perfmeasure="onecateg": class to prioritize; NA otherwise.
nbest
integer. Number of forests considered in the averaging of the partitions corresponding to the nbest forests with the highest levels of prediction performance.
classfreq
table. Class frequencies.
varimp
vector of length p. Variable importance for each covariate.
Ordinal forests (OF) are a method for ordinal regression with high-dimensional and low-dimensional data that is able to predict the values of the ordinal target variable for new observations and, at the same time, to estimate the relative widths of the classes of the ordinal target variable. Using a (permutation-based) variable importance measure, it is moreover possible to rank the importances of the covariates. OF will be presented in an upcoming technical report by Hornung et al.
The concept of OF is based on the following assumption: there exists a (possibly latent) refined continuous variable y* underlying the observed ordinal target variable y (y in {1,...,J}, J number of classes), where y* determines the values of y. The functional relationship between y* and y takes the form of a monotonically increasing step function. Depending on which of the J intervals ]a_1, a_2], ]a_2, a_3], ..., ]a_J, a_{J+1}[ contains the value of y*, the ordinal target variable y takes a different value.
In the context of conditional inference trees, for situations in which the intervals ]a_1, a_2], ..., ]a_J, a_{J+1}[ are available, Hothorn et al. (2006) suggest using the midpoints of these intervals as a metric target variable.
OF, however, are concerned with settings in which the intervals ]a_1, a_2], ..., ]a_J, a_{J+1}[ are unknown.
The main idea of OF is to estimate the unknown interval borders by maximizing the out-of-bag prediction performance
(see section "Performance measures") of standard regression forests that use as a
target variable a quantity very similar to the midpoints of the intervals.
More precisely, we assume y* to have a standard normal distribution and consider the following score values as the values of the target variable: phi^(-1)((phi(a_j) + phi(a_{j+1}))/2) if y = j, with a_1 = -infinity and a_{J+1} = +infinity, where phi denotes the cumulative distribution function of the standard normal distribution.
The assumption of normality is made simply because normally distributed target variables can be expected to be well suited for standard regression forests.
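As a small illustration, the following sketch computes these score values in R for a made-up partition of [0,1] with J = 4 classes (qnorm is the standard normal quantile function phi^(-1)):
d <- c(0, 0.2, 0.45, 0.8, 1)                 # d_j = phi(a_j), with d_1 = 0 and d_{J+1} = 1
scores <- qnorm((d[-length(d)] + d[-1]) / 2) # phi^(-1)((phi(a_j) + phi(a_{j+1}))/2)
scores                                       # one score value per class j = 1,...,J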
The estimation of the optimal partition is based on constructing many regression forests (b in 1,...,ndiv) of smaller size, where each of these uses as the values of the target variable a score set obtained using random sampling: {phi^(-1)((phi(a_{b,j}) + phi(a_{b,j+1}))/2) : j in 1,...,J}, where phi(a_{b,1}) = 0 < phi(a_{b,2}) < ... < phi(a_{b,J}) < phi(a_{b,J+1}) = 1 is a random partition of [0,1].
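The following sketch illustrates the structure of this optimization loop. It is illustrative only, not the package's internal code: the data and the performance stand-in are simulated, all object names are made up, and ranger() from the ranger package serves as the regression forest.
library(ranger)
set.seed(1)
J <- 3; n <- 200; ndiv <- 20; ntreeperdiv <- 50        # small values for illustration
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- cut(x$x1 + rnorm(n), breaks = J, labels = FALSE)  # ordinal target in 1,...,J
d_all <- matrix(0, ndiv, J + 1)
perf <- numeric(ndiv)
for (b in 1:ndiv) {
  d_all[b, ] <- c(0, sort(runif(J - 1)), 1)            # random partition of [0,1]
  sc <- qnorm((d_all[b, -(J + 1)] + d_all[b, -1]) / 2) # score set for partition b
  rf <- ranger(ystar ~ ., data = data.frame(ystar = sc[y], x),
               num.trees = ntreeperdiv)
  perf[b] <- -rf$prediction.error  # stand-in; OF assesses out-of-bag Youden-based performance
}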
After estimating the intervals, a larger regression forest is built using as values of the target variable the score values phi^(-1)((phi(a_{opt,j}) + phi(a_{opt,j+1}))/2), where the a_{opt,j} values are the optimized borders.
An important by-product of the construction of OFs is the set of estimated interval borders a_{opt,1},...,a_{opt,J+1}. The optimized intervals ]phi(a_{opt,1}), phi(a_{opt,2})], ..., ]phi(a_{opt,J}), phi(a_{opt,J+1})[ can be interpreted vividly as the estimated relative widths of the classes 1,...,J.
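As a small illustration (with made-up optimized partition points), the estimated relative class widths are simply the interval lengths on [0,1]:
d_opt <- c(0, 0.30, 0.55, 0.80, 1)  # hypothetical phi(a_{opt,1}),...,phi(a_{opt,J+1})
diff(d_opt)                         # estimated relative widths of classes 1,...,J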
Further details are given in the sections "Hyperparameters" and "Performance measures".
There are several hyperparameters, which, however, generally do not have to be optimized by the user. Except in the case of nbest, the larger these values are, the better, but the default values should be large enough in most applications. These hyperparameters are described in the following:
ndiv
Default value: 1000. The default value for the number of considered partitions in the estimation of the optimal partition is quite large. Such a large number of considered partitions is necessary to attain a high chance that some of the partitions are close enough to the optimal partition, that is, the partition that leads to the optimal performance with respect to the considered performance measure (provided via perfmeasure).
ntreeperdiv
Default value: 100. A smaller number of trees per tried partition might lead to excessive variability in the assessments of the performances achieved for the individual partitions.
ntreefinal
Default value: 5000. The number of trees ntreefinal plays the same role as in conventional regression forests. For very high-dimensional datasets, the default number 5000 might be too small.
npermtrial
Default value: 500. As stated above, it is necessary to consider a large number ndiv of tried partitions in the optimization in order to increase the chance that the best of the considered partitions are close to the optimal partition. However, for a larger number of classes it is additionally necessary to use an efficient algorithm for sampling the considered partitions. This algorithm should ensure that the partitions are heterogeneous enough that the best of the tried partitions are close enough to the optimal partition. OF uses an algorithm that aims to make the rankings of the widths of the intervals corresponding to the respective classes as dissimilar as possible between successive partitions; a code sketch of this procedure is given after the following description.
Denote d_{b,j} := phi(a_{b,j}). Then the b-th partition is generated as follows:
If b = 1, then J-1 instances of a U(0,1) distributed random variable are drawn and the values are sorted. The sorted values are designated as d_{1,2},...,d_{1,J}. Moreover, set d_{1,1} := 0 and d_{1,J+1} := 1. The first considered partition is now { d_{1,1}, ... ,d_{1,J+1} }.
If b > 1, the following procedure is performed:
1.
Let dist_{b-1,j} = d_{b-1,j+1} - d_{b-1,j} (j in {1,...,J}) denote the width of the j-th interval of the (b-1)-th partition and r_{b-1,j} the rank of dist_{b-1,j} among dist_{b-1,1},...,dist_{b-1,J}. First, npermtrial random permutations of 1,...,J are drawn, where r*_{l,1},...,r*_{l,J} denotes the l-th drawn permutation. Subsequently, the permutation r*_{l*,1},...,r*_{l*,J} (l* in {1,...,npermtrial}) is determined that features the greatest quadratic distance to r_{b-1,1},...,r_{b-1,J}, that is, r*_{l*,1},...,r*_{l*,J} = argmax_{r*_{l,1},...,r*_{l,J}} sum_{j=1}^J (r*_{l,j} - r_{b-1,j})^2.
2.
A partition is generated in the same way as in the case of b = 1.
3.
The intervals of the partition generated in step 2 are re-ordered in such a way that the width of the j-th interval (for j in {1,...,J}) has rank r*_{l*,j} among the widths of the intervals ]d_{b,1}, d_{b,2}], ..., ]d_{b,J}, d_{b,J+1}]. The b-th considered partition is now { d_{b,1}, ..., d_{b,J+1} }.
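A sketch of steps 1 to 3 in R (illustrative only; the function name sample_next_partition is made up and this is not the package's internal implementation):
sample_next_partition <- function(d_prev, npermtrial = 500) {
  J <- length(d_prev) - 1
  r_prev <- rank(diff(d_prev))                # ranks of the previous interval widths
  perms <- replicate(npermtrial, sample(J), simplify = FALSE)
  dists <- sapply(perms, function(p) sum((p - r_prev)^2))
  r_star <- perms[[which.max(dists)]]         # step 1: permutation most distant in rank space
  widths <- diff(c(0, sort(runif(J - 1)), 1)) # step 2: fresh partition as for b = 1
  widths <- sort(widths)[r_star]              # step 3: j-th width receives rank r*_{l*,j}
  c(0, cumsum(widths))                        # partition d_{b,1},...,d_{b,J+1}
}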
The default value of 500 for npermtrial should be large enough to ensure sufficiently heterogeneous partitions, independently of the number of classes J.
nbest
Default value: 10. After the ndiv regression forests have been constructed, each using a particular partition, the nbest partitions associated with the highest levels of prediction performance are considered: for each j in {1,...,J}, the average of the phi(a_j) values across these nbest partitions is taken. Then phi^(-1)(.) is applied to the obtained averages, resulting in the estimated optimal borders a_{opt,1},...,a_{opt,J+1}. The latter are then used to calculate the scores for the OF using the formula phi^(-1)((phi(a_{opt,j}) + phi(a_{opt,j+1}))/2) (see above).
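Continuing the simulation sketch from above (the made-up d_all, perf, and J objects), this averaging step could look as follows:
nbest <- 5
best  <- order(perf, decreasing = TRUE)[1:nbest]        # indices of the nbest best forests
d_opt <- colMeans(d_all[best, , drop = FALSE])          # averaged borders phi(a_{opt,j})
scores_opt <- qnorm((d_opt[-(J + 1)] + d_opt[-1]) / 2)  # scores for the final OF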
It is probably important that the number nbest of best partitions considered is not strongly misspecified. Too large a value of nbest could lead to averaging over a too heterogeneous set of partitions, whereas too small a value could lead to too large a variance in the estimation of the optimal partition. For larger numbers of tried partitions ndiv, the results are less sensitive to the choice of nbest, because the number of tried partitions that are close to the optimal partition becomes larger the more partitions are considered. With the number of considered partitions ndiv set to 1000, the default value of 10 for nbest in ordfor was seen to lead to a good trade-off between the heterogeneity of the averaged partitions and the variance in the estimation.
As noted above, the different partitions tried during the estimation of the optimal partition are assessed with respect to their (out-of-bag) prediction performance. The choice of the specific performance measure used here determines the specific kind of performance the ordinal forest should feature:
perfmeasure="equal"
This choice should be made if it is of interest to predict each class with the same precision, irrespective of the numbers of observations representing the individual classes. Youden's J statistic is calculated with respect to each class ("observation/prediction in class j" vs. "observation/prediction NOT in class j", j = 1,...,J) and the simple average of the J results is taken.
perfmeasure="irrespective"
This choice should be made if it is of interest to predict all observations with the same precision, independent of their classes. Youden's J statistic is calculated with respect to each class and a weighted average of these values is subsequently taken, with weights proportional to the numbers of observations representing the respective classes in the training data.
perfmeasure="onecateg"
This choice should be made if it is merely relevant that one particular class of the ordinal target variable is predicted as precisely as possible. This class must be provided via the argument categimp. Youden's J statistic is calculated with respect to class categimp ("observation/prediction in class categimp" vs. "observation/prediction NOT in class categimp").
perfmeasure="custom"
This choice should be made if there is a particular ranking of the classes with respect to their importance. Youden's J statistic is calculated with respect to each class. Subsequently, a weighted average with user-specified weights (provided via the argument categweights) is taken. Classes with higher weights are to be predicted more precisely than those with smaller weights.
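To make the four measures concrete, the following sketch computes the class-wise Youden's J statistic and the different averaging schemes on simulated observed and predicted classes (illustrative only; not the package's internal code):
set.seed(1)
ytest <- factor(sample(1:3, 100, replace = TRUE))  # simulated observed classes
yhat  <- factor(sample(1:3, 100, replace = TRUE))  # simulated predicted classes
youden <- function(j) {
  sens <- mean(yhat[ytest == j] == j)  # "observation/prediction in class j"
  spec <- mean(yhat[ytest != j] != j)  # "observation/prediction NOT in class j"
  sens + spec - 1
}
jvals <- sapply(levels(ytest), youden)
mean(jvals)                          # "equal": simple average over classes
weighted.mean(jvals, table(ytest))   # "irrespective": class-frequency weights
jvals["2"]                           # "onecateg" with categimp = "2"
weighted.mean(jvals, c(1, 2, 1))     # "custom" with categweights = c(1, 2, 1)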
Hothorn, T., Hornik, K., Zeileis, A. (2006) Unbiased recursive partitioning: a conditional inference framework. Journal of Computational and Graphical Statistics, 15, 651--674.
data(hearth)
set.seed(123)
hearthsubset <- hearth[sort(sample(1:nrow(hearth), size=floor(nrow(hearth)*(1/2)))),]
ordforres <- ordfor(depvar="Class", data=hearthsubset, ndiv=80, nbest=5, ntreeperdiv=100,
ntreefinal=1000, perfmeasure = "equal")
# NOTE: ndiv=80 is not enough!! In practice, ndiv=1000 (default value) or a higher
# number should be used.
ordforres
sort(ordforres$varimp, decreasing=TRUE)
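# Predictions for new observations are obtained with predict.ordfor; the
# following sketch assumes a hypothetical held-out subset hearthtest:
# hearthtest <- hearth[setdiff(1:nrow(hearth), as.numeric(rownames(hearthsubset))), ]
# preds <- predict(ordforres, newdata = hearthtest)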