HEMDAG (version 2.1.3)

TPR-DAG: TPR-DAG variants

Description

Different variants of the TPR-DAG algorithm are implemented. In their most general form, the TPR-DAG algorithms adopt a two-step learning strategy:

  1. in the first step they compute a per-level bottom-up visit from the leaves to the root to propagate positive predictions across the hierarchy;

  2. in the second step they compute a per-level top-down visit from the root to the leaves to ensure the hierarchical consistency of the predictions.
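
The top-down step of point 2 is not spelled out on this page; the sketch below illustrates one way it can be realized, in the spirit of the HTD-DAG algorithm recalled in the Details section: each node score is capped by the minimum corrected score of its parents, so that no class can score higher than its ancestors. This is an illustrative sketch, not the package implementation: top.down.sketch is a hypothetical name, graph.levels is the HEMDAG utility grouping the nodes of g per level (root level first), reverseEdgeDirections belongs to the graph package, and S is assumed to contain a column for the root.

library(graph);    # Bioconductor package providing the graphNEL class
library(HEMDAG);   # provides graph.levels
# hypothetical helper, written for this page only
top.down.sketch <- function(S, g, root="00") {
    levels <- graph.levels(g, root);              # nodes of g grouped per level, root level first
    parents <- edges(reverseEdgeDirections(g));   # adjacency list of the reversed graph: node -> parents
    Y <- S;                                       # corrected scores, initialized to the input scores
    for(lev in levels[-1]) {                      # per-level top-down visit, skipping the root level
        for(i in lev) {
            for(ex in rownames(S)) {
                # cap the score of i by the minimum corrected score of its parents;
                # assumes S has a column for the root
                Y[ex,i] <- min(S[ex,i], min(Y[ex, parents[[i]]]));
            }
        }
    }
    return(Y);
}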

Usage

tpr.threshold(S, g, root = "00", t = 0.5)

tpr.threshold.free(S, g, root = "00")

tpr.weighted.threshold.free(S, g, root = "00", w = 0.5)

tpr.weighted.threshold(S, g, root = "00", t = 0.5, w = 0.5)

Arguments

S

a named flat scores matrix with examples on rows and classes on columns

g

a graph of class graphNEL. It represents the hierarchy of the classes

root

name of the class at the top level of the hierarchy (def. root="00")

t

threshold for the choice of positive children (def. t=0.5)

w

weight to balance between the contribution of the node \(i\) and that of its positive children (def. w=0.5)

Value

a named matrix with the scores of the classes corrected according to the TPR-DAG algorithm.

Details

The vanilla TPR-DAG adopts a per-level bottom-up traversal of the DAG to correct the flat predictions \(\hat{y}_i\): $$ \bar{y}_i := \frac{1}{1 + |\phi_i|} (\hat{y}_i + \sum_{j \in \phi_i} \bar{y}_j) $$ where \(\phi_i\) are the positive children of \(i\). Different strategies to select the positive children \(\phi_i\) can be applied:

  1. Threshold-Free strategy: the positive nodes are those children that can increment the score of the node \(i\), that is those nodes achieving a score higher than that of their parent: $$ \phi_i := \{ j \in child(i) | \bar{y}_j > \hat{y}_i \} $$ (a minimal sketch of this strategy in R follows the list below)

  2. Threshold strategy: the positive children are selected on the basis of a threshold that can be set in two different ways:

    1. for each node a constant threshold \(\bar{t}\) is a priori selected: $$ \phi_i := \{ j \in child(i) | \bar{y}_j > \bar{t} \} $$ For instance, if the predictions represent probabilities, it could be meaningful to a priori select \(\bar{t}=0.5\).

    2. the threshold is selected to maximize some performance metric \(\mathcal{M}\) estimated on the training data, as for instance the F-score or the AUPRC. In other words the threshold is selected to maximize some measure of accuracy of the predictions \(\mathcal{M}(j,t)\) on the training data for the class \(j\) with respect to the threshold \(t\). The corresponding set of positives \(\forall i \in V\) is: $$ \phi_i := \{ j \in child(i) | \bar{y}_j > t_j^*, t_j^* = \arg \max_{t} \mathcal{M}(j,t) \} $$ For instance \(t_j^*\) can be selected from a set of \(t \in (0,1)\) through internal cross-validation techniques.
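
As referenced in point 1 above, the following is a minimal sketch of the vanilla bottom-up visit with the threshold-free selection of the positive children. It is an illustration written for this page, not the HEMDAG implementation: tpr.bottom.up.sketch is a hypothetical name, while graph.levels (HEMDAG) and edges (graph package) are existing utilities; levels are assumed to be returned root level first, so they are visited in reverse order.

library(graph);    # for edges() and the graphNEL class
library(HEMDAG);   # for graph.levels()
# hypothetical helper, written for this page only
tpr.bottom.up.sketch <- function(S, g, root="00") {
    levels <- graph.levels(g, root);    # nodes of g grouped per level, root level first
    chd <- edges(g);                    # adjacency list: node -> its children
    Y <- S;                             # corrected scores, initialized to the flat scores
    for(lev in rev(levels)) {           # per-level bottom-up visit (leaves -> root)
        for(i in lev) {
            children <- chd[[i]];
            if(length(children)==0)     # leaves keep their flat scores
                next;
            for(ex in rownames(S)) {
                # threshold-free selection of the positive children phi_i
                phi <- children[Y[ex,children] > S[ex,i]];
                # vanilla TPR-DAG update; with empty phi it reduces to the flat score
                Y[ex,i] <- (S[ex,i] + sum(Y[ex,phi])) / (1 + length(phi));
            }
        }
    }
    return(Y);
}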

The weighted TPR-DAG version is obtained by adding a weight \(w \in [0,1]\) to balance between the contribution of the node \(i\) and that of its positive children \(\phi_i\), through their convex combination: $$ \bar{y}_i := w \hat{y}_i + \frac{(1 - w)}{|\phi_i|} \sum_{j \in \phi_i} \bar{y}_j $$ If \(w=1\) no weight is attributed to the children and the TPR-DAG reduces to the HTD-DAG algorithm, since in this way only the prediction for node \(i\) is used in the bottom-up step of the algorithm. If \(w=0\) only the predictors associated with the children nodes vote to predict node \(i\). In the intermediate cases more importance is attributed to the predictor for the node \(i\) or to its children, depending on the value of \(w\).
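
In a sketch like the one above, the weighted variant only changes the update rule: since \(\frac{(1-w)}{|\phi_i|} \sum_{j \in \phi_i} \bar{y}_j\) equals \((1-w)\) times the mean of the positive children's scores, the inner assignment becomes the fragment below. When \(\phi_i\) is empty the convex combination is undefined; keeping the flat score in that case is an assumption of this sketch, consistent with the \(w=1\) behaviour.

# inside the per-example loop of the sketch above (w in [0,1]):
phi <- children[Y[ex,children] > S[ex,i]];          # positive children of i
if(length(phi)==0) {
    Y[ex,i] <- S[ex,i];                             # sketch assumption: keep the flat score
} else {
    Y[ex,i] <- w*S[ex,i] + (1-w)*mean(Y[ex,phi]);   # convex combination
}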

Examples

# NOT RUN {
library(HEMDAG);
data(graph);    # loads g, the graphNEL object encoding the hierarchy
data(scores);   # loads S, the named flat scores matrix
data(labels);   # loads L, the label matrix (not needed by the calls below)
root <- root.node(g);    # name of the root node of the hierarchy
S.tprTF <- tpr.threshold.free(S,g,root);
S.tprT <- tpr.threshold(S,g,root,t=0.5);
S.tprW <- tpr.weighted.threshold.free(S,g,root,w=0.5);
S.tprWT <- tpr.weighted.threshold(S,g,root,w=0.5,t=0.5);
# }
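
After the correction each class should score no higher than its parents. HEMDAG also provides the check.hierarchy utility to verify this constraint on a corrected matrix; a usage line (the exact form of its output may vary across package versions):

check.hierarchy(S.tprTF, g, root);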
