btfit: Fits the Bradley-Terry model

Description

btfit fits the Bradley-Terry model on (potentially) large and sparse datasets.

Usage

btfit(btdata, a, MAP_by_component = FALSE, subset = NULL, maxit = 10000,
  epsilon = 0.001)

Arguments

btdata

An object of class "btdata", typically the result ob of ob <- btdata(..). See btdata.

Must be >= 1. When a = 1, the function returns the maximum likelihood estimate (MLE) of $\pi$ (by component, if necessary). When a > 1, a is the shape parameter for the Gamma prior. See Details.

MAP_by_component

Logical. Only considered if a > 1. Then, if FALSE, the MAP estimate will be found on the full dataset. If TRUE, the MAP estimate will be found separately for each fully-connected component.

subset

A condition for selecting a subset of the components. This can either be a character vector of names of the components, a single predicate function (that takes a component as its argument), or a logical vector of the same length as the number of components).

maxit

The maximum number of iterations for the algorithm. If returning $\pi$ by component, this will be the maximum number of iterations for each component.

epsilon

Determines when the algorithm is deemed to have converged. (See Details.)

Value

btfit returns an S3 object of class "btfit". It is a list containing the following components:

call

The matched call

A list of length $M$, where $M$ is the number of fully-connected components of the comparison graph $G_W$ (or the requested subset) of two or more items. The $m$-th list item is a named vector $\pi$, the strength parameter, for the items in the $m$-th fully connected component, $m = 1, \ldots, M$. These are sorted in descending order.

iters

A vector of length $M$. The $m$-th entry is the number of iterations it took for the algorithm to converge for the $m$-th component, for $m = 1, \ldots, M$. Note that if the algorithm has not converged in any component, a warning will be produced.

converged

A logical vector of length $M$, indicating whether the algorithm has converged for the $m$-th component in maxit iterations.

A list of length $M$. The $m$-th list item is a matrix where each element $n_{ij}$ is the number of times item $i$ played against item $j$, for the items in the $m$-th component. The rows and columns are arranged in the same order as the ordered pi vector(s).

diagonal

A list of length $M$. The $m$-th item is a vector of the diagonal elements of the btdata$wins matrix, for the items in the $m$-th fully-connected component. These values are used as the fitted values for the diagonal of the matrix output in fitted.btfit.

names_dimnames

The names of the dimnames of the original btdata$wins matrix.

Details

Let there be $K$ items, let $\pi_k$ be the Bradley-Terry strength parameter of item $k$, for $k = 1, \ldots, K$ and let $\pi$ be the vector of all the $\pi_k$. Let $w_{ij}$ be the number of times item $i$ wins against item $j$, let $n_{ij} = w_{ij} + w_{ji}$ be the number of times they play, with $w_{ii} = 0$ by convention and let $W_i = \sum_{j=1}^K w_{ij}$. Then the Bradley-Terry model states that the probability of item $i$ beating item $j$, $p_{ij}$, is:

$$p_{ij} = \frac{\pi_i}{\pi_i + \pi_j}.$$

The comparison graph, $G_W$, has the $K$ players as the nodes and a directed edge from node $i$ to node $j$ whenever item $i$ has beaten item $j$ at least once. The MLE of the Bradley-Terry model exists and is finite if and only if the comparison graph is fully-connected (i.e. if there is a directed path from node $i$ to node $j$ for all items $i$ and $j$).

Assuming that the comparison graph of the data is fully-connected, the MLE of the Bradley-Terry model can be found using the MM-algorithm (Hunter, 2004).

If the comparison graph of the data is not fully-connected, there are two principled options for fitting the Bradley-Terry model. One is to find the MLE within each fully-connected component. The other is to find the Bayesian MAP estimate, as suggested by Caron & Doucet (2012), where a $Gamma(a,b)$ gamma prior is placed on each $\pi_i$, and the product of these is taken as a prior on $\pi$. The MAP estimate can then be found with an EM-algorithm. When $a = 1$ and $b = 0$, the EM and MM-algorithms are equivalent and the MAP estimate and MLE are identical. The rate parameter of the Gamma prior, $b$, is not likelihood identifiable. When $a > 1$, $b$ is set to $aK - 1$, where $K$ is the number of items in the component; this choice of $b$ minimises the number of iterations needed for the algorithm to converge.

The likelihood equations give

$$a - 1 + W_i = b\pi_i + \sum_{j \neq i} \frac{n_{ij}\pi_i}{\pi_i + \pi_j},$$

for $i = 1, \ldots, K$. For the algorithm to have converged, we want $\pi$ to be such that the LHS and RHS of this equation are close for all $i$. Therefore, we set the convergence criteria as

$$\left|\frac{a - 1 + W_i}{b\pi_i + \sum_{j \neq i} \frac{n_{ij}\pi_i}{\pi_i + \pi_j}} - 1\right| < \epsilon,$$

for all $i$.

Since the equations do not typeset well within the R help window, we recommend reading this section online: https://ellakaye.github.io/BradleyTerryScalable/reference/btfit.html.

References

Caron, F. and Doucet, A. (2012) Efficient Bayesian Inference for Generalized Bradley-Terry Models. Journal of Computational and Graphical Statistics, 21(1), 174-196.

Hunter, D. R. (2004) MM Algorithms for Generalized Bradley-Terry Models. The Annals of Statistics, 32(1), 384-406.

Examples

Run this code

citations_btdata <- btdata(BradleyTerryScalable::citations)
fit1 <- btfit(citations_btdata, 1)
summary(fit1)
toy_df_4col <- codes_to_counts(BradleyTerryScalable::toy_data, c("W1", "W2", "D"))
toy_btdata <- btdata(toy_df_4col)
fit2a <- btfit(toy_btdata, 1)
summary(fit2a)
fit2b <- btfit(toy_btdata, 1.1)
summary(fit2b)
fit2c <- btfit(toy_btdata, 1, subset = function(x) length(x) > 3)
summary(fit2c)

Run the code above in your browser using DataLab