# A brief manual

```r
set.seed(1)
is_CRAN <- !identical(Sys.getenv("NOT_CRAN"), "true")
if (!is_CRAN) {
options(mc.cores = parallel::detectCores())
} else {
q() # takes too long, pretend everything is fine
}
library(knitr)
library(ggplot2)
library(reshape2)
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```

# Overview

The **pcFactorStan** package for **R** provides convenience functions and pre-programmed **Stan** models related to the analysis of pairwise comparison data. Its purpose is to make fitting models using Stan easy. **pcFactorStan** relies on the **rstan** package, which should be installed first. See the **rstan** documentation for installation instructions.

One situation where a factor model might be useful is when there are people who play in tournaments of more than one game. For example, the computer player AlphaZero (Silver et al. 2018) was trained to play chess, shogi, and Go. We can take the tournament match outcome data for each of these games and find rankings among the players. We may also suspect that there is a latent board game skill that accounts for some proportion of the variance in the per-board game rankings. This proportion can be recovered by the factor model.

Our goal may be to fit a factor model, but it is necessary to build up the model step-by-step. There are essentially three models: 'unidim', 'covariance', and 'factor'. 'unidim' analyzes a single item. 'covariance' is suitable for two or more items. Once you have vetted your items with the 'unidim' and 'covariance' models, you can try the 'factor' model. There is also a special model, 'unidim_adapt'. All models except 'unidim_adapt' require a scaling constant. To find an appropriate scaling constant, we will fit 'unidim_adapt' to each item separately and then take the median of the item-wise median point estimates to set the scale.

# Brief tutorial

## Physical activity flow propensity

The R code below first loads **rstan** and **pcFactorStan**.

```r
library(rstan)
library(pcFactorStan)
```

Next we take a peek at the data.

`head(phyActFlowPropensity)`

`kable(head(phyActFlowPropensity))`

These data consist of paired comparisons of 87 physical activities on 16 flow-related facets. The procedure was that participants submitted two activities using free-form input. These activities were then substituted into item templates. For example, the 'predict' item asked, "How predictable is the action?" with response options:

- A1 is much more predictable than A2.
- A1 is somewhat more predictable than A2.
- Both offer roughly equal predictability.
- A2 is somewhat more predictable than A1.
- A2 is much more predictable than A1.

A "somewhat more" response is scored 1 or -1 and a "much more" response is scored 2 or -2. A tie (i.e., "roughly equal") is scored as zero. We will need to analyze each item separately before we analyze them together. Therefore, we will start with 'skill'.
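To make the scoring concrete, here is a small sketch. The `scoreResponse` function and its response labels are illustrative inventions, not part of **pcFactorStan**, and the sign convention (negative favoring A1) is an assumption:

```r
# Illustrative recoding of the five response options; the labels and
# sign convention here are assumptions, not pcFactorStan's internals.
scoreResponse <- function(resp) {
  switch(resp,
         "A1 much more"     = -2,
         "A1 somewhat more" = -1,
         "roughly equal"    =  0,
         "A2 somewhat more" =  1,
         "A2 much more"     =  2)
}
scoreResponse("A2 somewhat more")  # 1
```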

Data must be fed into **Stan** in a partially digested form. The next block of code demonstrates how a suitable data object may be constructed using the `prepData()` function. This function automatically determines the number of threshold parameters based on the range observed in your data. One thing it does not do is pick a `varCorrection` factor. The `varCorrection` determines the degree of adaption in the model. Usually a setting of between 2.0 and 4.0 will obtain optimal results.

```r
dl <- prepData(phyActFlowPropensity[,c(paste0('pa',1:2), 'skill')])
dl$varCorrection <- 2.0
```

Next we fit the model using the `pcStan()` function, which is a wrapper for `stan()` from **rstan**. We also choose the number of chains. As is customary with **Stan**, the first half of each chain is discarded as warmup.

`fit1 <- pcStan("unidim_adapt", data=dl)`

A variety of diagnostics are available to check whether the sampler ran into trouble.

`check_hmc_diagnostics(fit1)`

Everything looks good, but there are a few more things to check. We want $\widehat R < 1.015$ and effective sample size greater than 100 times the number of chains (Vehtari et al., 2019).

```r
allPars <- summary(fit1, probs=c())$summary
print(min(allPars[,'n_eff']))
print(max(allPars[,'Rhat']))
```
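The printed values can be compared against these targets with a small helper. This function is an illustrative sketch, not part of **pcFactorStan** or **rstan**:

```r
# Illustrative check of the Vehtari et al. (2019) targets:
# Rhat < 1.015 and n_eff > 100 * (number of chains).
checkConverged <- function(n_eff, rhat, chains = 4L) {
  all(n_eff > 100 * chains) && all(rhat < 1.015)
}
# Toy values, not taken from an actual fit:
checkConverged(n_eff = c(850, 1200), rhat = c(1.002, 1.009))  # TRUE
```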

Again, everything looks good. If the target values were not reached then we would sample the model again with more iterations. Time for a plot,

```r
library(ggplot2)
theta <- summary(fit1, pars=c("theta"), probs=c())$summary[,'mean']
palist <- levels(filterGraph(phyActFlowPropensity)$pa1)
ggplot(data.frame(x=theta, activity=palist, y=0.47)) +
geom_point(aes(x=x),y=0) +
geom_text(aes(label=activity, x=x, y=y),
angle=85, hjust=0, size=2,
position = position_jitter(width = 0, height = 0.4)) + ylim(0,1) +
theme(legend.position="none",
axis.title.x=element_blank(),
axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank())
```

Intuitively, this seems like a fairly reasonable ranking for skill. As pretty as the plot is, the main reason that we fit this model was to find a scaling factor that would produce a standard deviation close to 1.0,

```r
s50 <- summary(fit1, pars=c("scale"), probs=c(.5))$summary[,'50%']
print(s50)
```

We use the median instead of the mean because `scale` is not likely to have a symmetric marginal posterior distribution. We obtained `r round(s50,2)`, but that value is just for one item. We have to perform the same procedure for every item. Wow, that would be really tedious ... if we did not have a function to do it for us! Fortunately, `calibrateItems` takes care of it and produces a table of the pertinent data,

`result <- calibrateItems(phyActFlowPropensity, iter=1000L)`

`print(result)`

`kable(result)`

The items *goal1* and *feedback1* ran into trouble. The nonzero count of divergent transitions and `low_bfmi` means that these items contained too little signal to estimate. We could try again with `varCorrection=1.0`, but we are going to exclude them instead. The model succeeded on the rest of the items. I requested `iter=1000L` to demonstrate how `calibrateItems` will resample the model until the `n_eff` is large enough and the `Rhat` is small enough. Items *skill* and *predict* (among others) needed 1500 samples instead of the default 1000.
The median scale across all included items is now readily available,

```r
excl <- match(c('goal1','feedback1'), result$item)
median(result[-excl,'scale'])
```

Next we will fit the covariance model. We exclude the Cholesky factor of the correlation matrix, `rawThetaCorChol`, because the regular correlation matrix is also available.

```r
pafp <- phyActFlowPropensity
excl <- match(c('goal1','feedback1'), colnames(pafp))
pafp <- pafp[,-excl]
dl <- prepData(pafp)
dl$scale <- median(result[-excl,'scale'])
```

`fit2 <- pcStan("covariance", data=dl, include=FALSE, pars=c('rawTheta', 'rawThetaCorChol'))`

```r
check_hmc_diagnostics(fit2)
allPars <- summary(fit2, probs=0.5)$summary
print(min(allPars[,'n_eff']))
print(max(allPars[,'Rhat']))
```

The HMC diagnostics look good, but ... oh dear! Something is wrong with the `n_eff` and $\widehat R$. Let us look more carefully,

`head(allPars[order(allPars[,'sd']),])`

Ah ha! It looks like all the entries of the correlation matrix are reported, including the entries that are not stochastic but are fixed to constant values. We need to filter those out to get sensible results.

```r
excl <- match(paste0('thetaCor[',1:dl$NITEMS,',', 1:dl$NITEMS,']'), rownames(allPars))
allPars <- allPars[-excl,]
print(min(allPars[,'n_eff']))
print(max(allPars[,'Rhat']))
```

Ah, much better. Now we can inspect the correlation matrix. There are
many ways to visualize a correlation matrix. One of my favorite
ways is to plot it using the **qgraph** package,

```r
itemNames <- colnames(pafp)[-(1:2)]
tc <- summary(fit2, pars=c("thetaCor"), probs=c(.5))$summary[,'50%']
tcor <- matrix(tc, length(itemNames), length(itemNames))
dimnames(tcor) <- list(itemNames, itemNames)
library(qgraph)
qgraph(tcor, layout = "spring", graph = "cor", labels=colnames(tcor),
legend.cex = 0.3,
cut = 0.3, maximum = 1, minimum = 0, esize = 20,
vsize = 7, repulsion = 0.8)
```

Based on this plot and theoretical considerations,
I decided to exclude *spont*, *control*, *evaluated*, and *waiting*
from the factor model.
A detailed rationale for why these items are excluded and not others
will be presented in a forthcoming article.
For now, let us focus on the mechanics of the next step in
our analyses.

```r
itemNames <- setdiff(itemNames, c('spont','control','evaluated','waiting'))
pafp <- pafp[,c(paste0('pa',1:2), itemNames)]
dl <- prepData(pafp)
dl$scale <- median(result[-excl,'scale']) # close enough
```

```r
fit3 <- pcStan("factor", data=dl, include=FALSE,
pars=c('rawUnique', 'rawUniqueTheta', 'rawFactor', 'rawLoadings'))
```

```r
check_hmc_diagnostics(fit3)
allPars <- summary(fit3, probs=0.5)$summary
print(min(allPars[,'n_eff']))
print(max(allPars[,'Rhat']))
```

The maximum $\widehat R$ is too high. We need to try again with more iterations,

```r
fit3 <- pcStan("factor", data=dl, include=FALSE, iter=4000L,
pars=c('rawUnique', 'rawUniqueTheta', 'rawFactor', 'rawLoadings'))
```

```r
check_hmc_diagnostics(fit3)
allPars <- summary(fit3, probs=0.5)$summary
print(min(allPars[,'n_eff']))
print(max(allPars[,'Rhat']))
```

Looks good! Let us plot the marginal posterior distributions of the factor proportions,

```r
library(reshape2)
prop <- summary(fit3, pars='factorProp', probs=c(0.1, 0.5, 0.9))$summary[,c('10%','50%','90%')]
colnames(prop) <- paste0('q',c(10,50,90))
prop <- as.data.frame(prop)
prop$item <- factor(itemNames)
rawProp <- extract(fit3, pars=c("factorProp"))[[1]]
colnames(rawProp) <- itemNames
rawProp <- rawProp[sample.int(nrow(rawProp), 500),]
rawPropTall <- melt(rawProp, variable.name='item')
colnames(rawPropTall)[2] <- c('item')
ggplot() + geom_vline(xintercept=0, color="green") +
geom_jitter(data=rawPropTall, aes(value, item), height = 0.35, alpha=.05) +
geom_segment(data=prop, aes(y=item, yend=item, x=q10, xend=q90),
color="yellow", alpha=.5) +
geom_point(data=prop, aes(x=q50, y=item), color="red", size=1) +
theme(axis.title.y=element_blank())
```

We can also look at item response curves,

```r
thresholdVec <- summary(fit3, pars='threshold', probs=c())$summary[,'mean']
thresholds <- matrix(thresholdVec, nrow=2, dimnames=list(NULL, itemNames))
softmax <- function(y) exp(y) / sum(exp(y))
draw1 <- function(scale, th, item) {
tdiff <- seq(-2.5/scale, 2.5/scale, .05/scale)
gr <- expand.grid(tdiff=tdiff, category=c("much more","somewhat more", 'equal',
"somewhat less", "much less"), p=NA, item=item)
gg <- matrix(c(0, rev(cumsum(th)), -cumsum(th)), ncol=5, nrow=length(tdiff), byrow=TRUE)
gg[,2:5] <- (gg[,2:5] + scale * tdiff)
gg <- t(apply(gg, 1, cumsum))
gg <- t(apply(gg, 1, softmax))
for (lev in 1:length(levels(gr$category))) {
gr[gr$category == levels(gr$category)[lev],'p'] <- gg[,lev]
}
geom_line(data=gr, aes(x=tdiff,y=p,color=category,linetype=category), size=.1, alpha=.2)
}
rawThreshold <- extract(fit3, pars=c("threshold"))[[1]]
rawThreshold <- rawThreshold[sample.int(nrow(rawThreshold), 50),]
pl <- ggplot() + xlab("difference in latent score (logits)") + ylab("probability") +
ylim(0,1) + facet_wrap(~item)
for (cx in 1:nrow(rawThreshold)) {
for (ix in 1:length(itemNames)) {
pl <- pl + draw1(dl$scale, rawThreshold[cx,c(ix*2-1,ix*2)], itemNames[ix])
}
}
print(pl)
```

`print(thresholds)`

`kable(thresholds)`

Finally, we can plot the factor scores. Activities with small sample sizes are retained by `filterGraph` if they connect other activities, because they contribute information to the model. However, when we look at the per-activity factor scores, we can limit ourselves to activities with a sample size of at least 11.

```r
orig <- filterGraph(pafp)$pa1
pa11 <- filterGraph(pafp, minDifferent=11L)$pa1
fs <- summary(fit3, pars='factor', probs=c(0.1, 0.5, 0.9))$summary[,c('10%','50%','90%')]
colnames(fs) <- paste0('q',c(10,50,90))
fs <- fs[match(levels(pa11), levels(orig)),]
fs <- as.data.frame(fs)
fs$activity <- levels(pa11)
fs$activity <- factor(fs$activity, levels=levels(pa11)[order(fs$q50)])
rawFs <- extract(fit3, pars=c("factor"))[[1]]
rawFs <- rawFs[,match(levels(pa11), levels(orig))]
colnames(rawFs) <- levels(pa11)
rawFs <- rawFs[sample.int(nrow(rawFs), 500),]
rawFsTall <- melt(rawFs, variable.name='activity')
colnames(rawFsTall)[2] <- c('activity')
rawFsTall$activity <- factor(as.character(rawFsTall$activity),
levels=levels(pa11)[order(fs$q50)])
ggplot() + geom_vline(xintercept=0, color="green") +
geom_jitter(data=rawFsTall, aes(value, activity), height = 0.35, alpha=.05) +
geom_segment(data=fs, aes(y=activity, yend=activity, x=q10, xend=q90),
color="yellow", alpha=.5) +
geom_point(data=fs, aes(x=q50, y=activity), color="red", size=1) +
theme(axis.title.y=element_blank())
```

# Technical notes

Given that my background is more in software than math, I am not a fan of the Greek letters used with such enthusiasm by mathematicians. When I name variables, I favor the expressive over the succinct.

If you read through the **Stan** models, you will find some variables prefixed with `raw`. These are special variables internal to the model. In particular, you should not try to evaluate the $\widehat R$ or effective sample size of `raw` parameters. These parameters are best excluded from the sampling output.

## Unidim

parameter | prior | purpose |
---|---|---|
threshold | normal(0,2) | item response thresholds |
theta | normal(0,1) | latent score |

The 'unidim_adapt' model has a `varCorrection` constant that is used to calibrate the `scale`. For multivariate models, `scale` should be set to the median of the item-wise scales.

## Covariance

parameter | prior | purpose |
---|---|---|
threshold | normal(0,2) | item response thresholds |
thetaCor | lkj(2) | correlations between items |
sigma | lognormal(1,1) | relative item standard deviations |
theta | see below | latent score |

Thresholds for all items are combined into a single vector. The prior for `theta` is multivariate normal with correlations `thetaCor` and standard deviations `sigma`. Exclude `rawTheta` and `rawThetaCorChol` from sampling.
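To illustrate what this prior implies, the covariance of `theta` can be assembled from `thetaCor` and `sigma` as `diag(sigma) %*% thetaCor %*% diag(sigma)`. The sketch below uses toy values for two items; the Stan model accomplishes the same thing internally via the Cholesky factor:

```r
# Toy two-item example of the covariance implied by the theta prior.
thetaCor <- matrix(c(1.0, 0.4,
                     0.4, 1.0), nrow = 2)  # toy correlation matrix
sigma <- c(1.2, 0.8)                       # toy item standard deviations
Sigma <- diag(sigma) %*% thetaCor %*% diag(sigma)
Sigma  # variances 1.44 and 0.64 on the diagonal, covariance 0.384
```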

## Factor

parameter | prior | purpose |
---|---|---|
threshold | normal(0,2) | item response thresholds |
unique | normal(1,1) T[0,] | standard deviation of unique scores |
uniqueTheta | normal(0,1) | unique scores |
factorLoadings | normal(0,1) | signed standard deviation of factor scores |
factor | normal(0,1) | factor scores |
factorProp | N/A | signed factor variance proportion |

Thresholds for all items are combined into a single vector. `factorProp` is computed using Equation 3 of Gelman et al. (in press) and has no prior of its own. `factorLoadings` is in standard deviation units but can be negative. Similarly, `factorProp` is a signed proportion bounded between -1 and 1. Exclude `rawUnique`, `rawUniqueTheta`, `rawFactor`, and `rawLoadings` from sampling.
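The intuition behind a signed variance proportion can be sketched as follows. This is an illustrative computation under the usual factor-analytic variance decomposition, not the package's exact implementation of Equation 3, and `signedFactorProp` is a hypothetical helper:

```r
# Illustrative sketch (not pcFactorStan's exact computation): the
# magnitude is the share of item variance attributable to the factor,
# and the sign is inherited from the loading.
signedFactorProp <- function(loading, uniqueSd) {
  sign(loading) * loading^2 / (loading^2 + uniqueSd^2)
}
signedFactorProp(loading = -0.6, uniqueSd = 0.8)  # -0.36
```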

# References

Gelman, A., Goodrich, B., Gabry, J., & Vehtari, A. (in press). R-squared for Bayesian regression models. The American Statistician. DOI: 10.1080/00031305.2018.1549100

Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., ... & Lillicrap, T. (2018). A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362(6419), 1140-1144.

Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P. C. (2019). Rank-normalization, folding, and localization: An improved $\widehat R$ for assessing convergence of MCMC. arXiv preprint arXiv:1903.08008.

# R Session Info

`sessionInfo()`