setup_rank_data: Setup rank data

Description

Prepare rank or preference data for further analyses.

Usage

setup_rank_data(
  rankings = NULL,
  preferences = NULL,
  user_ids = numeric(),
  observation_frequency = NULL,
  validate_rankings = TRUE,
  na_action = c("augment", "fail", "omit"),
  cl = NULL,
  shuffle_unranked = FALSE,
  random = FALSE,
  random_limit = 8L,
  timepoint = NULL,
  n_items = NULL
)

Value

An object of class "BayesMallowsData", to be provided in the data argument to compute_mallows().

Arguments

rankings

A matrix of ranked items, of size n_assessors x n_items. See create_ranking() if you have an ordered set of items that needs to be converted to rankings. If preferences is provided, rankings is an optional initial value of the rankings. If rankings has column names, these are assumed to be the names of the items. NA values in rankings are treated as missing data and automatically augmented; to change this behavior, see the na_action argument. A vector of length n_items is silently converted to a matrix of size 1 x n_items, and its names (if any) are used as column names.
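As a minimal sketch of the expected input shape, the following builds a small rankings matrix and a named vector for a single assessor. The item names and rank values are made up for illustration, and the call to setup_rank_data() is only attempted if the BayesMallows package is installed.

```r
# Three assessors ranking four items; rank 1 is the most preferred item.
# The item names are hypothetical and only serve as column names.
rankings <- matrix(
  c(1, 2, 3, 4,
    2, 1, 4, 3,
    1, 2, NA, NA),  # assessor 3 ranked only two items; NA marks missing ranks
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("apple", "banana", "cherry", "date"))
)

# A named vector describes one assessor; its names become the column names.
single <- c(apple = 2, banana = 1, cherry = 3, date = 4)

if (requireNamespace("BayesMallows", quietly = TRUE)) {
  dat <- BayesMallows::setup_rank_data(rankings = rankings)
  dat_single <- BayesMallows::setup_rank_data(rankings = single)
}
```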

preferences

A data frame with one row per pairwise comparison, and columns assessor, top_item, and bottom_item. Each column contains the following:

assessor is a numeric vector containing the assessor index.
bottom_item is a numeric vector containing the index of the item that was disfavored in each pairwise comparison.
top_item is a numeric vector containing the index of the item that was preferred in each pairwise comparison.

For example, with two assessors and five items, where assessor 1 prefers item 1 to item 2 and item 1 to item 5, while assessor 2 prefers item 3 to item 5, the data frame is:

assessor	bottom_item	top_item
1	2	1
1	5	1
2	5	3
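The table above could be constructed in R as follows. This is a sketch that uses only the column names given in this argument description; the call to setup_rank_data() is guarded so that the snippet also runs when BayesMallows is not installed.

```r
# One row per pairwise comparison: top_item was preferred to bottom_item.
preferences <- data.frame(
  assessor    = c(1, 1, 2),
  bottom_item = c(2, 5, 5),
  top_item    = c(1, 1, 3)
)

if (requireNamespace("BayesMallows", quietly = TRUE)) {
  # With rankings left as NULL, the number of items is inferred from preferences.
  dat <- BayesMallows::setup_rank_data(preferences = preferences)
}
```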

user_ids

Optional numeric vector of user IDs. Only used by update_mallows(). If provided, new data can consist of updated partial rankings from users already in the dataset, as described in Section 6 of Stein (2023).

observation_frequency

A vector of observation frequencies (weights) to apply to each row in rankings. This can speed up computation if a large number of assessors share the same rank pattern. Defaults to NULL, which means that each row of rankings is multiplied by 1. If provided, observation_frequency must have the same number of elements as there are rows in rankings, and rankings cannot be NULL. See compute_observation_frequency() for a convenience function for computing it.
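To illustrate the idea, the sketch below collapses repeated rank patterns by hand in base R and counts how often each occurs. It is not the implementation of compute_observation_frequency(); the data and helper variables are made up for the example.

```r
# Five assessors, but only two distinct rank patterns.
rankings <- matrix(
  c(1, 2, 3,
    1, 2, 3,
    3, 2, 1,
    1, 2, 3,
    3, 2, 1),
  ncol = 3, byrow = TRUE
)

# Turn each row into a string key so identical patterns can be counted.
keys <- apply(rankings, 1, paste, collapse = "-")
first_occurrence <- !duplicated(keys)

# Keep one row per pattern, plus the number of assessors sharing it.
unique_rankings <- rankings[first_occurrence, , drop = FALSE]
observation_frequency <- as.integer(table(keys)[keys[first_occurrence]])

if (requireNamespace("BayesMallows", quietly = TRUE)) {
  dat <- BayesMallows::setup_rank_data(
    rankings = unique_rankings,
    observation_frequency = observation_frequency
  )
}
```

Here two rows with weights 3 and 2 stand in for the original five rows.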

validate_rankings

Logical specifying whether the rankings provided (or generated from preferences) should be validated. Defaults to TRUE. Turning off this check will reduce computing time with a large number of items or assessors.

na_action

Character specifying how to deal with NA values in the rankings matrix, if provided. Defaults to "augment", which means that missing values are automatically filled in using the Bayesian data augmentation scheme described in Vitelli et al. (2018). The other options are "fail", which means that an error message is printed and the algorithm stops if there are NAs in rankings, and "omit", which deletes rows containing NAs.
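The following sketch shows, in base R, which rows would be affected by "omit" for a small made-up matrix, and then passes the same matrix to setup_rank_data() with two of the options. The base R part is only an illustration of the described behavior, not the package's internal code.

```r
rankings <- matrix(
  c(1, 2, 3,
    2, NA, 1,   # assessor 2 has one missing rank
    3, 1, 2),
  ncol = 3, byrow = TRUE
)

# Rows with at least one NA; "omit" is described as deleting exactly these.
has_na <- !complete.cases(rankings)
kept <- rankings[!has_na, , drop = FALSE]

if (requireNamespace("BayesMallows", quietly = TRUE)) {
  # Default: keep all rows and let the model augment the missing ranks.
  dat_augment <- BayesMallows::setup_rank_data(rankings, na_action = "augment")
  # Alternative: drop incomplete rows before any modeling.
  dat_omit <- BayesMallows::setup_rank_data(rankings, na_action = "omit")
}
```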

cl

Optional computing cluster used for parallelization when generating transitive closure based on preferences, returned from parallel::makeCluster(). Defaults to NULL.
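A possible usage pattern is sketched below. The preference data and the choice of two workers are arbitrary, and for data this small a cluster is unlikely to give any speedup; the point is only to show how the cluster object is created, passed, and stopped.

```r
# Made-up pairwise preferences for two assessors.
preferences <- data.frame(
  assessor    = c(1, 1, 2, 2),
  bottom_item = c(2, 3, 1, 3),
  top_item    = c(1, 2, 2, 1)
)

if (requireNamespace("BayesMallows", quietly = TRUE)) {
  cl <- parallel::makeCluster(2)  # two local worker processes
  dat <- tryCatch(
    BayesMallows::setup_rank_data(preferences = preferences, cl = cl),
    finally = parallel::stopCluster(cl)  # always release the workers
  )
}
```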

shuffle_unranked

Logical specifying whether or not to randomly permute unranked items in the initial ranking. When shuffle_unranked=TRUE and random=FALSE, all unranked items for each assessor are randomly permuted. Otherwise, the first ordering returned by igraph::topo_sort() is used.

random

Logical specifying whether or not to use a random initial ranking. Defaults to FALSE. Setting this to TRUE means that all possible orderings consistent with the stated pairwise preferences are generated for each assessor, and one of them is picked at random.

random_limit

Integer specifying the maximum number of items allowed when all possible orderings are computed, i.e., when random=TRUE. Defaults to 8L.
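The reason for a limit is that the number of orderings grows factorially with the number of items. The brute-force sketch below illustrates the idea behind random=TRUE for four items: enumerate all orderings, keep those consistent with the stated preferences, and draw one at random. It is an illustration under made-up data, not the package's implementation, and permutations() is a hypothetical helper defined here.

```r
# All permutations of a vector, built recursively (one row per permutation).
permutations <- function(x) {
  if (length(x) <= 1) return(matrix(x, nrow = 1))
  do.call(rbind, lapply(seq_along(x), function(i) {
    cbind(x[i], permutations(x[-i]))
  }))
}

items <- 1:4
# Stated preferences: item 1 beats item 2, and item 3 beats item 4.
prefs <- data.frame(top_item = c(1, 3), bottom_item = c(2, 4))

# Each row lists the items from best to worst.
orderings <- permutations(items)

# An ordering is consistent if every top_item comes before its bottom_item.
consistent <- apply(orderings, 1, function(ord) {
  all(match(prefs$top_item, ord) < match(prefs$bottom_item, ord))
})
candidates <- orderings[consistent, , drop = FALSE]

# Pick one consistent ordering at random as the starting point.
set.seed(1)
initial_ordering <- candidates[sample(nrow(candidates), 1), ]

# Number of orderings to enumerate at the default limit of 8 items.
n_at_limit <- factorial(8)
```

With four items there are 24 orderings, of which 6 respect both preferences; at eight items the full enumeration already covers 40320 orderings.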

timepoint

Integer vector specifying the timepoint. Defaults to NULL, which means that a vector of ones, one for each observation, is generated. Used by update_mallows() to identify data with a given iteration of the sequential Monte Carlo algorithm. If not NULL, must contain one integer for each row in rankings.

n_items

Integer specifying the number of items. Defaults to NULL, which means that the number of items is inferred from rankings or from preferences. Setting n_items manually can be useful with pairwise preference data in the SMC algorithm, i.e., when rankings is NULL and preferences is non-NULL, and contains a small number of pairwise preferences for a subset of users and items.
