OptimalBidimensionalEmbedding: Dimensionality reduction of multidimensional ordinal binary data

Description

Given a dataset of $n$ statistical units, scored on $k$ ordinal 0/1 indicators and partially ordered component-wise into the Boolean lattice $B_k=(\{0,1\}^k,\leq_{cmp})$, the function finds the bidimensional data representation that best preserves the input order relation. The search for the best bidimensional representation is carried out by a parallel C++ implementation.
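As a small illustration of the component-wise order $\leq_{cmp}$, the following base R sketch compares binary profiles. The helper `leq_cmp` is hypothetical and is not part of the package.

```r
# Component-wise comparison of two binary profiles (hypothetical helper):
# a is below b when every score of a is <= the corresponding score of b.
leq_cmp <- function(a, b) all(a <= b)

a <- c(0, 1, 0)
b <- c(1, 1, 0)
d <- c(0, 0, 1)

leq_cmp(a, b)                   # TRUE: a is below b
leq_cmp(a, d) || leq_cmp(d, a)  # FALSE: a and d are incomparable
```

Incomparable pairs such as `a` and `d` are what makes the order partial rather than linear.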

Usage

OptimalBidimensionalEmbedding(
  profile,
  weights,
  output_every_sec = NULL,
  thread_share = 1
)

Value

A list of 5 elements named allLoss, variablesPriority, bestLossVAlue, bestVariablePriority, and bestRepresentation.

allLoss is a vector of dimension $k!/2$ reporting the value of the loss function $L(D^{out}|D^{inp},p)$ corresponding to the representation induced by each reversed pair of lexicographic linear extensions. This loss function measures the global error made in approximating the order structure of the input Boolean lattice $B_k$ with its bidimensional representations.
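The count $k!/2$ can be checked with a short base R sketch: a permutation and its reverse identify the same reversed pair, so only one of the two needs to be kept. The helper `all_perms` and the rule used to pick a representative are assumptions for illustration; the order in which the function itself enumerates the pairs is not documented here.

```r
# All permutations of a vector, one per row (hypothetical helper).
all_perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  do.call(rbind, lapply(seq_along(v), function(i) {
    cbind(v[i], all_perms(v[-i]))
  }))
}

k <- 4
perms <- all_perms(seq_len(k))

# Keep one representative of each {permutation, reversed permutation} pair:
# exactly one of the two has its first entry smaller than its last entry.
reps <- perms[perms[, 1] < perms[, k], , drop = FALSE]

nrow(perms)  # 24, i.e. k!
nrow(reps)   # 12, i.e. k!/2
```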

variablesPriority is a matrix with $k!/2$ rows and $k$ columns. Each row is an integer vector of dimension $k$ containing a permutation $i_1,...,i_k$ of $1,...,k$. This vector specifies the criterion to build the reversed pair of lexicographic linear extensions used to approximate $B_k$. The first linear extension is built by ordering profiles first according to their scores on $V_{i_1}$, then according to their scores on $V_{i_{2}}$ and so on, until $V_{i_{k}}$; the second linear extension is built by ordering profiles first according to their scores on $V_{i_k}$, then according to their scores on $V_{i_{k-1}}$ and so on, until $V_{i_{1}}$. The $j$-th row of variablesPriority identifies the reversed pair of lexicographic linear extensions inducing the bidimensional representation associated with the $j$-th global loss in allLoss.
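The construction of a reversed pair of lexicographic linear extensions from one permutation can be sketched in base R as follows. The helper `lex_order` and the variable names are hypothetical, and this is not the package's C++ implementation. Reading the two ranks as the coordinates of a profile is also an assumption made for illustration.

```r
k <- 3
# All 2^k profiles, one per row; columns are V1, ..., Vk.
profiles <- as.matrix(expand.grid(rep(list(0:1), k)))
colnames(profiles) <- paste0("V", seq_len(k))

# Row indices sorted lexicographically, taking variables in the order `prio`
# (hypothetical helper).
lex_order <- function(p, prio) {
  do.call(order, lapply(prio, function(j) p[, j]))
}

priority <- c(2, 1, 3)                        # an arbitrary example permutation
first  <- lex_order(profiles, priority)       # V2, then V1, then V3
second <- lex_order(profiles, rev(priority))  # V3, then V1, then V2

# Rank of each profile in the two linear extensions.
rank_first  <- order(first)
rank_second <- order(second)
cbind(profiles, rank_first, rank_second)
```

Both orderings extend the component-wise order: if one profile is below another in $B_k$, it comes earlier in both rankings. The converse fails for some pairs when $k \geq 3$, which is the kind of error the loss function quantifies.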

bestLossVAlue is a real number indicating the minimum value of the global error $L(D^{out}|D^{inp},p)$ among the $k!/2$ global errors associated with the different pairs of reversed lexicographic linear extensions.

bestVariablePriority is an integer vector of dimension $k$ containing the permutation of $1,...,k$ inducing the best bidimensional representation, i.e. the bidimensional representation whose global error is bestLossVAlue.

bestRepresentation is a data frame with $m$ rows (one for each observed profile) and 5 variables named profiles, x, y, weights and error. $profile is an integer vector containing the base-10 representation of the $k$-dimensional Boolean vectors representing the observed profiles. $x and $y are integer vectors containing, respectively, the x-coordinates and y-coordinates of the points representing the observed profiles in the optimal bidimensional representation. $weights is a real vector with the frequency/weight of each observed profile. $error is a real vector with the approximation error $L(b|D^{inp}, p)$ associated with each observed profile in the optimal bidimensional representation.
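A base-10 code for a Boolean profile can be computed as in the sketch below. It assumes that the first variable is the least significant bit, which matches how intToBits orders bits; the bit order actually used by the function is not stated here, so treat this choice as an assumption.

```r
k <- 3
# One profile per row, built from the integers 0, ..., 2^k - 1.
profiles <- t(sapply(0:(2^k - 1), function(x) as.integer(intToBits(x))[1:k]))

# Base-10 code of each row, with column 1 as the least significant bit
# (assumed convention).
codes <- as.vector(profiles %*% 2^(0:(k - 1)))
codes  # 0 1 2 3 4 5 6 7
```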

Arguments

profile: Boolean matrix of dimension $m\times k$ containing the $m\leq n$ distinct observed profiles. Each row of profile is one observed profile, and each observed profile appears exactly once.
weights: real vector of length $m$ with the frequency/weight of each observed profile. The element in position $j$ of weights is the frequency/weight of the profile in row $j$ of profile.
output_every_sec: integer specifying a time interval (in seconds). When this argument is set, a message reporting the number of reversed pairs of lexicographic linear extensions analyzed so far is printed on the R console every output_every_sec seconds during the execution of OptimalBidimensionalEmbedding. Note that the total number of reversed pairs of lexicographic linear extensions to be analyzed is $k!/2$.
thread_share: real number in the interval $(0,1]$ specifying the share of CPU threads to be involved in the algorithm execution.
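When the data come as one row per statistical unit, the profile and weights arguments can be derived by collapsing duplicate rows. The following base R sketch shows one way to do this; the simulated matrix `raw` and the string key are illustrative assumptions, not part of the package.

```r
set.seed(1)
n <- 50
k <- 3
# Simulated raw data: n units scored on k binary indicators.
raw <- matrix(rbinom(n * k, 1, 0.5), nrow = n, ncol = k)

# One string key per unit, used to detect repeated profiles.
key <- apply(raw, 1, paste, collapse = "")
keep <- !duplicated(key)

# Distinct observed profiles (m rows) and their frequencies, in matching order.
profile <- raw[keep, , drop = FALSE]
weights <- as.numeric(table(key)[key[keep]])
```

The resulting `profile` has one row per distinct profile and `weights` sums to $n$, which is the shape the two arguments describe.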

Examples

#SIMULATING OBSERVED BINARY DATA
#number of binary variables
k <- 6
#building observed profiles matrix
profiles <- sapply((0:(2^k-1)) ,function(x){ as.integer(intToBits(x))})
profiles <- t(profiles[1:k, ])
#building the vector of observation frequencies
weights <- sample.int(100, nrow(profiles), replace=TRUE)
#FINDING THE OPTIMAL BIDIMENSIONAL REPRESENTATION
result <- OptimalBidimensionalEmbedding(profiles, weights)