Binary transformation of continuous or discrete variables with \(\rho\ge 3\) levels. Two methods are available for the transformation.
The first method uses the argument k in the pick function and assumes a pick k out of N response process. Such response processes arise in surveys and questionnaires in which respondents are asked to pick exactly their k most preferred items. The value of k is an integer between 1 and ncol(data). For a given k, the function "picks" the k highest values in each row of data (if byItem=FALSE): these k values become 1 and the remaining ncol(data)-k elements are set to 0. If k=ncol(data), the resulting matrix consists only of 1's and contains no 0's.
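The pick k out of N rule described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's own code: the helper name pick_top_k and the example matrix ratings are hypothetical, and the sketch works by row only (the byItem=FALSE case).

```r
# Hypothetical helper (not part of the package): set the k highest values
# in each row to 1 and the remaining ncol(data) - k values to 0.
pick_top_k <- function(data, k) {
  data <- as.matrix(data)
  stopifnot(k >= 1, k <= ncol(data))
  out <- matrix(0L, nrow(data), ncol(data), dimnames = dimnames(data))
  for (i in seq_len(nrow(data))) {
    # Rank the negated row so the largest value gets rank 1;
    # ties.method = "first" keeps exactly k entries per row.
    out[i, ] <- as.integer(rank(-data[i, ], ties.method = "first") <= k)
  }
  out
}

# Made-up preference ratings: 3 respondents (rows), 4 items (columns).
ratings <- matrix(c(5, 1, 3, 2,
                    2, 4, 1, 5,
                    3, 2, 4, 1),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(NULL, c("A", "B", "C", "D")))

picked <- pick_top_k(ratings, k = 2)
print(picked)
```

Every row of picked contains exactly k ones. Calling the helper with k = ncol(ratings) returns a matrix of ones only, which matches the boundary case mentioned above.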
The second method binarizes the data by thresholding. For this method, the user provides the threshold(s) through the cutoff parameter of the pick function (default cutoff=NULL). If a single value is provided, i.e., cutoff=\(\alpha\), then \(\alpha\) is used as the threshold in each row \(i\) of the data matrix data (if byItem=FALSE), such that any value greater than or equal to cutoff in row \(i\) becomes 1 and any other value becomes 0. Alternatively, the user can provide row-specific (or column-specific) cutoff values, i.e., cutoff=\(\alpha\) with \(\alpha=(\alpha_1,\ldots,\alpha_K)\), where \(\alpha_i\) is the cutoff value for row or column \(i\). In this case, \(x_{ij}\) is set to 1 if \(x_{ij}\ge \alpha_i\) and to 0 otherwise.
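The thresholding rule can be illustrated with a short base R sketch. The helper binarize_by_cutoff and the matrix scores are hypothetical names introduced for this example. The sketch covers only the row-wise case, and it assumes that a vector cutoff has one entry per row.

```r
# Hypothetical helper (not part of the package): binarize with either a
# single cutoff or one cutoff per row, using the rule x >= cutoff -> 1.
binarize_by_cutoff <- function(data, cutoff) {
  data <- as.matrix(data)
  stopifnot(length(cutoff) %in% c(1, nrow(data)))
  # R recycles a length-nrow vector down the columns, so cutoff[i] is
  # compared with every entry of row i.
  (data >= cutoff) * 1L
}

# Made-up scores: 2 rows, 3 columns.
scores <- matrix(c(1, 4, 3,
                   5, 2, 2),
                 nrow = 2, byrow = TRUE)

single <- binarize_by_cutoff(scores, 3)        # same threshold in every row
by_row <- binarize_by_cutoff(scores, c(4, 2))  # alpha_1 = 4, alpha_2 = 2
print(single)
print(by_row)
```

Under the same assumptions, a column-specific version could be obtained by applying the helper to t(scores) and transposing the result back.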
The two methods cannot be used simultaneously: at most one of the parameters k and cutoff can be different from NULL at a time. If both parameters are NULL (default), a row-specific cutoff is determined automatically for each row \(i\) of data as the row mean, \(\alpha_i= \bar{x}_i\). The dichotomization is performed by row of data, unless byItem=TRUE, in which case it is performed by column.
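The default behaviour, where each row is thresholded at its own mean, can be sketched as follows. The helper binarize_by_row_mean is a hypothetical name. The sketch assumes the same "greater than or equal to" rule as for a user-supplied cutoff, and it assumes there are no missing values.

```r
# Hypothetical helper (not part of the package): use each row's mean as
# its cutoff, i.e. alpha_i = mean of row i.
binarize_by_row_mean <- function(data) {
  data <- as.matrix(data)
  # rowMeans() returns one value per row; recycling aligns it with rows.
  (data >= rowMeans(data)) * 1L
}

# Made-up data: row means are 3 and 4.
x <- matrix(c(1, 2, 6,
              4, 4, 4),
            nrow = 2, byrow = TRUE)

auto <- binarize_by_row_mean(x)
print(auto)
```

A constant row, such as the second one here, becomes all ones under this rule because every value equals the row mean.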
When the argument k is used, ties can make more than k values eligible to be picked. In this case, the choice of which items are picked is made after adding a small amount of noise to each observation in row or column \(i\). This is done with the function jitter.
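The effect of breaking ties with noise can be seen on a single row. This is a minimal sketch and not the package's implementation: the text above does not say how jitter is called internally, so the default arguments of jitter and the use of rank are assumptions.

```r
set.seed(1)  # jitter() is random; fix the seed for reproducibility

row <- c(3, 3, 3, 1)  # three values tie for the two available picks
k <- 2

# jitter() adds a small amount of uniform noise, so the tied values
# become distinct and can be ranked without ambiguity.
noisy <- jitter(row)
picked <- as.integer(rank(-noisy) <= k)

print(noisy)
print(picked)
```

Which two of the three tied items are selected depends on the random noise, but exactly k items are picked. The clearly smaller fourth value is not picked, because the default noise is small relative to the gap between the distinct values.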