gen_bin_data: generate the data used for the model experiment

Description

gen_bin_data generates the data used for the model experiment.

Usage

gen_bin_data(beta, N, nclass, seed)

Arguments

beta

A numeric vector of the true coefficients used to generate the synthesized data.

N

A number specifying how many synthesized data points to generate. It should be an integer.

nclass

A number specifying how many clusters the original data are grouped into. It should be an integer.

seed

The seed for the random number generator.

Value

a list of seven elements:

data.clust

A list with the clustering results. Samples in the same list element are closer to each other.

X

The samples with the smallest variance from each cluster. Note that the length of X equals the number of elements in data.clust.

y

The target values (0 or 1) corresponding to X.

Details

The function gen_bin_data generates N data points. The first column of the design matrix is 1, the second column follows a normal distribution with mean 1 and variance 1, and the remaining columns follow a normal distribution with mean 0 and variance 1. The points are then clustered into classes to reduce the computational cost. The number of classes is set by the parameter nclass.
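The column layout described above can be sketched in a few lines of base R. This is only an illustration, not the package's implementation: the helper name sketch_design_matrix is hypothetical, the number of columns is assumed to equal length(beta), and k-means is an assumed stand-in for the clustering step, since the method used internally is not stated here.

```r
# Hypothetical helper (not part of the package): builds a design matrix
# with the column layout described in Details.
# Assumption: one column per coefficient, so length(beta) >= 2.
sketch_design_matrix <- function(beta, N, seed) {
  set.seed(seed)
  p <- length(beta)
  X <- matrix(rnorm(N * p), nrow = N, ncol = p)  # every column starts as N(0, 1)
  X[, 1] <- 1                                    # first column: constant 1
  X[, 2] <- X[, 2] + 1                           # second column: mean 1, variance 1
  X
}

beta <- c(-1, 1, 0, 0)
X_sketch <- sketch_design_matrix(beta, N = 1000, seed = 123)

# Assumption: k-means stands in for the package's clustering step.
nclass <- 10
cl <- kmeans(X_sketch[, -1], centers = nclass, iter.max = 50)

# Row indices grouped by cluster, one list element per cluster.
clusters_sketch <- split(seq_len(nrow(X_sketch)), cl$cluster)
length(clusters_sketch)
```

Grouping the N points into nclass clusters means later steps can work with one representative per cluster rather than with every point, which is where the saving in computation comes from.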

References

Wang Z, Kwon Y, Chang YcI (2019). Active learning for binary classification with variable selection. arXiv preprint arXiv:1901.10079.

Examples

# NOT RUN {
# For an example, see example(seq_bin_model)
# }
