gen_multi_data
Generate the data used for multiple-class classification problems.
gen_multi_data(beta0, N, type, test_ratio)
beta0 is a numeric matrix of the true coefficients used to generate the synthesized data.
N is the number of synthesized observations to generate. It should be an integer. The value should not be too small; we recommend 10000.
type is a character string that determines which type of data is generated; it must match either 'ord' or 'cat'.
test_ratio is the proportion of all data assigned to the test set. It should be a number between 0 and 1 and should not be too large; a value between 0.2 and 0.3 is recommended.
A list containing the following components:
The IDs of the training samples (train_id).
The training dataset. The IDs of its observations are the same as those in train_id.
The testing dataset.
gen_multi_data creates a training dataset and a testing dataset. beta0 is a p * k matrix, where p is the length of the true coefficient vector and (k + 1) is the number of categories. The value of 'type' can be 'ord' or 'cat'. If it is 'ord', the data have an ordinal relation among the classes, which is common in applications (e.g., the label indicates the severity of a disease or a product preference). If it is 'cat', there is no such ordinal relation among the classes. The response variable y is generated from a multinomial distribution, and the explanatory variables x are generated from a multivariate normal distribution with mean vector 0 and identity covariance matrix.
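The generating process described above can be sketched in base R for the unordered ('cat') case. This is an illustrative sketch, not the package's implementation. The baseline-category logit link, the example values in beta0, the 0, ..., k labelling, and the helper names (eta, prob, dat, test_id) are all assumptions made for the example.

```r
# Illustrative sketch only; this is NOT the seqest implementation.
set.seed(1)

p <- 3             # number of explanatory variables
k <- 2             # beta0 has k columns, so there are k + 1 classes
N <- 10000         # number of synthesized observations
test_ratio <- 0.2  # proportion of observations held out for testing

# Hypothetical true coefficients: a p x k matrix.
beta0 <- matrix(c(1, 2, 1, -1, 1, 2), nrow = p, ncol = k)

# Explanatory variables: N draws from a p-variate normal with mean 0 and
# identity covariance, i.e. independent standard normal columns.
x <- matrix(rnorm(N * p), nrow = N, ncol = p)

# ASSUMPTION: baseline-category logit link with class 0 as the reference.
eta <- cbind(0, x %*% beta0)           # N x (k + 1) linear predictors
prob <- exp(eta) / rowSums(exp(eta))   # class probabilities, rows sum to 1

# One multinomial draw per observation; labels are 0, ..., k (assumed coding).
y <- apply(prob, 1, function(pr) sample.int(k + 1, size = 1, prob = pr)) - 1

# Random split into training and testing sets.
n_test <- round(N * test_ratio)
test_id <- sample.int(N, n_test)
train_id <- setdiff(seq_len(N), test_id)

dat <- cbind(y = y, x)
train <- dat[train_id, , drop = FALSE]
test <- dat[test_id, , drop = FALSE]
```

With test_ratio = 0.2 and N = 10000, this sketch leaves 8000 rows in train and 2000 in test. The ordinal ('ord') case is omitted because the text above does not specify how the ordering enters the model.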
Li, J., Chen, Z., Wang, Z., & Chang, Y. I. (2020). Active learning in multiple-class classification problems via individualized binary models. Computational Statistics & Data Analysis, 145, 106911. doi:10.1016/j.csda.2020.106911
gen_bin_data
for the binary classification case.
gen_GEE_data
for the generalized estimating equations case.
# NOT RUN {
# For an example, see example(seq_ord_model)
# }