The cerr function generates a dataset with a specified number of observations and predictors, together with a response vector whose error term is sampled from a cosine-based distribution on [-pi/2, pi/2].
cerr(n, nr, p, dist_type, ...)
Value: X, Y, e
n is the number of observations
nr is the number of observations with a different error distribution (the second block)
p is the number of predictors (the dimension of each observation)
dist_type is the cosine-based sampler to use: "cosine_random", "cosine_rejection_sampling", or "cosine_metropolis_hastings"
... are additional arguments (reserved for compatibility; not used)
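The three sampler names correspond to standard techniques for drawing from a density on a bounded interval. The sketch below is not the package's implementation. It assumes the target density is f(x) = cos(x) / 2 on [-pi/2, pi/2], which this page does not confirm, and the function names sample_cos_inverse, sample_cos_rejection, and sample_cos_mh are hypothetical helpers introduced only for illustration.

```r
# Assumed target density (not confirmed by this documentation):
# f(x) = cos(x) / 2 on [-pi/2, pi/2], with CDF F(x) = (sin(x) + 1) / 2.

# Inverse-transform sampling: solve u = F(x) for x.
sample_cos_inverse <- function(n) {
  u <- runif(n)
  asin(2 * u - 1)
}

# Rejection sampling with a uniform proposal on [-pi/2, pi/2].
# A proposal x is accepted with probability cos(x), since cos(x) <= 1 there.
sample_cos_rejection <- function(n) {
  out <- numeric(0)
  while (length(out) < n) {
    x <- runif(n, -pi / 2, pi / 2)
    keep <- runif(n) < cos(x)
    out <- c(out, x[keep])
  }
  out[seq_len(n)]
}

# Random-walk Metropolis-Hastings targeting the same density.
# `step` is a hypothetical tuning parameter for the proposal width.
sample_cos_mh <- function(n, step = 1) {
  x <- numeric(n)
  cur <- 0
  for (i in seq_len(n)) {
    prop <- cur + runif(1, -step, step)
    # Proposals outside the support have density 0 and are always rejected.
    if (abs(prop) <= pi / 2 && runif(1) < cos(prop) / cos(cur)) {
      cur <- prop
    }
    x[i] <- cur
  }
  x
}

set.seed(1)
e1 <- sample_cos_inverse(1000)
e2 <- sample_cos_rejection(1000)
e3 <- sample_cos_mh(1000)
range(e1)  # all draws lie inside [-pi/2, pi/2]
```

Under this assumption the inverse-transform and rejection samplers return independent draws, while the Metropolis-Hastings chain returns correlated draws and would normally need a burn-in period before use.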
Guo, G., Song, H., & Zhu, L. (2024). The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163. doi:10.1007/s11222-024-10471-z
Guo, G., Sun, Y., Qian, G., & Wang, Q. (2022). LIC criterion for optimal subset selection in distributed interval estimation. Journal of Applied Statistics, 50(9), 1900-1920. doi:10.1080/02664763.2022.2053949
Chang, D., & Guo, G. (2024). LIC: An R package for optimal subset selection for distributed data. SoftwareX, 28, 101909.
Jing, G., & Guo, G. (2025). TLIC: An R package for the LIC for T distribution regression analysis. SoftwareX, 30, 102132.
Chang, D., & Guo, G. (2025). Research on Distributed Redundant Data Estimation Based on LIC. IAENG International Journal of Applied Mathematics, 55(1), 1-6.
Gao, H., & Guo, G. (2025). LIC for Distributed Skewed Regression. IAENG International Journal of Applied Mathematics, 55(9), 2925-2930.
Zhang, C., & Guo, G. (2025). The optimal subset estimation of distributed redundant data. IAENG International Journal of Applied Mathematics, 55(2), 270–277.
Jing, G., & Guo, G. (2025). Student LIC for distributed estimation. IAENG International Journal of Applied Mathematics, 55(3), 575–581.
Liu, Q., & Guo, G. (2025). Distributed estimation of redundant data. IAENG International Journal of Applied Mathematics, 55(2), 332–337.
set.seed(12)
data <- cerr(n = 1200, nr = 200, p = 5, dist_type = "cosine_random")
str(data)