This function creates the centers of data nuggets from a random sample.
create.DNcenters(RS,
                 delete.percent,
                 DN.num,
                 dist.metric,
                 make.pb = FALSE)
Value: a DN.num by ncol(RS) data frame containing the data nugget centers.
RS: A data matrix (data frame, data table, matrix, etc.) containing only entries of class numeric.
delete.percent: The proportion of observations to remove from the data matrix at each iteration when finding data nugget centers. Must be of class numeric and within (0,1).
DN.num: The number of data nuggets to create. Must be of class numeric.
dist.metric: The distance metric used to create the initial centers of data nuggets. Must be 'euclidean' or 'manhattan'.
make.pb: Whether to print a progress bar. Must be TRUE or FALSE; defaults to FALSE.
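The argument constraints listed above can be checked up front with a few lines of base R. The sketch below is illustrative only: check.DNcenters.args is a hypothetical helper, not a function in the package, and the package may validate its inputs differently.

```r
# Hypothetical helper (not part of the package): checks the documented
# constraints on the arguments of create.DNcenters.
check.DNcenters.args <- function(RS, delete.percent, DN.num,
                                 dist.metric, make.pb = FALSE) {
  # Coerce data frames and similar objects to a matrix before type checking.
  RS <- as.matrix(RS)
  stopifnot(
    is.numeric(RS),                                  # numeric entries only
    is.numeric(delete.percent), length(delete.percent) == 1,
    delete.percent > 0, delete.percent < 1,          # open interval (0,1)
    is.numeric(DN.num), length(DN.num) == 1,
    length(dist.metric) == 1,
    dist.metric %in% c("euclidean", "manhattan"),
    isTRUE(make.pb) || isFALSE(make.pb)
  )
  invisible(TRUE)
}

# Example call on a small simulated sample.
set.seed(1)
RS <- matrix(rnorm(20), ncol = 2)
check.DNcenters.args(RS, delete.percent = 0.1, DN.num = 5,
                     dist.metric = "euclidean")
```

A call with delete.percent = 1.5 or dist.metric = "cosine" stops with an error, which mirrors the documented restrictions.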
Traymon Beavers, Javier Cabrera, Mariusz Lubomirski
This function is used for reducing a random sample to data nugget centers in the create.DN function. NOTE THAT THIS FUNCTION IS NOT DESIGNED FOR USE OUTSIDE OF THE create.DN FUNCTION.
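To give a rough sense of how delete.percent, DN.num, and dist.metric could interact when a sample is reduced step by step, here is a simplified thinning loop in base R. It is an assumption-laden illustration and NOT the package's implementation: thin.to.centers is a hypothetical name, and the rule used here (drop the rows with the smallest nearest-neighbour distance) is a stand-in chosen for brevity.

```r
# Illustrative thinning loop; NOT the datanugget implementation.
# thin.to.centers is a hypothetical function name.
thin.to.centers <- function(RS, delete.percent, DN.num,
                            dist.metric = "euclidean") {
  X <- as.matrix(RS)
  while (nrow(X) > DN.num) {
    # Pairwise distances under the chosen metric ('euclidean' or 'manhattan').
    D <- as.matrix(dist(X, method = dist.metric))
    diag(D) <- Inf
    # Distance from each row to its nearest neighbour.
    nn <- apply(D, 1, min)
    # Drop a delete.percent share of the rows (at least one),
    # but never go below DN.num rows.
    n.drop <- min(max(1, floor(delete.percent * nrow(X))),
                  nrow(X) - DN.num)
    drop <- order(nn)[seq_len(n.drop)]
    X <- X[-drop, , drop = FALSE]
  }
  as.data.frame(X)
}

# Reduce 100 simulated observations to 10 rows.
set.seed(42)
RS <- matrix(rnorm(200), ncol = 2)
centers <- thin.to.centers(RS, delete.percent = 0.1, DN.num = 10,
                           dist.metric = "manhattan")
dim(centers)
```

The result has DN.num rows and ncol(RS) columns, matching the shape of the documented return value. Because this sketch recomputes the full distance matrix on every pass, it is only practical for small samples.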
Beavers, T. E., Cheng, G., Duan, Y., Cabrera, J., Lubomirski, M., Amaratunga, D., & Teigler, J. E. (2024). Data Nuggets: A Method for Reducing Big Data While Preserving Data Structure. Journal of Computational and Graphical Statistics, 1-21.
Cherasia, K. E., Cabrera, J., Fernholz, L. T., & Fernholz, R. (2022). Data Nuggets in Supervised Learning. In Robust and Multivariate Statistical Methods: Festschrift in Honor of David E. Tyler (pp. 429-449). Cham: Springer International Publishing.