This function creates the centers of data nuggets from a random sample.
create.DNcenters(RS,
delete.percent = 0.1,
DN.num,
dist.metric = "euclidean",
make.pbs = FALSE)
A DN.num by ncol(RS) data frame containing the data nugget centers.
RS: A data matrix (data frame, data table, matrix, etc.) containing only entries of class numeric.
delete.percent: The proportion of observations to remove from the data matrix at each iteration when finding data nugget centers. Must be of class numeric and within (0,1). Default value is 0.1.
DN.num: The number of data nuggets to create. Must be of class numeric.
dist.metric: The distance metric used to create the initial centers of data nuggets. Must be "euclidean" or "manhattan". Default value is "euclidean".
make.pbs: Logical; whether to show a progress bar while the function runs. Default value is FALSE.
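The argument constraints listed above can be sketched as a set of checks. This is a minimal illustration in base R, not code from the package: the helper name check_DNcenters_args is hypothetical, and the package itself may validate its inputs differently or not at all.

```r
# Hypothetical helper (not part of the package): checks the documented
# constraints on the arguments of create.DNcenters.
check_DNcenters_args <- function(RS, delete.percent = 0.1, DN.num,
                                 dist.metric = "euclidean", make.pbs = FALSE) {
  RS <- as.data.frame(RS)
  stopifnot(
    # RS may only contain numeric entries
    all(vapply(RS, is.numeric, logical(1))),
    # delete.percent must be a single number in the open interval (0, 1)
    is.numeric(delete.percent), length(delete.percent) == 1,
    delete.percent > 0, delete.percent < 1,
    # DN.num must be a single number
    is.numeric(DN.num), length(DN.num) == 1,
    # only the two documented metrics are accepted
    dist.metric %in% c("euclidean", "manhattan"),
    # make.pbs must be a single logical value
    is.logical(make.pbs), length(make.pbs) == 1
  )
  invisible(TRUE)
}

set.seed(1)
RS <- matrix(rnorm(200), ncol = 2)    # 100 observations, 2 numeric columns
check_DNcenters_args(RS, DN.num = 10) # passes silently
```

A call such as check_DNcenters_args(RS, delete.percent = 1, DN.num = 10) stops with an error, because 1 lies outside (0,1).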
Rituparna Dey, Traymon Beavers, Javier Cabrera, Mariusz Lubomirski
This function is called by create.DN to reduce a random sample to data nugget centers. NOTE THAT THIS FUNCTION IS NOT DESIGNED FOR USE OUTSIDE OF THE create.DN FUNCTION.
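To make the roles of delete.percent, DN.num and dist.metric concrete, the following base R sketch reduces a sample by repeated deletion. It is an illustration only and is not the package's implementation: the function name sketch_centers is hypothetical, and the rule for choosing which rows to drop (those closest to their nearest neighbour) is an assumption made for this example.

```r
# Illustrative only: NOT the package's implementation of create.DNcenters.
sketch_centers <- function(RS, delete.percent = 0.1, DN.num,
                           dist.metric = "euclidean") {
  RS <- as.data.frame(RS)
  while (nrow(RS) > DN.num) {
    # Pairwise distances; stats::dist supports both documented metrics.
    D <- as.matrix(dist(RS, method = dist.metric))
    diag(D) <- Inf
    # Distance from each row to its nearest neighbour.
    nearest <- apply(D, 1, min)
    # Rows to drop this round: the stated proportion of the remaining rows,
    # at least 1, and never more than needed to reach DN.num.
    n.drop <- min(max(1, floor(delete.percent * nrow(RS))),
                  nrow(RS) - DN.num)
    # Assumption: drop the rows that sit closest to another row.
    drop <- order(nearest)[seq_len(n.drop)]
    RS <- RS[-drop, , drop = FALSE]
  }
  RS
}

set.seed(1)
RS <- matrix(rnorm(400), ncol = 2)  # 200 observations, 2 columns
centers <- sketch_centers(RS, DN.num = 20)
dim(centers)                        # 20 rows, 2 columns
```

Under these assumptions, a larger delete.percent reaches DN.num in fewer iterations, each of which recomputes the full distance matrix, while a smaller value removes rows more gradually.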
Beavers, T. E., Cheng, G., Duan, Y., Cabrera, J., Lubomirski, M., Amaratunga, D., & Teigler, J. E. (2024). Data Nuggets: A Method for Reducing Big Data While Preserving Data Structure. Journal of Computational and Graphical Statistics, 1-21.
Cherasia, K. E., Cabrera, J., Fernholz, L. T., & Fernholz, R. (2022). Data Nuggets in Supervised Learning. In Robust and Multivariate Statistical Methods: Festschrift in Honor of David E. Tyler (pp. 429-449). Cham: Springer International Publishing.