Bootstrap resampling approach to estimate the confidence intervals for the cluster prototypes.
bootstrap(data, k, H, mtimes = 50, lr = 0.01, ncore = 2)
W.est
The \(W\) matrix estimated by bootstrap.
lower
Lower bound of confidence intervals.
upper
Upper bound of confidence intervals.
data
Data matrix or data frame.
k
The number of prototypes/clusters.
H
Matrix, input \(H\) matrix to start the algorithm. Usually the \(H\) matrix is the output of the function ssmf(). If \(H\) is not supplied, the prototypes in the bootstrapped \(W\) matrix might be ordered differently from those in the output of ssmf().
mtimes
Integer, number of bootstrap samples. Default is 50.
lr
Optimisation learning rate in ssmf(). Default is 0.01.
ncore
The number of cores to use for parallel execution. Default is 2.
Wenxuan Liu
Create a bootstrap sample of size \(n\) by sampling from the data set with replacement, and repeat this \(M\) times. The \(m^{th}\) bootstrap sample is denoted as $$X^{{\ast}(m)}=(x_1^{{\ast}(m)}, x_2^{{\ast}(m)},\ldots,x_n^{{\ast}(m)}),$$
where each \(x_i^{{\ast}(m)}\) is a random draw (with replacement) from the data set.
Then, apply the SSMF algorithm to each bootstrap sample to obtain the \(m^{th}\) bootstrap replicate of the prototype matrix, which is denoted as \(W^{{\ast}(m)}\).
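The resampling step can be sketched in base R as follows. This is only an illustration: `fit_prototypes()` is a hypothetical placeholder (it returns column means) standing in for the SSMF fit, and the toy data and object names are assumptions, not part of the package.

```r
set.seed(1)
X <- matrix(rnorm(200), nrow = 50, ncol = 4)  # toy data set with n = 50 rows
n <- nrow(X)
M <- 50                                       # number of bootstrap samples

# Hypothetical placeholder for the SSMF fit: returns a 1 x p "prototype" matrix.
fit_prototypes <- function(x) matrix(colMeans(x), nrow = 1)

# For each m, draw n row indices with replacement and refit on the resampled rows.
W_star <- lapply(seq_len(M), function(m) {
  idx <- sample.int(n, size = n, replace = TRUE)
  fit_prototypes(X[idx, , drop = FALSE])
})
```

Here `W_star` is a list of \(M\) replicate matrices, one per bootstrap sample.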
The estimated standard deviation of the \(M\) bootstrap replicates can be calculated by
$$sd(W^{\ast}) =\sqrt {\frac{1}{M-1} \sum_{m=1}^{M} [W^{{\ast}(m)}-\overline{W}^{\ast}]^2 },$$
where \(\overline{W}^{\ast}=\frac{1}{M} \sum_{m=1}^{M} W^{{\ast}(m)}\). Therefore, the 95% CIs for the prototypes can be calculated by
$$(\overline{W}^{\ast}-t_{(0.025, M-1)} \cdot sd(W^{\ast}),\ \overline{W}^{\ast}+t_{(0.975, M-1)} \cdot sd(W^{\ast})),$$ where \(t_{(0.025, M-1)}\) and \(t_{(0.975, M-1)}\) are the quantiles of Student's \(t\) distribution for a 95% confidence level with \((M-1)\) degrees of freedom.
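A minimal base R sketch of this calculation is given below. It assumes the mean, standard deviation, and interval are computed element by element across the replicates, and it uses the symmetric critical value `qt(0.975, df = M - 1)` for both bounds; the simulated replicates and all object names are illustrative assumptions rather than the package's internal code.

```r
set.seed(1)
M <- 50
# Simulated replicates standing in for W^{*(m)}: M matrices of size 4 x 3.
W_star <- replicate(M, matrix(rnorm(12, mean = 1, sd = 0.1), nrow = 4, ncol = 3),
                    simplify = FALSE)

# Stack the replicates into a 4 x 3 x M array.
W_arr <- array(unlist(W_star), dim = c(4, 3, M))

W_bar <- apply(W_arr, c(1, 2), mean)  # elementwise bootstrap mean
W_sd  <- apply(W_arr, c(1, 2), sd)    # elementwise sd; sd() divides by M - 1

t_crit <- qt(0.975, df = M - 1)       # two-sided 95% critical value
lower  <- W_bar - t_crit * W_sd
upper  <- W_bar + t_crit * W_sd
```

The matrices `lower` and `upper` have the same dimensions as each replicate, with one interval per prototype entry.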
Stine, R. (1989). An Introduction to Bootstrap Methods: Examples and Ideas. Sociological Methods & Research, 18(2-3), 243-291. <doi:10.1177/0049124189018002003>
# example code
# \donttest{
data <- SimulatedDataset
k <- 4
fit <- ssmf(data = data, k = k)
bootstrap(data = data, k = k, H = fit$H)
# }