The \(k\)-fold cross-validation randomly partitions the data into \(k\)
subsets of equal (or nearly equal) size. In each fold, \(k - 1\) subsets are
used as the training data set to create a tree with a desired number of
leaves, and the remaining subset is used as the validation data set to
evaluate the predictive performance of the trained tree. The process is
repeated with each subset (\(m = 1, \ldots, k\)) serving as the validation
set, and the mean squared difference,
$$MSE_m=\frac{1}{n_m} \sum_{q=1}^Q\sum_{i \in m} d^2_{euc}(y_{iq},
\hat{y}_{(-i)q}),$$
is calculated, where \(n_m\) is the number of observations in the \(m\)-th
validation subset, \(\hat{y}_{(-i)q}\) is the mean on variable \(q\) of the
cluster, created from the training data, into which the validation
observation with observed value \(y_{iq}\) falls, and
\(d^2_{euc}(y_{iq}, \hat{y}_{(-i)q})\) is the squared Euclidean distance
(dissimilarity) between the observed value and the cluster mean on variable
\(q\).
This is repeated for all \(k\) subsets of the data set, and the average of
these test errors is the cross-validation-based estimate of the mean squared
error of predicting a new observation,
$$CV_k = \overline{MSE} = \frac{1}{k} \sum_{m=1}^{k} MSE_m.$$
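To make the procedure concrete, the following Python sketch computes \(CV_k\) for a multivariate response stored as an \(n \times Q\) array, under the assumption that the tree's prediction for a held-out observation is the per-variable mean of the training-data cluster (leaf) it falls into. scikit-learn's DecisionTreeRegressor with max_leaf_nodes is used only as a stand-in for the tree-growing step described above, and the names cv_mse, n_leaves, and seed are illustrative rather than part of the original method.

```python
# Minimal sketch of the k-fold cross-validation described above, assuming the
# tree's prediction for a validation observation is the per-variable mean of
# the training-data cluster (leaf) it falls into.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor


def cv_mse(Y, X, n_leaves, k=10, seed=0):
    """Return CV_k, the mean over folds of MSE_m, for a tree with n_leaves leaves.

    Y is an (n, Q) array of responses; X is an (n, p) array of covariates.
    """
    folds = KFold(n_splits=k, shuffle=True, random_state=seed)
    mse_per_fold = []
    for train_idx, val_idx in folds.split(X):
        # Grow the tree on the k - 1 training subsets.
        tree = DecisionTreeRegressor(max_leaf_nodes=n_leaves, random_state=seed)
        tree.fit(X[train_idx], Y[train_idx])
        # Leaf means for the held-out observations: \hat{y}_{(-i)q}.
        Y_hat = tree.predict(X[val_idx])
        # MSE_m: average over the n_m validation observations of the squared
        # Euclidean distance summed over the Q variables.
        sq_dist = ((Y[val_idx] - Y_hat) ** 2).sum(axis=1)
        mse_per_fold.append(sq_dist.mean())
    return np.mean(mse_per_fold)
```

In practice this quantity would typically be computed for a range of tree sizes (values of n_leaves), with the size giving the smallest cross-validated error retained.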