This function calculates an extended version of BIC, which is computed using a particular weighted average of the total residual sum of squares and the number of clusters.
SCEM uses the following equation for the BIC of each partition:
$$BIC(P) = \log(np)\,\frac{RSS(P)}{np} + |P|\,\frac{B_n^{-1}-1}{nB_n},$$
where $RSS(P) = \sum_{q=1}^{Q} RSS(S_q)$.
The sample size of the $i$-th individual time series (i.e. its number of observations) is denoted by $n_i$.
To obtain a single representative value for the sample size, SCEM uses the arithmetic mean $n = (n_1 + \dots + n_p)/p$.
$\frac{B_n^{-1}-1}{nB_n}$ is the tuning parameter that places the penalty on the number of clusters (also note the term $nB_n$). Using a different tuning parameter $\lambda_n$ in place of $\frac{B_n^{-1}-1}{nB_n}$ allows stronger or weaker penalties on the number of clusters.
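As a rough illustration of how a criterion of this kind can be evaluated, the sketch below assumes the form log(np) * RSS(P)/(np) + |P| * (B_n^-1 - 1)/(n B_n) with B_n = n^k. The helper `ebic_sketch` and its arguments are hypothetical and take precomputed per-cluster residual sums of squares; this is not the package's internal code, and the numbers are made up.

```r
# Hypothetical helper, not part of SCEM: evaluates the assumed criterion
# from per-cluster residual sums of squares that were computed elsewhere.
ebic_sketch <- function(rss_by_cluster, n_obs, bandwidth) {
  p   <- length(n_obs)               # number of individuals
  n   <- mean(n_obs)                 # representative sample size
  B_n <- n^bandwidth                 # bandwidth = k gives B_n = n^k
  rss <- sum(rss_by_cluster)         # RSS(P): sum over the clusters of P
  penalty <- (1 / B_n - 1) / (n * B_n)
  log(n * p) * rss / (n * p) + length(rss_by_cluster) * penalty
}

# Made-up inputs: three individuals with 10, 12 and 14 observations.
n_obs <- c(10, 12, 14)
one_cluster  <- ebic_sketch(c(10), n_obs, bandwidth = -0.33)
two_clusters <- ebic_sketch(c(5, 5), n_obs, bandwidth = -0.33)
```

With the total residual sum of squares held fixed, the two-cluster value exceeds the one-cluster value by exactly one penalty term, which is the role the tuning parameter plays.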
EBIC(paths, partition, bandwidth)
paths: A list of data frames, where each data frame contains the data for one individual. Every data frame should have two columns named 'distance' and 'oxygen'.
partition: A list of vectors. Each element of the list is a vector of integers identifying the individuals that are considered to be in one group.
bandwidth: The order of the bandwidth to be used in the estimation process. bandwidth = k means that the bandwidth is n^k.
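To make the expected input shapes concrete, the following sketch builds toy inputs in base R. The data are simulated and the object names are illustrative only; it does not call `EBIC` itself.

```r
# Toy inputs with made-up values, only to show the expected structure.
set.seed(1)
paths <- lapply(c(10, 12, 14), function(m) {
  data.frame(distance = seq_len(m), oxygen = rnorm(m))
})

# Individuals 1 and 2 form one group, individual 3 is on its own.
partition <- list(c(1L, 2L), 3L)

k   <- -0.33                       # order of the bandwidth
n   <- mean(sapply(paths, nrow))   # mean number of observations per individual
B_n <- n^k                         # resulting bandwidth n^k
```

Every individual index should appear in exactly one element of `partition`.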
Value of the extended BIC function for the partition.
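# NOT RUN {
library(SCEM)
# Split the armenia data into one data frame per individual
armenia_split = split(armenia, f = armenia$ID)
band = -0.33                # bandwidth order k, i.e. bandwidth n^(-0.33)
p = length(armenia_split)   # number of individuals
# Extended BIC for the partition in which every individual is its own group
EBIC(armenia_split, 1:p, band)
# }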