calculate_features computes several features of a categorical time series (CTS),
or features relating a categorical time series to a real-valued time series.
calculate_features(series, n_series = NULL, lag = 1, type = NULL)
The corresponding feature.
series: An object of type tsibble (see R package tsibble), whose column named Value
contains the values of the corresponding CTS. This column must be of class factor and its levels
must be determined by the range of the CTS.
n_series: A real-valued time series.
lag: The considered lag (default is 1).
type: String indicating the feature one wishes to compute.
Ángel López-Oriona, José A. Vilar
Assume we have a CTS of length \(T\) with range \(\mathcal{V}=\{1, 2, \ldots, r\}\),
\(\overline{X}_t=\{\overline{X}_1,\ldots, \overline{X}_T\}\), with \(\widehat{p}_i\)
being the natural estimate of the marginal probability of the \(i\)th
category, and \(\widehat{p}_{ij}(l)\) being the natural estimate of the joint probability
for categories \(i\) and \(j\) at lag \(l\), \(i,j=1, \ldots, r\). Assume also that
we have a real-valued time series of length \(T\), \(\overline{Z}_t=\{\overline{Z}_1,\ldots, \overline{Z}_T\}\).
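As an illustration of the estimates \(\widehat{p}_i\) and \(\widehat{p}_{ij}(l)\) defined above, the following base R sketch computes them for a toy series. It is not the package's implementation: the series x, the variable names, and the normalisation of the joint estimate by the \(T-l\) available pairs are assumptions made for this example.

```r
# Toy categorical series with range {a, b, c}; the factor levels fix the range.
x <- factor(c("a", "b", "a", "c", "b", "a", "a", "c", "b", "a"),
            levels = c("a", "b", "c"))
n <- length(x)  # series length T
l <- 1          # lag

# Marginal estimates: relative frequency of each category.
p_hat <- as.numeric(table(x)) / n

# Joint estimates at lag l: relative frequency of the pairs
# (X_t = i, X_{t-l} = j) among the n - l available pairs
# (assumed normalisation). Rows index X_t, columns index X_{t-l}.
p_joint <- table(x[(l + 1):n], x[1:(n - l)]) / (n - l)

print(p_hat)
print(p_joint)
```

Both p_hat and p_joint sum to one, and every feature in the list below that depends only on the CTS is a function of these two objects.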
The function computes the following quantities depending on the argument
type:
If type=gini_index, the function computes the
estimated Gini index, \(\widehat{g}=\frac{r}{r-1}(1-\sum_{i=1}^{r}\widehat{p}_i^2)\).
If type=entropy, the function computes the
estimated entropy, \(\widehat{e}=\frac{-1}{\ln(r)}\sum_{i=1}^{r}\widehat{p}_i\ln \widehat{p}_i\).
If type=chebycheff_dispersion, the function computes the
estimated Chebycheff dispersion, \(\widehat{c}=\frac{r}{r-1}(1-\max_i\widehat{p}_i)\).
If type=gk_tau, the function computes the
estimated Goodman and Kruskal's tau, \(\widehat{\tau}(l)=\frac{\sum_{i,j=1}^{r}\frac{\widehat{p}_{ij}(l)^2}{\widehat{p}_j}-\sum_{i=1}^r\widehat{p}_i^2}{1-\sum_{i=1}^r\widehat{p}_i^2}\).
If type=gk_lambda, the function computes the
estimated Goodman and Kruskal's lambda, \(\widehat{\lambda}(l)=\frac{\sum_{j=1}^{r}\max_i\widehat{p}_{ij}(l)-\max_i\widehat{p}_i}{1-\max_i\widehat{p}_i}\).
If type=uncertainty_coefficient, the function computes the
estimated uncertainty coefficient, \(\widehat{u}(l)=-\frac{\sum_{i, j=1}^{r}\widehat{p}_{ij}(l)\ln\big(\frac{\widehat{p}_{ij}(l)}{\widehat{p}_i\widehat{p}_j}\big)}{\sum_{i=1}^{r}\widehat{p}_i\ln \widehat{p}_i}\).
If type=pearson_measure, the function computes the
estimated Pearson measure, \(\widehat{X}_T^2(l)=T\sum_{i,j=1}^{r}\frac{(\widehat{p}_{ij}(l)-\widehat{p}_i\widehat{p}_j)^2}{\widehat{p}_i\widehat{p}_j}\).
If type=phi2_measure, the function computes the
estimated Phi2 measure, \(\widehat{\Phi}^2(l)=\frac{\widehat{X}_T^2(l)}{T}\).
If type=sakoda_measure, the function computes the
estimated Sakoda measure, \(\widehat{p}^*(l)=\sqrt{\frac{r\widehat{\Phi}^2(l)}{(r-1)(1+\widehat{\Phi}^2(l))}}\).
If type=cramers_vi, the function computes the
estimated Cramér's \(v\), \(\widehat{v}(l)=\sqrt{\frac{1}{r-1}\sum_{i,j=1}^r\frac{(\widehat{p}_{ij}(l)-\widehat{p}_i\widehat{p}_j)^2}{\widehat{p}_i\widehat{p}_j}}\).
If type=cohens_kappa, the function computes the
estimated Cohen's kappa, \(\widehat{\kappa}(l)=\frac{\sum_{j=1}^{r}(\widehat{p}_{jj}(l)-\widehat{p}_j^2)}{1-\sum_{i=1}^r\widehat{p}_i^2}\).
If type=total_correlation, the function computes the
estimated sum \(\widehat{\Psi}(l)=\frac{1}{r^2}\sum_{i,j=1}^{r}\widehat{\psi}_{ij}(l)^2\),
where \(\widehat{\psi}_{ij}(l)\) is the estimated correlation
\(\widehat{Corr}(Y_{t, i}, Y_{t-l, j})\), \(i,j=1,\ldots,r\), and \(\overline{\boldsymbol Y}_t=\{\overline{\boldsymbol Y}_1, \ldots, \overline{\boldsymbol Y}_T\}\),
with \(\overline{\boldsymbol Y}_k=(\overline{Y}_{k,1}, \ldots, \overline{Y}_{k,r})^\top\), is the
binarized time series of \(\overline{X}_t\).
If type=spectral_envelope, the function computes the
estimated spectral envelope.
If type=total_mixed_correlation_1, the function computes the
estimated total mixed l-correlation given by
$$\widehat{\Psi}_1(l)=\frac{1}{r}\sum_{i=1}^{r}\widehat{\psi}_{i}(l)^2,$$ where
\(\widehat{\psi}_{i}(l)=\widehat{Corr}(Y_{t,i}, Z_{t-l})\) and \(\overline{\boldsymbol Y}_t=\{\overline{\boldsymbol Y}_1, \ldots, \overline{\boldsymbol Y}_T\}\),
with \(\overline{\boldsymbol Y}_k=(\overline{Y}_{k,1}, \ldots, \overline{Y}_{k,r})^\top\), is the
binarized time series of \(\overline{X}_t\).
If type=total_mixed_correlation_2, the function computes the
estimated total mixed q-correlation given by
$$\widehat{\Psi}_2(l)=\frac{1}{r}\sum_{i=1}^{r}\int_{0}^{1}\widehat{\psi}^\rho_{i}(l)^2d\rho,$$ where
\(\widehat{\psi}_{i}^\rho(l)=\widehat{Corr}\big(Y_{t,i}, I(Z_{t-l}\leq q_{Z_t}(\rho)) \big)\), \(\overline{\boldsymbol Y}_t=\{\overline{\boldsymbol Y}_1, \ldots, \overline{\boldsymbol Y}_T\}\),
with \(\overline{\boldsymbol Y}_k=(\overline{Y}_{k,1}, \ldots, \overline{Y}_{k,r})^\top\), is the
binarized time series of \(\overline{X}_t\), \(\rho \in (0, 1)\) is a probability
level, \(I(\cdot)\) is the indicator function and \(q_{Z_t}\) is the quantile
function of the corresponding real-valued process.
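To make some of the formulas above concrete, the following base R sketch evaluates a few of them directly for a factor x. The helper names cts_features_sketch and total_mixed_cor_sketch are hypothetical and are not part of the package. The normalisation of the joint estimate by \(T-l\) and the use of the sample Pearson correlation for \(\widehat{Corr}\) are assumptions, so the results may differ slightly from those returned by calculate_features.

```r
# Hypothetical helper: marginal and serial dependence features of a factor x
# at lag l, computed directly from the formulas.
cts_features_sketch <- function(x, l = 1) {
  stopifnot(is.factor(x), l >= 1, l < length(x))
  n <- length(x)
  r <- nlevels(x)
  p <- as.numeric(table(x)) / n
  # Joint estimates: rows index X_t, columns index X_{t-l} (assumed normalisation).
  pj <- unclass(table(x[(l + 1):n], x[1:(n - l)])) / (n - l)
  indep <- outer(p, p)  # products p_i * p_j
  pos <- p[p > 0]       # convention 0 * ln(0) = 0

  # The Phi2-based measures assume every category is observed (p_i > 0).
  phi2 <- sum((pj - indep)^2 / indep)
  c(gini_index = r / (r - 1) * (1 - sum(p^2)),
    entropy = -sum(pos * log(pos)) / log(r),
    chebycheff_dispersion = r / (r - 1) * (1 - max(p)),
    phi2_measure = phi2,
    cramers_vi = sqrt(phi2 / (r - 1)),
    cohens_kappa = sum(diag(pj) - p^2) / (1 - sum(p^2)))
}

# Hypothetical helper: total mixed l-correlation between a factor x and a
# numeric series z of the same length.
total_mixed_cor_sketch <- function(x, z, l = 1) {
  stopifnot(is.factor(x), is.numeric(z), length(x) == length(z))
  n <- length(x)
  # Binarization: column i of Y is the indicator of category i.
  Y <- sapply(levels(x), function(v) as.numeric(x == v))
  # Sample correlation between Y_{t,i} and Z_{t-l} for each category i.
  psi <- cor(Y[(l + 1):n, , drop = FALSE], z[1:(n - l)])
  mean(psi^2)  # (1/r) * sum of squared correlations
}

x <- factor(rep(c("a", "b", "c"), 4), levels = c("a", "b", "c"))
print(cts_features_sketch(x, l = 1))
print(total_mixed_cor_sketch(x, sin(seq_along(x)), l = 1))
```

In this toy series the three categories are equally frequent, so the three marginal features attain their maximum value of 1, and no category is ever followed by itself at lag 1, which makes Cohen's kappa negative.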
weiss2008measuringctsfeatures
library(ctsfeatures)

# Computing the uncertainty coefficient
# for the first series in dataset GeneticSequences
sequence_1 <- GeneticSequences[which(GeneticSequences$Series == 1), ]
uc <- calculate_features(series = sequence_1, type = 'uncertainty_coefficient')

# Computing the spectral envelope
# for the first series in dataset GeneticSequences
se <- calculate_features(series = sequence_1, type = 'spectral_envelope')