Note that, due to the interactive nature of the application, the reactive graphs can become
rather slow to update. We hence suggest breaking long time series into smaller chunks
that do not strain the available memory too much. Trial and error is useful here, but we
generally suggest working on at most one year of data at a time.
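One possible way to prepare such chunks before launching the application is sketched below. The object and column names (`sensor_data`, `timestamp`, `sensor_1`) and the output file names are assumptions made for illustration only; the data are synthetic.

```r
# Hypothetical example data: two years of hourly readings (synthetic values).
timestamp <- seq(as.POSIXct("2019-01-01 00:00:00", tz = "UTC"),
                 as.POSIXct("2020-12-31 23:00:00", tz = "UTC"),
                 by = "hour")
sensor_data <- data.frame(timestamp = timestamp,
                          sensor_1 = sin(seq_along(timestamp) / 24))

# Split into one data.frame per calendar year.
chunks <- split(sensor_data, format(sensor_data$timestamp, "%Y", tz = "UTC"))

# Save each yearly chunk to its own .RData file (here in a temporary directory).
out_dir <- tempdir()
for (yr in names(chunks)) {
  chunk <- chunks[[yr]]
  save(chunk, file = file.path(out_dir, paste0("sensor_", yr, ".RData")))
}
```

Each resulting file then holds a single year of data and can be loaded into the application separately.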
Once the application is launched,
the user can load an .RData
file containing a data.frame
with a timestamp and sensor data (multiple sensor columns are supported).
The timestamp in this data.frame
should be of class POSIXct.
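As a minimal sketch of how such an input file could be prepared, the example below converts character timestamps to POSIXct and saves the result. The column names, values, time zone, and file name are hypothetical; the application may expect a different layout.

```r
# Hypothetical raw table with timestamps stored as character strings.
raw <- data.frame(
  timestamp = c("2020-06-01 00:00:00", "2020-06-01 00:15:00", "2020-06-01 00:30:00"),
  sensor_1  = c(0.52, 0.55, 0.53),
  sensor_2  = c(0.61, 0.60, 0.64),
  stringsAsFactors = FALSE
)

# Convert the timestamp column to class POSIXct (time zone is an assumption).
raw$timestamp <- as.POSIXct(raw$timestamp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
stopifnot(inherits(raw$timestamp, "POSIXct"))

# Save the data.frame to an .RData file (here in a temporary directory).
input_file <- file.path(tempdir(), "sensor_input.RData")
save(raw, file = input_file)
```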
Users can select the x and y axes of the interactive time series plots.
In addition, the user can provide the units of the imported data
(e.g., degrees \(C\) or \(mV\) for \(\Delta T\) or \(\Delta V\), respectively).
A parameter (alpha) for automatic outlier detection can be supplied.
More specifically, the automatic identification of outliers is based on a
two-step procedure:
i) Tukey's method (Tukey, 1977) is applied to detect statistical outliers
as values falling outside the range
\([q_{0.25} - \mathit{alpha} \cdot IQR,\; q_{0.75} + \mathit{alpha} \cdot IQR]\),
where \(IQR\) is the interquartile range
(\(q_{0.75} - q_{0.25}\))
with \(q_{0.25}\) denoting the lower quartile (25th percentile) and \(q_{0.75}\) the
upper quartile (75th percentile), and alpha is a user-defined parameter
(default value alpha = 3;
visual inspection through the interactive plots allows for adjusting
alpha and optimizing the automatic detection of outliers),
and ii) the lag-1 differences of the raw data are calculated,
and data points with lag-1 differences greater
than the mean of the raw input time series are excluded.
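The two-step procedure can be sketched in R as follows. This is an illustration under stated assumptions, not the application's actual code: the function name `detect_outliers` is hypothetical, the lag-1 differences are taken as absolute values, and the later point of each pair is the one flagged.

```r
# Hypothetical helper illustrating the two-step automatic detection.
detect_outliers <- function(x, alpha = 3) {
  # Step i: Tukey fences based on the quartiles and the interquartile range.
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  tukey <- x < (q[1] - alpha * iqr) | x > (q[2] + alpha * iqr)

  # Step ii: lag-1 differences of the raw data compared with the mean of the
  # raw series. Assumptions: absolute differences are used, and the later
  # point of each pair is flagged (the first point has no predecessor).
  d <- c(0, abs(diff(x)))
  jump <- d > mean(x, na.rm = TRUE)

  flagged <- tukey | jump
  flagged[is.na(flagged)] <- FALSE  # missing values are never flagged
  flagged
}

# Synthetic example: a single spike in an otherwise stable series.
x <- c(10, 11, 10, 12, 11, 60, 10, 11, 12, 10)
flags <- detect_outliers(x, alpha = 3)
which(flags)
```

Under these assumptions, the point immediately following a spike may also be flagged because of the large jump back to normal values, which is one reason the visual inspection of the flagged points remains useful.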
The raw input data from the provided .RData
file are depicted with
black points in the first plot, titled ‘Raw and automatic detection’,
while the automatically detected outliers are highlighted in red in the same plot.
The user can adjust the parameter alpha
and visually inspect the
automatically detected outliers in order to achieve the optimal automatic outlier selection.
This plot is also interactive: hovering the mouse over the upper right corner
reveals the available interactive tools (e.g., zoom in/out).
In addition, the lower subpanel of this plot provides an overview of the full temporal extent
of the data and allows the user to select a narrower time window for a more thorough data inspection.
Once the user is satisfied with the automatically selected data points,
one can proceed to the manual outlier selection.
The second interactive plot (titled ‘Filtered and manual selection’)
presents the raw data after removing the automatically detected outliers of the previous step,
and allows the user to manually select data points
(point, rectangular, and lasso selections are allowed).
The first selection identifies points to be removed (outliers),
and their color changes to red. If a point is selected a second time,
its classification as an outlier is undone and its color is set back to black (i.e., not an outlier).
The red-color data points correspond to the selected outliers to be removed from the data,
in addition to those identified in the automated detection.
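The select/deselect behaviour described above amounts to toggling a logical flag per data point. A minimal sketch, assuming a hypothetical helper `toggle_outliers` and a logical vector that tracks the manual outlier state (neither is taken from the application's code), could look like this:

```r
# Hypothetical helper: flip the outlier state of the selected point indices.
toggle_outliers <- function(is_outlier, selected) {
  selected <- unique(selected)  # each point is toggled once per selection
  is_outlier[selected] <- !is_outlier[selected]
  is_outlier
}

# Six points, none marked initially.
state <- rep(FALSE, 6)

# First selection (e.g., a lasso covering points 2, 3 and 5) marks outliers.
state <- toggle_outliers(state, c(2, 3, 5))

# Selecting point 3 again undoes its classification as an outlier.
state <- toggle_outliers(state, 3)

# Colors as they would appear in the plot: red = outlier, black = kept.
point_color <- ifelse(state, "red", "black")
```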