A Bayesian model averaging algorithm called BEAST to decompose time series or 1D sequential data into individual components, such as abrupt changes, trends, and periodic/seasonal variations. BEAST is useful for changepoint detection (e.g., breakpoints or structural breaks), nonlinear trend analysis, time series decomposition, and time series segmentation.
beast.irreg(
y,
time, deltat,
freq = NA,
season = c('harmonic','dummy','none'),
scp.minmax = c(0,10), sorder.minmax = c(0,5), sseg.min = NULL,
tcp.minmax = c(0,10), torder.minmax = c(0,1), tseg.min = NULL,
detrend = FALSE,
deseasonalize= FALSE,
mcmc.seed = 0,
mcmc.burnin = 200,
mcmc.chains = 3,
mcmc.thin = 5,
mcmc.samples = 8000,
print.options = TRUE,
print.progress = TRUE,
gui = FALSE,
...
)
a vector for an irregular or unordered time series. Missing values such as NA and NaN are allowed.
a vector of the same length as y
; the time vector of datapoints of y
. It can be a numeric vector or a vector of Dates. If your time is formated as strings or vectors of year, months, and days, use beast123()
instead where more formats are supported.
numeric; a user-specified time interval to aggregate y
based on time
into a regular time series. The BEAST model is currently formulated for regular data only, so internally, the beast.irreg
function will aggregate/re-bin irregular data into regular ones; for the aggregation, the deltat
MUST be provided to specify the desired bin size or time interval. The unit of delta needs to be consisent with time
. If time
takes a numeric vector, the unit is arbitrary and irrelevant to beast. If time
takes a vector of Dates, the unit is assumed to YEAR; for example, if the desired time interval is 1 month (or 1 day), deltat should be 1/12 (or 1/365).
numeric. Needed only for data with a periodic/cyclic component (i.e., season='harmonic'
or 'dummy'
) and ignored for trend-only data (i.e., season='none'
). It specifies the number of samples/datapoints per cycle (e.g, a time series of monthly observations with an annual period has a frequency of 12); it may be a decimal real number (e.g., a time series of bi-hourly observations with a period of 37.5 hrs has a freq of 37.5/2=18.75). The period of the cyclic component in the unit of time is period=deltat*freq
. freq
is not a model parameter of BEAST and it has to be specified by the user. If freq
is missing, BEAST first attempts to guess its value via auto-correlation before fitting the model, but in this case, freq
is assumed to be an integer.
characters (default to 'harmonic'); speficy if y
has a periodic component or not. Three strings are possible.
'none'
: y
is trend-only; no periodic components are present in the time series. The args for the seasonal component (i.e.,sorder.minmax, scp.minmax and sseg.max) will be ignored.
'harmonic'
: y
has a periodic/seasonal component. The term 'season' is a misnomer, being used here to broadly refer to any periodic variations present in y
. The periodicity is NOT a model parameter estimated by beast but a known constant given by the user through freq
. By default, the periodic component is modeled as a harmonic curve--a combination of sins and cosines.
If 'dummy'
, the same as 'harmonic'
except that the periodic/seasonal component is modeled as a non-parametric curve. The arg sorder.minmax
is irrelevant and is ignored.
a vector of 2 integers (>=1); the min and max harmonic orders considered to fit the seasonal component. sorder.minmax
is used only used if the time series has a seasonal component (i.e., season='harmonic'
) and ignored for trend-only data or when season='dummy'
. If the min and max orders are equal (sorder.minmax[1]=sorder.minmax[2]
), BEAST assumes a constant harmonic order used and won't infer the posterior probability of harmonic orders.
a vector of 2 integers (>=0); the min and max number of seasonal changepints (scp) allowed in segmenting the seasonal component. scp.minmax
is used only if y
has a seasonal component (i.e., season='harmonic' or 'dummy'
) and ignored for trend-only data. If the min and max changepoint numbers are equal, BEAST assumes a constant number of scp and won't infer the posterior probability of the number of changepoints, but it still estimates the occurance probability of the changepoints over time (i.e., the most likely times at which these changepoints occur). If both the min and max numbers are set to 0, no changepoints are allowed; then a global harmonic model is used to fit the seasonal component, but still, the most likley harmonic order will be inferred if sorder.minmax[1] is not equal to sorder.minmax[2].
an integer; the min segment length allowed between two neighboring season changepoints. That is, when fitting a piecewise harmonic seasonal model, no two changepoints are allowed to occur within a time window of length sseg.min
. sseg.min
must be an unitless integer--the number of time intervals/data points so that the time window in the original unit is sseg.min*deltat
. sseg.min
defaults to NULL and its value will be given a default value in reference to freq
.
a vector of 2 integers (>=0); the min and max orders of the polynomials considered to fit the trend component. The 0-th order corresponds to a constant term/a flat line and the 1st order is a line. If torder.minmax[1]=torder.minmax[2]
, BEAST assumes a constant polynomial order used and won't infer the posterior probability of polynomial orders.
a vector of 2 integers; the min and max number of trend changepints (tcp) allowed in segmenting the trend component. If the min and max changepoint numbers are equal, BEAST assumes a constant number of changepoints and won't infer the posterior probability of the number of changepoints for the trend, but it still estimates the occurance probability of the changepoints over time (i.e., the most likely times at which these changepoints occur in the trend). If both the min and max numbers are set to 0, no changepoints are allowed; then a global polynomial trend is used to fit the trend component, but still, the most likley polynomial order will be inferred if torder.minmax[1] is not equal to torder.minmax[2].
an integer; the min segment length allowed between two neighboring trend changepoints. That is, when fitting a piecewise polynomial trend model, no two changepoints are allowed to occur within a time window of length tseg.min. tseg.min
must be an unitless integer--the number of time intervals/data points so that the time window in the original unit is tseg.min*deltat
. tseg.min
defaults to NULL and its value will be given a default value in reference to freq
if the time series has a cyclic component.
logical; If TRUE
, a global trend is first fitted and removed from the time series before running BEAST; after BEAST finishes, the global trend is added back to the BEAST result.
logical; If TRUE
, a global seasonal model is first fitted and removed from the time series before running BEAST; after BEAST finishes, the global seasonal curve is added back to the BEAST result. deseasonalize
is ignored if season='none'
(i.e., trend-only data).
integer (>=0); the seed for the random number generator used for Monte Carlo Markove Chain (mcmc). If mcmc.seed=0
, an arbitrary seed is picked and the fitting results vary across runs. If fixed to the same non-zero integer, the result can be re-produced for different runs. But the results from the same seed may still vary if run on different computers because the random generator libray depends on CPU's instruction sets.
integer (>0); the number of MCMC chains.
integer (>0); a factor to thin chains (e.g., if thinningFactor=5, samples will be taken every 3 iterations)
integer (>0); the number of burn-in samples discarded at the start of each chain
integer (>=0); the number of samples collected per MCMC chain. The total number of iterations is (burnin+samples*thin)*chains
.
boolean. If TRUE
,the full list of input parameters to BEAST will be printed out prior to the MCMC inference; the naming for this list (e.g., metadata, prior, and mcmc) differs slightly from the input to beast
, but there is a one-to-one correspondence (e.g., prior$trendMinSepDist=tseg.min). Internaly, beast convers the input parameters to the forms of metadata, prior,and mcmc. Type 'View(beast)' to see the details or check the beast123
function.
boolean;If TRUE
, a progressbar will be displayed.
boolean. If TRUE
, BEAST will be run in a GUI demostration mode, with a GUI window to show an animation of the MCMC sampling in the model space step by step. Note that "gui=TRUE
" works only for Windows x64 systems not Windows 32 or Linux/Mac systems.
additional parameters. There are many more settings for the implemenation but not made available in the beast() interface; please use the function beast123()
instead
The output is an object of class "beast". It is a list, consisting of the following variables. In the explantions below, we assume the input y
is a single time series of length N
:
a vector of size 1xN
: the times at the N
sampled locatons. By default, it is simply set to 1:N
a vector, matrix, or 3D array; this is a copy of the input data
if extra$dumpInputData = TRUE. If extra$dumpInputData=FALSE, it is set to NULL. If the orignal input data
is irregular, the copy here is the regular version aggragted from the original at the time interval specified by metadata$deltaTime.
numeric; the average of the model marginal likhood; the larger marg_lik, the better the fitting for a given time series.
numeric; the R-square of the model fiting.
numeric; the RMSE of the model fiting.
numeric; the estimated variance of the model error.
a list object numeric consisting of various outputs related to the estimated trend component:
ncp
: [Number of ChangePoints]. a numeric scalar; the mean number of trend changepoints.
ncpPr
: [Probability of the Number of ChangePoints]. A vector of length (tcp.minmax[2]+1)=tcp.max+1
. It gives a probability distribution of having a certain number of trend changepoints over the range of [0,tcp.max]; for example, ncpPr[1]
is the probability of having no trend changepoint; ncpPr[i]
is the probability of having (i-1) changepoints: Note that it is ncpPr[i]
not ncpPr[i-1]
because ncpPr[1] is used for having zero changepoint.
cpOccPr
: [ChangePoint OCCurence PRobability]. a vector of length N; it gives a probability distribution of having a changepoint in the trend at each point of time. Plotting cpOccPr
will depict a continous curve of probability-of-being-changepoint. Of particular note, in the curve, a higher peak indicates a higher chance of being a changepoint only at that particular SINGLE point in time and does not neccessarily mean a higher chance of observing a changepoint AROUND that time. For example, a window of cpOccPr values c(0,0,0.5,0,0)
(i.e., the peak prob is 0.5 and the summed prob is 0.5) is less likely to be a changepoint compared to another window c(0.1,0.2,0.21,0.2,0.1)
(i.e., the peak prob is 0.21 but the summed prob is 0.71).
order
: a vector of length N; the average polynomial order needed to approximate the fitted trend. As an average over many sampled individual piece-wise polynomial trends, order
is not neccessarly an integer.
cp
: [Changepoints] a vector of length tcp.max=tcp.minmax[2]
; the most possible changepoint locations in the trend component. The locations are obtained by first applying a sum-filtering to the cpOccPr
curve with a filter window size of tseg.min and then picking up to a total ncp
of the highest peaks in the filterd curve. If ncp<tcp.max
, the remaining of the vector is filled with NaNs.
cpPr
: [Changepoints PRobability] a vector of length tcp.max=tcp.minmax[2]
; the probabilities associated with the changepoints cp
. Filled with NaNs for the remaning elements if ncp<tcp.max
.
cpCI
: [Changepoints Credible Interval] a matrix of dimension tcp.max x 2
; the credibable intervals for the detected changepoints cp
.
cpAbruptChange
: [Abrupt change at Changepoints] a vector of length tcp.max
; the jumps in the fitted trend curves at the detected changepoints cp
.
Y
: a vector of length N; the estimated trend component. It is the Bayesian model averaging of all the individual sampled trend.
SD
: [Standard Deviation] a vector of length N; the estimated standard deviation of the estimated trend component.
CI
: [Standard Deviation] a matrix of dimension N x 2
; the estimated credible interval of the estimated trend. One vector of the matrix is for the upper envelope and another for the lower envelope.
slp
: [Slope] a vector of length N; the time-varying slope of the fitted trend component .
slpSD
: [Standar Deviation of Slope] a vector of length N; the SD of the slope for the trend component.
slpSignPr
: [PRobability of slope having a postive sign] a vector of length N; the probabity of the slope being positive (i.e., increasing trend) for the trend component. For example, if slpSignPr=0.80 at a given point in time, it means that 80
pos_ncp
:
neg_ncp
:
pos_ncpPr
:
neg_ncpPr
:
pos_cpOccPr
:
neg_cpOccPr
:
pos_cp
:
neg_cp
:
pos_cpPr
:
neg_cpPr
:
pos_cpAbruptChange
:
neg_cpAbruptChange
:
pos_cpCI
:
neg_cpCI
: The above variables have the same outputs as those variables without the prefix 'pos' and 'neg', except that we differentiate the changepoints with a POStive jump in the trend from those changepoints with a NEGative jump. For example, pos_ncp
refers to the average number of trend changepoints that jump up (i.e., postively) in the trend.
inc_ncp
:
dec_ncp
:
inc_ncpPr
:
dec_ncpPr
:
inc_cpOccPr
:
dec_cpOccPr
:
inc_cp
:
dec_cp
:
inc_cpPr
:
dec_cpPr
:
inc_cpAbruptChange
:
dec_cpAbruptChange
:
inc_cpCI
:
dec_cpCI
: The above variables have the same outputs as those variables without the prefix 'inc' and 'dec', except that we differentiate the changepoints at which the trend slope increases from those changepoints at which the trend slope decreaes. For example, if the trend slopes before and after a chngpt is 0.4 and 2.5, then the changepont is counted toward inc_ncp
.
a list object numeric consisting of various outputs related to the estimated seasonal/periodic component:
ncp
: [Number of ChangePoints]. a numeric scalar; the mean number of seasonal changepoints.
ncpPr
: [Probability of the Number of ChangePoints]. A vector of length (scp.minmax[2]+1)=scp.max+1
. It gives a probability distribution of having a certain number of seasonal changepoints over the range of [0,scp.max]; for example, ncpPr[1]
is the probability of having no seasonal changepoint; ncpPr[i]
is the probability of having (i-1) changepoints: Note that the index is i rather than (i-1) because ncpPr[1] is used for having zero changepoint.
cpOccPr
: [ChangePoint OCCurence PRobability]. a vector of length N; it gives a probability distribution of having a changepoint in the seasonal component at each point of time. Plotting cpOccPr
will depict a continous curve of probability-of-being-changepoint over the time. Of particular note, in the curve, a higher value at a peak indicates a higher chance of being a changepoint only at that particular SINGLE point in time, and does not neccessarily mean a higher chance of observing a changepoint AROUND that time. For example, a window of cpOccPr values c(0,0,0.5,0,0)
(i.e., the peak prob is 0.5 and the summed prob is 0.5) is less likely to be a changepoint compared to another window values c(0.1,0.2,0.3,0.2,0.1)
(i.e., the peak prob is 0.3 but the summed prob is 0.8).
order
: a vector of length N; the average harmonic order needed to approximate the seasonal component. As an average over many sampled individual piece-wise harmonic curves, order
is not neccessarly an integer.
cp
: [Changepoints] a vector of length scp.max=scp.minmax[2]
; the most possible changepoint locations in the seasonal component. The locations are obtained by first applying a sum-filtering to the cpOccPr
curve with a filter window size of sseg.min and then picking up to a total ncp
of the highest peaks in the filterd curve. If ncp<scp.max
, the remaining of the vector is filled with NaNs.
cpPr
: [Changepoints PRobability] a vector of length scp.max
; the probabilities associated with the changepoints cp
. Filled with NaNs for the remaning elements if ncp<scp.max
.
cpCI
: [Changepoints Credible Interval] a matrix of dimension scp.max x 2
; the credibable intervals for the detected changepoints cp
.
cpAbruptChange
: [Abrupt change at Changepoints] a vector of length scp.max
; the jumps in the fitted trend curves at the detected changepoints cp
.
Y
: a vector of length N; the estimated trend component. It is the Bayesian model averaging of all the individual sampled trend.
SD
: [Standard Deviation] a vector of length N; the estimated standard deviation of the estimated trend component.
CI
: [Standard Deviation] a matrix of dimension N x 2
; the estimated credible interval of the estimated trend. One vector of the matrix is for the upper envelope and another for the lower envelope.
amp
: [AMPlitude] a vector of length N; the time-varying amplitude of the estimated seasonality.
ampSD
: [Standar Deviation of AMPlitude] a vector of length N; , the SD of the amplitude of the seasonality.
pos_ncp
:
neg_ncp
:
pos_ncpPr
:
neg_ncpPr
:
pos_cpOccPr
:
neg_cpOccPr
:
pos_cp
:
neg_cp
:
pos_cpPr
:
neg_cpPr
:
pos_cpAbruptChange
:
neg_cpAbruptChange
:
pos_cpCI
:
neg_cpCI
: The above variables have the same outputs as those variables without the prefix 'pos' and 'neg', except that we differentiate the changepoints with a POStive jump in the trend from those changepoints with a NEGative jump. For example, pos_ncp
refers to the average number of trend changepoints that jump up (i.e., postively) in the trend.
Zhao, K., Wulder, M.A., Hu, T., Bright, R., Wu, Q., Qin, H., Li, Y., Toman, E., Mallick, B., Zhang, X. and Brown, M., 2019. Detecting change-point, trend, and seasonality in satellite time series data to track abrupt changes and nonlinear dynamics: A Bayesian ensemble algorithm. Remote Sensing of Environment, 232, p.111181 (the beast algorithm paper).
Zhao, K., Valle, D., Popescu, S., Zhang, X. and Mallick, B., 2013. Hyperspectral remote sensing of plant biochemistry using Bayesian model averaging with variable and band selection. Remote Sensing of Environment, 132, pp.102-119 (the Bayesian MCMC scheme used in beast).
Hu, T., Toman, E.M., Chen, G., Shao, G., Zhou, Y., Li, Y., Zhao, K. and Feng, Y., 2021. Mapping fine-scale human disturbances in a working landscape with Landsat time series on Google Earth Engine. ISPRS Journal of Photogrammetry and Remote Sensing, 176, pp.250-261(a beast application paper).
# NOT RUN {
library(Rbeast)
# }
# NOT RUN {
######################################################################################
# 'ohio' is a data.frame on an irregular Landsat time series of reflectances
# ndvi (e.g., surface greenness) at an Ohio site. It has multiple columns
# of alternative time/date formats, such as year, month, day, doy (date of year),
# rdate (R's date class), and time (fractional year)
data(ohio)
str(ohio)
######################################################################################
# beast.irreg accepts irregular time series and aggregates them into regular ones prior
# to model fitting. For the aggregation, both "time" and "deltat" are needed to specify
# indvidial times of data points and the the regular time interval desired. If there is
# a cyclic componet, freq in the unit of time intervals/datapoints should also be given;
# if not, a possible value is guessed via auto-correlation
######################################################################################
# Below, 'time' is given as numeric values, which can be of any arbitray unit. Although
# here 1/12 can be interepreted as 1/12 year or 1 month, beast itself doesn't care about
# the time unit. So, the unit of 1/12 is irrelevant for BEAST. 'freq' is missing and a
# guess of it is used.
o=beast.irreg(ohio$ndvi, time=ohio$time,deltat=1/12)
plot(o)
print(o)
######################################################################################
# Explictly provide 'freq'
o=beast.irreg(ohio$ndvi, time=ohio$time,deltat=1/12, freq=12)
######################################################################################
# Aggregate the time series at a half-monthly time interval
o=beast.irreg(ohio$ndvi, time=ohio$time,deltat=1/24, freq=24)
######################################################################################
# 'time' is given as R's dates. The unit is YEAR. 1/12 refers to 1/12 year or 1 month
o=beast.irreg(ohio$ndvi, time=ohio$rdate,deltat=1/12)
######################################################################################
# If the input time has formats (e.g., date strings) other than the above two, use the
# beast123() function instead.
# }
# NOT RUN {
# }
Run the code above in your browser using DataLab