msm: Performing Monte Carlo Simulations of Markov Chain

Description

This is the main function for running Monte Carlo simulations of a Markov chain to dynamically forecast the HVT states of a time series dataset. It supports both ex-post and ex-ante analysis. When handle_problematic_states is TRUE, problematic state transitions are resolved using clustering and nearest-neighbor methods, which is intended to improve simulation accuracy.
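
To make the idea concrete, here is a minimal base-R sketch of Monte Carlo simulation over a Markov chain. It is not the code msm runs internally: simulate_chain() is a hypothetical helper, and the 3-state transition matrix P is made up for illustration. Each simulation starts at an initial state and repeatedly samples the next state from the matching row of the transition matrix.

```r
# Illustrative only: simulate_chain() is a hypothetical helper, not part of HVT.
simulate_chain <- function(P, initial_state, n_steps) {
  states <- integer(n_steps)
  current <- initial_state
  for (i in seq_len(n_steps)) {
    # Draw the next state from the row of P that belongs to the current state
    current <- sample.int(nrow(P), size = 1, prob = P[current, ])
    states[i] <- current
  }
  states
}

# Toy 3-state transition matrix (each row sums to 1); values are made up
P <- matrix(c(0.8, 0.2, 0.0,
              0.1, 0.7, 0.2,
              0.0, 0.3, 0.7), nrow = 3, byrow = TRUE)

set.seed(42)
# 100 simulated paths, 10 steps ahead; one column per simulation
sims <- replicate(100, simulate_chain(P, initial_state = 2, n_steps = 10))
dim(sims)  # rows are time steps, columns are simulations
```

In this sketch, the arguments initial_state and num_simulations of msm correspond to the starting state and the number of replicated paths.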

Usage

msm(
  state_time_data,
  forecast_type = "ex-post",
  initial_state,
  n_ahead_ante,
  transition_probability_matrix,
  num_simulations = 100,
  trainHVT_results,
  scoreHVT_results,
  actual_data = NULL,
  raw_dataset,
  k = 5,
  handle_problematic_states = FALSE,
  n_nearest_neighbor = 1,
  show_simulation = TRUE,
  mae_metric = "median",
  time_column = NULL,
  plot_type = "static"
)

Value

A list object that contains the forecasting plots and MAE values.

[[1]]: Simulation plots and MAE values for the state and centroid plots
[[2]]: Summary table, dendrogram plot and clustered heatmap, returned when handle_problematic_states is TRUE

Arguments

state_time_data: DataFrame. A dataframe containing state transitions over time (cell ID and timestamp).
forecast_type: Character. The type of forecasting to perform. Accepted values are "ex-post" or "ex-ante".
initial_state: Numeric. An integer indicating the state at t0.
n_ahead_ante: Numeric. A vector of n-ahead points to be predicted in ex-ante analysis.
transition_probability_matrix: DataFrame. A dataframe of transition probabilities, i.e. the output of the `getTransitionProbability` function.
num_simulations: Integer. The total number of simulations to run. Default is 100.
trainHVT_results: List. `trainHVT` function output.
scoreHVT_results: List. `scoreHVT` function output.
actual_data: DataFrame. A dataframe covering the ex-post prediction period with the actual raw data values.
raw_dataset: DataFrame. The raw input dataset, from which the mean and standard deviation are calculated to scale up the predicted values.
k: Integer. The number of optimal clusters to use when handling problematic states. Default is 5.
handle_problematic_states: Logical. Whether to handle problematic states. Default is FALSE.
n_nearest_neighbor: Integer. The number of nearest neighbors to consider when handling problematic states. Default is 1.
show_simulation: Logical. Whether to show the simulation lines in the plots. Default is TRUE.
mae_metric: Character. The metric used to calculate the Mean Absolute Error. Accepted values are "mean", "median", or "mode". Default is "median".
time_column: Character. The name of the column containing time data. Used for aligning and plotting the results.
plot_type: Character. The type of plot to generate. Accepted values are "static" (ggplot object) or "interactive" (plotly object). Default is "static".
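
One plausible reading of mae_metric, sketched below in base R, is that the simulations are collapsed to a single predicted value per time step using the chosen statistic, and the MAE is then taken against the actual values. This is an assumption about the behavior, not the package's confirmed implementation. stat_mode() and summarise_mae() are hypothetical helpers, and the sims matrix and actual vector are toy values.

```r
# Hypothetical helper: most frequent value in a vector (ties go to the smallest value)
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

# Hypothetical helper: collapse simulations per time step, then compute the MAE.
# sims has one row per time step and one column per simulation.
summarise_mae <- function(sims, actual, mae_metric = c("median", "mean", "mode")) {
  mae_metric <- match.arg(mae_metric)
  centre_fun <- switch(mae_metric, mean = mean, median = median, mode = stat_mode)
  predicted <- apply(sims, 1, centre_fun)
  mean(abs(actual - predicted))
}

# Toy data: 3 time steps, 3 simulations each
sims <- matrix(c(1, 2, 2,
                 3, 3, 5,
                 4, 6, 6), nrow = 3, byrow = TRUE)
actual <- c(2, 3, 5)

summarise_mae(sims, actual, "median")
summarise_mae(sims, actual, "mean")
```

The median and mode are less sensitive than the mean to a few simulated paths that wander far from the rest, which may be one reason "median" is the default.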

Author

Vishwavani <vishwavani@mu-sigma.com>

Examples

library(HVT)

# Build a data frame from the built-in EuStockMarkets time series
dataset <- data.frame(t = as.numeric(time(EuStockMarkets)),
                      DAX = EuStockMarkets[, "DAX"],
                      SMI = EuStockMarkets[, "SMI"],
                      CAC = EuStockMarkets[, "CAC"],
                      FTSE = EuStockMarkets[, "FTSE"])

# Train the HVT model on the feature columns (time column dropped)
hvt.results <- trainHVT(dataset[, -1], n_cells = 60, depth = 1, quant.err = 0.1,
                        distance_metric = "L1_Norm", error_metric = "max",
                        normalize = TRUE, quant_method = "kmeans")

# Score the dataset to assign each observation to a cell (state)
scoring <- scoreHVT(dataset, hvt.results)
cell_id <- scoring$scoredPredictedData$Cell.ID
time_stamp <- dataset$t
temporal_data <- data.frame(cell_id, time_stamp)

# Estimate transition probabilities between states
# (named prob_table to avoid masking base::table)
prob_table <- getTransitionProbability(temporal_data,
                                       cellid_column = "cell_id",
                                       time_column = "time_stamp")
colnames(temporal_data) <- c("Cell.ID", "t")

# Rows 1800 to 1860 serve as the actual values for the ex-post period
ex_post_forecasting <- dataset[1800:1860, ]

# Run the ex-post Monte Carlo simulation
ex_post <- msm(state_time_data = temporal_data,
               forecast_type = "ex-post",
               transition_probability_matrix = prob_table,
               initial_state = 2,
               num_simulations = 100,
               scoreHVT_results = scoring,
               trainHVT_results = hvt.results,
               actual_data = ex_post_forecasting,
               raw_dataset = dataset,
               mae_metric = "median",
               show_simulation = FALSE,
               time_column = "t")