imputation (version 2.0.3)

tsImpute: Time Series Imputation

Description

Time series imputation using boosted trees. Each column is filled by treating it as a regression problem: for each column i, boosted regression trees predict i from all other columns. If the predictor variables also contain missing data, the gbm function will itself use surrogate variables as substitutes for the predictors. This imputation function can handle both categorical and numeric data.
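The column-by-column idea described above can be sketched roughly as follows. This is an illustration only, not the package's source: the helper name impute_columns_sketch and the toy data are hypothetical, only numeric target columns are handled, and gbm's internal cross-validation is left out for brevity.

```r
# Sketch only: column-wise imputation with boosted trees (not the package's code).
library(gbm)

# Hypothetical helper: fill NAs in each numeric column of `df` with predictions
# from a gbm model fitted on all other columns.
impute_columns_sketch <- function(df, max.iters = 2, n.trees = 100) {
  missing <- is.na(df)                        # cells that were missing originally
  targets <- names(df)[colSums(missing) > 0]  # only columns that need filling
  for (iter in seq_len(max.iters)) {
    for (target in targets) {
      rows <- missing[, target]
      predictors <- setdiff(names(df), target)
      # Fit on rows where the target was observed; predictors may still hold NAs
      fit <- gbm(reformulate(predictors, response = target),
                 data = df[!rows, , drop = FALSE],
                 distribution = "gaussian",
                 n.trees = n.trees, verbose = FALSE)
      # Overwrite only the originally missing cells with fitted values
      df[rows, target] <- predict(fit, newdata = df[rows, , drop = FALSE],
                                  n.trees = n.trees)
    }
  }
  df
}

# Toy data (assumed for illustration): two series whose level depends on a group
set.seed(1)
toy <- data.frame(group = factor(sample(c("A", "B"), 200, replace = TRUE)))
toy$m1 <- ifelse(toy$group == "A", 1, -10) + rnorm(200)
toy$m2 <- ifelse(toy$group == "A", 5, -5) + rnorm(200)
toy[sample(200, 20), c("m1", "m2")] <- NA
filled <- impute_columns_sketch(toy)
```

Later passes refit each model using the values imputed in earlier passes, which is the role max.iters appears to play in the description of the arguments.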

Usage

tsImpute(time, dimension, metric, max.iters = 2, cv.fold = 2, n.trees = 100, verbose = T, ...)

Arguments

time
a vector of dates or datetime objects
dimension
a data frame of exogenous predictor variables
metric
a matrix where each column represents a time series
max.iters
number of times to iterate through the columns and impute each column with fitted values from a regression tree
cv.fold
number of folds that gbm should use internally for cross validation
n.trees
the number of trees used in gradient boosting machines
verbose
if TRUE print status updates
...
additional params passed to gbm

Examples

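library(imputation)  # tsImpute; projectDate is assumed to come from this package too
library(timeDate)    # timeSequence

# Daily dates for 2012 (a leap year, hence 366 observations)
dates = timeSequence(from = '2012-01-01', to = '2012-12-31', by = 'day')
# One categorical predictor assigning each day to group "A" or "B"
dimensions = sample(c("A", "B"), 366, replace = TRUE)
numA = length(which(dimensions == "A")); numB = length(which(dimensions == "B"))
# Two series whose means depend on the group
metrics = matrix(0, 366, 2)
metrics[which(dimensions == "A"), 1] = rnorm(numA, mean = 1)
metrics[which(dimensions == "A"), 2] = rnorm(numA, mean = 5)
metrics[which(dimensions == "B"), 1] = rnorm(numB, mean = -10)
metrics[which(dimensions == "B"), 2] = rnorm(numB, mean = -5)
# Set both series to NA on 20 randomly chosen Mondays
tp = projectDate(as.Date(dates))
monday.indices = which(tp$weekday == "Monday")
metrics[sample(monday.indices, 20), ] = NA
# Impute the missing values
tsImpute(as.Date(dates), dimensions, metrics)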
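
For readers who want to inspect the test data without installing timeDate, the setup can be approximated in base R. This is a sketch under stated assumptions: it swaps timeSequence and projectDate for seq and format, fixes a seed, and stops before any imputation.

```r
# Base-R sketch of the example's data setup; no imputation is performed here.
set.seed(42)
dates <- seq(as.Date("2012-01-01"), as.Date("2012-12-31"), by = "day")
n <- length(dates)  # 2012 is a leap year

dimensions <- sample(c("A", "B"), n, replace = TRUE)
is_a <- dimensions == "A"

# Column 1 has mean 1 (A) or -10 (B); column 2 has mean 5 (A) or -5 (B)
metrics <- cbind(ifelse(is_a, 1, -10), ifelse(is_a, 5, -5)) + rnorm(2 * n)

# "%u" is the weekday number with Monday as 1, so it does not depend on the locale
monday.indices <- which(format(dates, "%u") == "1")
metrics[sample(monday.indices, 20), ] <- NA

colSums(is.na(metrics))  # missing values per column before imputation
```

Because sample draws 20 distinct Mondays and whole rows are set to NA, each column starts with exactly 20 missing values, all on Mondays.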
