MSPE: Compute normalized mean squared prediction error based on accuracy to impute missing data values

Description

Uses as input the output object from the gpdpgrow() and gmrfdpgrow() functions.

Usage

MSPE(object, y_true, pos)

Arguments

object

A gpdpgrow or gmrfdpgrow object.

y_true

An N x T numeric matrix of test set values.

pos

An N x T matrix with all entries either 0 or {1}, where a 1 indexes a missing entry or test point in y_true.

Value

A list object containing various MSPE fit statistics that measure the accuracy of of predicting the values in y_true indexed by pos.
SSESum of squared errors based on full N x T, y_true - y_hat.
MSEMean squared error computed from SSE.
RMSESquare root of MSE.
SSPESum of squared prediction error based on missing values.
MSPEMean squared prediction error based on missing values.
nMSPEMean squared prediction error based on missing values that is normalized by the variance of the test observations to produce a value in [0,1].
RMSPESquare root of MSPE.

Examples

Run this code

{
library(growfunctions)

## load the monthly employment count data for a collection of
## U.S. states from the Current
## Population Survey (cps)
data(cps)
## subselect the columns of N x T, y, associated with
## the years 2009 - 2013
## to examine the state level employment
## levels during the "great recession"
y_short   <- cps$y[,(cps$yr_label %in% c(2009:2013))]

## dimensions
T         <- ncol(y_short)  ## time points per unit
N         <- nrow(y_short)  ## number of units

## Demonstrate estimation of intermittent missing data
## from posterior predictive distribution by randomly
## selecting 10 percent of entries in y_short and
## setting them to NA.

## randomly assign missing positions in y.
## assume every unit has equal number of
## missing positions
## randomly select number of missing
## observations for each unit
m_factor  <- .1
M         <- floor(m_factor*N*T)
m_vec     <- rep(floor(M/N),N)
if( sum(m_vec) < M )
{
    m_left              <- M - sum(m_vec)
    pos_i               <- sample(1:N, m_left,
                                replace = FALSE)
     m_vec[pos_i]        <- m_vec[pos_i] + 1
} # end conditional statement on whether all
  # missing cells allocated
  # randomly select missing
  # positions for each unit
pos       <- matrix(0,N,T)
for( i in 1:N )
{
    sel_ij              <- sample(3:(T-3), m_vec[i],
                           replace = FALSE)
                           ## avoid edge effects
    pos[i,sel_ij]       <- 1
}

## configure N x T matrix, y_obs, with 10 percent missing
## observations (filled with NA)
## to use for sampling.  Entries in y_short
## that are set to missing (NA) are
## determined by entries of "1" in the
## N x T matrix, pos.
y_obs               <- y_short
y_obs[pos == 1]     <- NA

## Conduct dependent GP model estimation under
## missing observations in y_obs.
## We illustrate the ability to have multiple
## function or covariance terms
## by seting gp_cov = c("se","sn"), which indicates
## the first term is a
## squared exponential ("se") trend covariance term
## and the "sn" is a seasonality
## term.  The setting, sn_order = 4, indicates the
## length scale of the seasonality
## term is 4 month.  The season term is actually
## "quasi" seasonal, in that the
## seasonal covariance kernel is multiplied by a
## squared exponential, which allows
## the pattern of seasonality to evolve over time.
res_gp_2               <- gpdpgrow(y = y_obs,
                                  gp_cov=c("se","sn"),
                                  sn_order = 4,
                                  n.iter = 10,
                                  n.burn = 4,
                                  n.thin = 1,
                                  n.tune = 0)

## 2 plots of estimated functions: 1. faceted by cluster and fit;
## 2.  data for experimental units.
## for a group of randomly-selected functions
fit_plots_gp_2        <- cluster_plot( object = res_gp_2,
                                      units_name = "state",
                                      units_label = cps$st,
                                      single_unit = TRUE,
                                      credible = TRUE )

## Compute out-of-sample fit statistic, normalized mean-square
## prediction error (MSPE)
## The normalized MSPE will take the predicted values
## for the entries in y_short purposefully
## set to NA and will subtract them from the known true
## values in y_short (and square them).
## This MSE on predicted (test) data is then
## divided by the variance of the test observations
## to output something akin to a percent error.
( nmspe_gp  <- MSPE(res_gp_2, y_short, pos)$nMSPE )
}

Run the code above in your browser using DataLab

Description

Usage

Arguments

Value

See Also

Examples