plot.pmc.ppc: Plots of Posterior Predictive Checks

Description

This may be used to plot, or save plots of, samples in an object of class pmc.ppc. A variety of plots is provided.

Usage

# S3 method for pmc.ppc
plot(x, Style=NULL, Data=NULL, Rows=NULL,
     PDF=FALSE, ...)

Arguments

x

This required argument is an object of class pmc.ppc.

Style

This optional argument specifies one of several styles of plots, and defaults to NULL (which is the same as "Density"). Styles of plots are indicated in quotes. Optional styles include "Covariates", "Covariates, Categorical DV", "Density", "DW", "DW, Multivariate, C", "ECDF", "Fitted", "Fitted, Multivariate, C", "Fitted, Multivariate, R", "Jarque-Bera", "Jarque-Bera, Multivariate, C", "Mardia", "Predictive Quantiles", "Residual Density", "Residual Density, Multivariate, C", "Residual Density, Multivariate, R", "Residuals", "Residuals, Multivariate, C", "Residuals, Multivariate, R", "Space-Time by Space", "Space-Time by Time", "Spatial", "Spatial Uncertainty", "Time-Series", "Time-Series, Multivariate, C", and "Time-Series, Multivariate, R". Details are given below.

Data

This optional argument accepts the data set used when updating the model. Data is required only with the following plot styles: "Covariates", "Covariates, Categorical DV", "DW, Multivariate, C", "Fitted, Multivariate, C", "Fitted, Multivariate, R", "Jarque-Bera, Multivariate, C", "Mardia", "Residual Density, Multivariate, C", "Residual Density, Multivariate, R", "Residuals, Multivariate, C", "Residuals, Multivariate, R", "Space-Time by Space", "Space-Time by Time", "Spatial", "Spatial Uncertainty", "Time-Series, Multivariate, C", and "Time-Series, Multivariate, R".

Rows

This optional argument accepts a vector of row numbers that identify records (rows) in the object of class pmc.ppc. Only these rows are plotted. The default is to plot all rows. Some plot styles do not allow rows to be specified.

PDF

This logical argument indicates whether or not the user wants Laplace's Demon to save the plots as a .pdf file.

...

Additional arguments are unused.

Details

This function can be used to produce a variety of posterior predictive plots, and the style of plot is selected with the Style argument. Below are some notes on the styles of plots.

Covariates requires Data to be specified, and also requires that the covariates are named X or x. A plot is produced for each covariate column vector against yhat, and is appropriate when y is not categorical.

Covariates, Categorical DV requires Data to be specified, and also requires that the covariates are named X or x. A plot is produced for each covariate column vector against yhat, and is appropriate when y is categorical.

Density plots show the kernel density of the posterior predictive distribution for each selected row of y (all are selected by default). A vertical red line indicates the position of the observed y along the x-axis. When the vertical red line is close to the middle of a normal posterior predictive distribution, there is little discrepancy between y and the posterior predictive distribution. When the vertical red line is in the tail of the distribution, or outside of the kernel density altogether, there is a large discrepancy between y and the posterior predictive distribution. Records with large discrepancies may be considered outliers, and such discrepancies suggest that the model fit may need improvement.
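A minimal base-R sketch of what one Density panel shows. It assumes, hypothetically, that the replicates are held as an N x S matrix `yhat` (records by posterior samples) next to the observed vector `y`; `density_check` is an illustrative helper, not a LaplacesDemon function. It also returns the share of replicates at or below the observed value, so a value near 0 or 1 corresponds to the red line sitting in a tail.

```r
# Hypothetical helper: kernel density of the replicates for record i,
# with the observed y[i] marked by a vertical red line.
# Assumes yhat is an N x S matrix (records by samples) and y a length-N vector.
density_check <- function(y, yhat, i, plot = TRUE) {
  d <- density(yhat[i, ])
  # Share of replicates at or below y[i]; near 0 or 1 means y[i] is in a tail
  tail_prob <- mean(yhat[i, ] <= y[i])
  if (plot) {
    plot(d, main = paste0("Posterior predictive density, record ", i))
    abline(v = y[i], col = "red")
  }
  list(density = d, tail_prob = tail_prob)
}

set.seed(1)
y <- rnorm(20)
yhat <- matrix(rnorm(20 * 1000), nrow = 20)  # 1000 replicates per record
res <- density_check(y, yhat, i = 3, plot = interactive())
res$tail_prob
```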

DW plots the distributions of the Durbin-Watson (DW) test statistics (Durbin and Watson, 1950), both observed ($d^{obs}$ as a transparent black density) and replicated ($d^{rep}$ as a transparent red density). The distribution of $d^{obs}$ is estimated from the model, and $d^{rep}$ is simulated from normal residuals without autocorrelation, where the number of simulations is the same as the number of observed values. This DW test may be applied to the residuals of univariate time-series models (or otherwise ordered residuals) to detect first-order autocorrelation. Autocorrelated residuals are not independent. The DW test is applicable only when the residuals are normally distributed, higher-order autocorrelation is not present, and y is not also used as a lagged predictor. The DW test statistic, $d^{obs}$, lies between 0 and 4, where 0 indicates perfect positive autocorrelation, 2 indicates no autocorrelation, and 4 indicates perfect negative autocorrelation. The following summary is reported on the plot: the mean of $d^{obs}$ (and its 95% probability interval), the probability that $d^{obs} > d^{rep}$, and whether or not autocorrelation is found. Positive autocorrelation is reported when the observed process is greater than the replicated process in less than 2.5% of the samples, and negative autocorrelation is reported when the observed process is greater than the replicated process in more than 97.5% of the samples.
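The comparison described above can be sketched in base R. This assumes, hypothetically, an N x S replicate matrix `yhat` and computes one $d^{obs}$ per posterior sample from the residuals `y - yhat[, s]`, then draws the same number of $d^{rep}$ values from independent standard normal residuals (the DW statistic is scale-invariant, so the normal scale does not matter). `dw_stat` and `dw_ppc` are illustrative helpers, and the strict inequalities at 0.025 and 0.975 are an assumption, not necessarily what plot.pmc.ppc uses.

```r
# Durbin-Watson statistic: sum of squared successive differences of the
# residuals divided by their sum of squares; it lies between 0 and 4.
dw_stat <- function(e) sum(diff(e)^2) / sum(e^2)

# Hypothetical helper: yhat is an N x S matrix of replicates (records by samples).
dw_ppc <- function(y, yhat) {
  N <- length(y)
  S <- ncol(yhat)
  # y - yhat recycles y down each column, so column s holds the residuals of sample s
  d_obs <- apply(y - yhat, 2, dw_stat)
  # Reference draws: independent normal residuals, one per observed value
  d_rep <- replicate(S, dw_stat(rnorm(N)))
  p <- mean(d_obs > d_rep)
  verdict <- if (p < 0.025) {
    "positive autocorrelation"
  } else if (p > 0.975) {
    "negative autocorrelation"
  } else {
    "no autocorrelation"
  }
  list(mean = mean(d_obs), interval = quantile(d_obs, c(0.025, 0.975)),
       p = p, verdict = verdict)
}

set.seed(2)
y <- cumsum(rnorm(100))                          # random walk: strongly autocorrelated
yhat <- matrix(rnorm(100 * 500, sd = 0.1), 100)  # replicates that ignore the trend
dw_ppc(y, yhat)$verdict
```

Because the replicates ignore the random walk, the residuals inherit its autocorrelation, $d^{obs}$ sits far below 2, and the sketch reports positive autocorrelation.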

DW, Multivariate, C requires Data to be specified, and also requires that variable Y exist in the data set with exactly that name. These plots apply a univariate Durbin-Watson test, as in DW above, to each column-wise vector of residuals. This plot is appropriate when Y is multivariate and not categorical, and the residuals are to be tested column-wise for first-order autocorrelation.

ECDF (Empirical Cumulative Distribution Function) plots compare the ECDF of y with three ECDFs of yhat based on the 2.5%, 50% (median), and 97.5% quantiles of its distribution. The ECDF evaluated at a value t is the proportion of values less than or equal to t. This plot is appropriate when y is univariate and at least ordinal.
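A short sketch of the four step functions behind this plot, using `stats::ecdf`. It assumes, hypothetically, that `yhat` is an N x S matrix, takes the 2.5%, 50%, and 97.5% quantiles of each record's replicates, and builds one ECDF from each set of N quantiles; `ecdf_ppc` is an illustrative name, not part of LaplacesDemon.

```r
# Hypothetical helper: ECDF of y plus ECDFs of the 2.5%, 50% and 97.5%
# quantiles of each record's replicates (yhat is an N x S matrix).
ecdf_ppc <- function(y, yhat) {
  q <- apply(yhat, 1, quantile, probs = c(0.025, 0.5, 0.975))  # 3 x N matrix
  list(y = ecdf(y), lower = ecdf(q[1, ]), median = ecdf(q[2, ]),
       upper = ecdf(q[3, ]))
}

set.seed(4)
y <- rpois(30, 4)                               # ordinal (count) outcome
yhat <- matrix(rpois(30 * 400, 4), nrow = 30)
fns <- ecdf_ppc(y, yhat)
fns$y(4)   # proportion of observed y that are <= 4
if (interactive()) {
  plot(fns$y, main = "ECDF of y (black) and of yhat quantiles (red)")
  lines(fns$lower, col = "red")
  lines(fns$median, col = "red")
  lines(fns$upper, col = "red")
}
```

Since each record's 2.5% quantile is no larger than its 97.5% quantile, the lower-quantile ECDF always lies on or above the upper-quantile ECDF, and a well-fitting model keeps the ECDF of y between them.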

Fitted plots compare y with the probability interval of its replicate, and provide loess smoothing. This plot is appropriate when y is univariate and not categorical.
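A sketch of the ingredients of a Fitted plot, under the hypothetical assumption that `yhat` is an N x S replicate matrix: the per-record median and 95% probability interval of the replicates, the share of y falling inside its interval, and a `stats::loess` smooth. The sketch smooths y against the median replicate; which quantity plot.pmc.ppc places on each axis is not stated here, and `fitted_ppc` is an illustrative helper.

```r
# Hypothetical helper: per-record median and probability interval of the
# replicates, the share of y inside its interval, and a loess smooth.
fitted_ppc <- function(y, yhat, prob = 0.95) {
  a <- (1 - prob) / 2
  q <- apply(yhat, 1, quantile, probs = c(a, 0.5, 1 - a))  # 3 x N matrix
  med <- q[2, ]
  list(lower = q[1, ], median = med, upper = q[3, ],
       coverage = mean(y >= q[1, ] & y <= q[3, ]),
       smooth = loess(y ~ med))  # smooth of y against the median replicate
}

set.seed(5)
x <- seq(0, 10, length.out = 50)
y <- 2 + 0.5 * x + rnorm(50)
yhat <- sapply(1:300, function(s) 2 + 0.5 * x + rnorm(50))  # 50 x 300 matrix
res <- fitted_ppc(y, yhat)
res$coverage
```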

Fitted, Multivariate, C requires Data to be specified, and also requires that variable Y exist in the data set with exactly that name. These plots compare each column-wise vector of y in Y with its replicates and provide loess smoothing. This plot is appropriate when Y is multivariate and not categorical, and the fit is to be viewed column-wise.

Fitted, Multivariate, R requires Data to be specified, and also requires that variable Y exist in the data set with exactly that name. These plots compare each row-wise vector of y in Y with its replicates and provide loess smoothing. This plot is appropriate when Y is multivariate and not categorical, and the fit is to be viewed row-wise.
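The ", C" and ", R" suffixes used throughout these styles differ only in how the N x P matrix Y is sliced. The sketch below assumes, hypothetically, that the replicates of Y form an N x P x S array; it computes residuals from the posterior medians and then splits them by column (P series of length N) or by row (N series of length P).

```r
# Assumes (hypothetically) Y is N x P and its replicates form an N x P x S array.
set.seed(6)
N <- 6; P <- 3; S <- 200
Y <- matrix(rnorm(N * P), N, P)
yhat <- array(rnorm(N * P * S), dim = c(N, P, S))

med <- apply(yhat, c(1, 2), median)  # N x P matrix of posterior medians
E <- Y - med                          # residuals

# ", C" styles: one series per column of Y (P series, each of length N)
by_col <- lapply(seq_len(P), function(p) E[, p])
# ", R" styles: one series per row of Y (N series, each of length P)
by_row <- lapply(seq_len(N), function(n) E[n, ])
```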

Jarque-Bera plots the distributions of the Jarque-Bera (JB) test statistics (Jarque and Bera, 1980), both observed ($JB^{obs}$ as a transparent black density) and replicated ($JB^{rep}$ as a transparent red density). The distribution of $JB^{obs}$ is estimated from the model, and $JB^{rep}$ is simulated from normal residuals, where the number of simulations is the same as the number of observed values. This Jarque-Bera test may be applied to the residuals of univariate models to test for normality. The Jarque-Bera test does not test normality per se, but whether or not the distribution has skewness and kurtosis that match a normal distribution, and is therefore a test of the moments of a normal distribution. The following summary is reported on the plot: the mean of $JB^{obs}$ (and its 95% probability interval), the probability that $JB^{obs} > JB^{rep}$, and whether or not normality is indicated. Non-normality is reported when the observed process is greater than the replicated process in less than 2.5% or more than 97.5% of the samples.
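A base-R sketch of the same observed-versus-replicated comparison for the Jarque-Bera statistic, $JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)$, where S and K are the sample skewness and kurtosis from moments about the mean. As before, `yhat` is assumed (hypothetically) to be an N x S replicate matrix, and `jb_stat` and `jb_ppc` are illustrative helpers applying the 2.5%/97.5% rule described above.

```r
# Jarque-Bera statistic: n/6 * (skewness^2 + (kurtosis - 3)^2 / 4),
# using moment estimators about the mean.
jb_stat <- function(e) {
  n <- length(e)
  z <- e - mean(e)
  m2 <- mean(z^2)
  skew <- mean(z^3) / m2^1.5
  kurt <- mean(z^4) / m2^2
  n / 6 * (skew^2 + (kurt - 3)^2 / 4)
}

# Hypothetical helper mirroring the rule described above (yhat is N x S).
jb_ppc <- function(y, yhat) {
  jb_obs <- apply(y - yhat, 2, jb_stat)            # one statistic per sample
  jb_rep <- replicate(ncol(yhat), jb_stat(rnorm(length(y))))
  p <- mean(jb_obs > jb_rep)
  list(mean = mean(jb_obs), interval = quantile(jb_obs, c(0.025, 0.975)),
       p = p, normal = p >= 0.025 && p <= 0.975)
}

set.seed(7)
y <- rexp(200)^3                                  # heavily skewed outcome
yhat <- matrix(rnorm(200 * 500, sd = 0.01), 200)  # replicates centered on zero
jb_ppc(y, yhat)$normal
```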

Jarque-Bera, Multivariate, C requires Data to be specified, and also requires that variable Y exist in the data set with exactly that name. These plots apply a univariate Jarque-Bera test, as in Jarque-Bera above, to each column-wise vector of residuals. This plot is appropriate when Y is multivariate and not categorical, and the residuals are to be tested column-wise for normality.

Mardia plots the distributions of the skewness (K3) and kurtosis (K4) test statistics (Mardia, 1970), both observed ($K3^{obs}$ and $K4^{obs}$ as transparent black densities) and replicated ($K3^{rep}$ and $K4^{rep}$ as transparent red densities). The distributions of $K3^{obs}$ and $K4^{obs}$ are estimated from the model, and both $K3^{rep}$ and $K4^{rep}$ are simulated from multivariate normal residuals, where the number of simulations is the same as the number of observed values. Mardia's test may be applied to the residuals of multivariate models to test for multivariate normality. Mardia's test does not test for multivariate normality per se, but whether or not the distribution has skewness and kurtosis that match a multivariate normal distribution, and is therefore a test of the moments of a multivariate normal distribution. The following summary is reported on the plots: the means of $K3^{obs}$ and $K4^{obs}$ (and the associated 95% probability intervals), the probabilities that $K3^{obs} > K3^{rep}$ and $K4^{obs} > K4^{rep}$, and whether or not multivariate normality is indicated. Non-normality is reported when the observed process is greater than the replicated process in less than 2.5% or more than 97.5% of the samples. Mardia requires Data to be specified, and also requires that variable Y exist in the data set with exactly that name. Y must be an $N \times P$ matrix of $N$ records and $P$ variables. Source code was modified from the deprecated package QRMlib.
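A base-R sketch of Mardia's sample measures for an $N \times P$ residual matrix. With centered rows $x_i$ and the maximum-likelihood covariance $S$, let $d_{ij} = x_i^\top S^{-1} x_j$; then skewness is $b_1 = \frac{1}{N^2}\sum_{i,j} d_{ij}^3$ and kurtosis is $b_2 = \frac{1}{N}\sum_i d_{ii}^2$, with the scaled forms $N b_1 / 6$ and $(b_2 - P(P+2))/\sqrt{8P(P+2)/N}$ commonly used as test statistics. Which of these quantities plot.pmc.ppc labels K3 and K4 is not stated here and is an assumption; `mardia_stats` is an illustrative helper that would be applied once per posterior sample, as in the DW and Jarque-Bera sketches.

```r
# Hypothetical helper: Mardia's multivariate skewness (b1) and kurtosis (b2)
# for an N x P matrix X, plus their usual scaled test statistics.
mardia_stats <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X); p <- ncol(X)
  Xc <- sweep(X, 2, colMeans(X))   # center each column
  S <- crossprod(Xc) / n           # maximum-likelihood covariance
  D <- Xc %*% solve(S, t(Xc))      # n x n matrix of d_ij
  b1 <- sum(D^3) / n^2             # multivariate skewness
  b2 <- sum(diag(D)^2) / n         # multivariate kurtosis
  list(b1 = b1, b2 = b2,
       skew_stat = n * b1 / 6,     # approx. chi-square, p(p+1)(p+2)/6 df
       kurt_stat = (b2 - p * (p + 2)) / sqrt(8 * p * (p + 2) / n))  # approx. N(0,1)
}

set.seed(8)
X <- matrix(rnorm(500 * 2), ncol = 2)
m <- mardia_stats(X)
c(m$b1, m$b2)  # for normal data, b2 should be near p(p+2) = 8
```

For a single column, $b_1$ reduces to the squared univariate skewness and $b_2$ to the univariate kurtosis used in the Jarque-Bera statistic.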

Predictive Quantiles plots compare y with the predictive quantile (PQ) of its replicate. This may be useful for finding patterns among outliers. Instances outside of the gray lines are considered outliers.
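The exact PQ formula is not given in this section, so the sketch below assumes one common definition: the share of a record's replicates at or above its observed value. It also assumes the gray lines sit at 0.025 and 0.975. `yhat` is, hypothetically, an N x S replicate matrix and `pq` an illustrative helper.

```r
# Hypothetical helper: predictive quantile of each record, taken here as the
# share of replicates at or above the observed value (yhat is N x S).
pq <- function(y, yhat) rowMeans(yhat >= y)  # y recycles down each column

set.seed(9)
y <- c(-100, 0, 100)
yhat <- matrix(rnorm(3 * 1000), nrow = 3)
PQ <- pq(y, yhat)
outlier <- PQ < 0.025 | PQ > 0.975   # assumed cut-offs for the gray lines
outlier
```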

Residual Density plots the density of the residuals computed from the median of the samples. A vertical red line occurs at zero. This plot may be useful for inspecting a distributional assumption of residual variance. This plot is appropriate when y is univariate and continuous.
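A small sketch of the residuals this style describes, assuming (hypothetically) an N x S replicate matrix `yhat`: each residual is y minus the median of that record's replicates, and `stats::density` gives the curve, with zero marked in red.

```r
# Residuals from the per-record posterior median of the replicates
# (yhat is N x S), and their kernel density.
set.seed(10)
y <- rnorm(40, mean = 5)
yhat <- matrix(rnorm(40 * 300, mean = 5), nrow = 40)
e <- y - apply(yhat, 1, median)
d <- density(e)
if (interactive()) {
  plot(d, main = "Residual density")
  abline(v = 0, col = "red")
}
```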

Residual Density, Multivariate, C requires Data to be specified, and also requires that variable Y exist in the data set with exactly that name. These are column-wise plots of residual density, given the median of the samples. These plots may be useful for inspecting a distributional assumption of residual variance. This plot is appropriate when Y is multivariate and continuous, and the densities are to be viewed column-wise.

Residual Density, Multivariate, R requires Data to be specified, and also requires that variable Y exist in the data set with exactly that name. These are row-wise plots of residual density, given the median of the samples. These plots may be useful for inspecting a distributional assumption of residual variance. This plot is appropriate when Y is multivariate and continuous, and the densities are to be viewed row-wise.

Residuals plots compare y with its residuals. The probability interval is plotted as a line. This plot is appropriate when y is univariate.
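A sketch of one way to summarize the residual draws for this style, assuming (hypothetically) an N x S replicate matrix `yhat`: the residual draws `y[i] - yhat[i, s]` are reduced to a per-record median and 95% interval, and each interval is drawn as a vertical line segment against y.

```r
# Residual draws y[i] - yhat[i, s] summarized per record by their median and
# 95% interval (yhat is N x S).
set.seed(11)
y <- rnorm(25)
yhat <- matrix(rnorm(25 * 400), nrow = 25)
q <- apply(y - yhat, 1, quantile, probs = c(0.025, 0.5, 0.975))  # 3 x N matrix
if (interactive()) {
  plot(y, q[2, ], ylim = range(q), xlab = "y", ylab = "Residuals")
  segments(y, q[1, ], y, q[3, ])   # interval drawn as a vertical line
  abline(h = 0, col = "red")
}
```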

Residuals, Multivariate, C requires Data to be specified, and also requires that variable Y exist in the data set with exactly that name. These are plots of each column-wise vector of residuals. The probability interval is plotted as a line. This plot is appropriate when Y is multivariate and not categorical, and the residuals are to be viewed column-wise.

Residuals, Multivariate, R requires Data to be specified, and also requires that variable Y exist in the data set with exactly that name. These are plots of each row-wise vector of residuals. The probability interval is plotted as a line. This plot is appropriate when Y is multivariate and not categorical, and the residuals are to be viewed row-wise.

Space-Time by Space requires Data to be specified, and also requires that the following variables exist in the data set with exactly these names: latitude, longitude, S, and T. These space-time plots compare the S x T matrix Y with the S x T matrix Yrep, producing one time-series plot per point s in space, for a total of S plots. Therefore, these are time-series plots for each point s in space across T time-periods. See Time-Series plots below.

Space-Time by Time requires Data to be specified, and also requires that the following variables exist in the data set with exactly these names: latitude, longitude, S, and T. These space-time plots compare the S x T matrix Y with the S x T matrix Yrep, producing one spatial plot per time-period, for a total of T plots. See Spatial plots below.

Spatial requires Data to be specified, and also requires that the following variables exist in the data set with exactly these names: latitude and longitude. This spatial plot shows yrep plotted according to its coordinates, and is color-coded so that higher values of yrep become more red, and lower values become more yellow.

Spatial Uncertainty requires Data to be specified, and also requires that the following variables exist in the data set with exactly these names: latitude and longitude. This spatial plot shows the probability interval of yrep plotted according to its coordinates, and is color-coded so that wider probability intervals become more red, and narrower intervals become more yellow.
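The color coding of the Spatial and Spatial Uncertainty styles can be sketched with `grDevices::colorRampPalette`. This is an illustration, not the package's own palette: `value_colors` is a hypothetical helper that maps the smallest value to yellow and the largest to red, and the simulated `yrep` is assumed to be a sites-by-samples matrix. For Spatial it colors the posterior median of yrep; for Spatial Uncertainty it colors the width of the 95% interval.

```r
# Hypothetical helper: map values onto a yellow-to-red palette, so the
# smallest value is yellow and the largest is red.
value_colors <- function(v, n = 100) {
  pal <- colorRampPalette(c("yellow", "red"))(n)
  if (diff(range(v)) == 0) return(rep(pal[1], length(v)))  # constant input
  idx <- 1 + floor((n - 1) * (v - min(v)) / diff(range(v)))
  pal[idx]
}

set.seed(12)
latitude <- runif(30, 40, 45)
longitude <- runif(30, -80, -75)
yrep <- matrix(rnorm(30 * 200, mean = latitude), nrow = 30)  # 30 sites x 200 samples
med <- apply(yrep, 1, median)
width <- apply(yrep, 1, function(r) diff(quantile(r, c(0.025, 0.975))))
if (interactive()) {
  plot(longitude, latitude, col = value_colors(med), pch = 16, main = "Spatial")
  plot(longitude, latitude, col = value_colors(width), pch = 16,
       main = "Spatial Uncertainty")
}
```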

Time-Series plots compare y with its replicate, including the median and probability interval quantiles. This plot is appropriate when y is univariate and ordered by time.

Time-Series, Multivariate, C requires Data to be specified, and also requires that variable Y exist in the data set with exactly that name. These plots compare each column-wise time-series in Y with its replicate, including the median and probability interval quantiles. This plot is appropriate when Y is multivariate and each time-series is indexed by column in Y.

Time-Series, Multivariate, R requires Data to be specified, and also requires that variable Y exist in the data set with exactly that name. These plots compare each row-wise time-series in Y with its replicate, including the median and probability interval quantiles. This plot is appropriate when Y is multivariate and each time-series is indexed by row in Y, as is typically true in panel models.

References

Durbin, J. and Watson, G.S. (1950). "Testing for Serial Correlation in Least Squares Regression, I". Biometrika, 37, pp. 409-428.

Jarque, C.M. and Bera, A.K. (1980). "Efficient Tests for Normality, Homoscedasticity and Serial Independence of Regression Residuals". Economics Letters, 6(3), pp. 255-259.

Mardia, K.V. (1970). "Measures of Multivariate Skewness and Kurtosis with Applications". Biometrika, 57(3), pp. 519-530.

Examples


# NOT RUN {
### See the PMC function for an example.
# }
