Visualizes missing values, treatment and outcome variables, and their relationships in panel data
panelview(data, formula = NULL, Y = NULL, D = NULL,
X = NULL, index,
ignore.treat = FALSE, type = "treat",
outcome.type = "continuous",
treat.type = NULL, by.group = FALSE, by.group.side = FALSE,
by.timing = FALSE, theme.bw = TRUE,
xlim = NULL, ylim = NULL,
xlab = NULL, ylab = NULL,
gridOff = FALSE, legendOff = FALSE,
legend.labs = NULL, main = NULL,
pre.post = NULL, id = NULL, show.id = NULL,
color = NULL, axis.adjust = FALSE, axis.lab = "both",
axis.lab.gap = c(0, 0), axis.lab.angle = NULL, shade.post = FALSE,
cex.main = 15, cex.main.sub = 12, cex.axis = 8,
cex.axis.x = NULL, cex.axis.y = NULL,
cex.lab = 12, cex.legend = 12, background = NULL,
style = NULL, by.unit = FALSE, lwd = 0.2, leave.gap = FALSE,
display.all = NULL, by.cohort = FALSE,
collapse.history = NULL, report.missing = FALSE)
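The signature above accepts variables in two ways: through formula, or through the string arguments Y, D and X. The calls below sketch both forms on the turnout data that ships with the package, using the same variables as the package example. The string-argument call is assumed to reproduce the formula call. Compare the two rendered plots if in doubt.

```r
library(panelView)
data(panelView)  # loads the bundled example data, including `turnout`

# Formula interface: outcome ~ treatment + covariates
# (the first right-hand-side variable is the treatment when ignore.treat = FALSE).
panelview(turnout ~ policy_edr + policy_mail_in + policy_motor,
          data = turnout, index = c("abb", "year"))

# The same variables named as strings; Y, D and X are ignored whenever
# `formula` is supplied, so no formula is passed here.
panelview(data = turnout, Y = "turnout", D = "policy_edr",
          X = c("policy_mail_in", "policy_motor"),
          index = c("abb", "year"))
```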
data: a data frame. The panel does not have to be balanced.
formula: an object of class "formula": a symbolic description of the model to be fitted. The first variable on the right-hand side is designated as the treatment indicator if ignore.treat = FALSE. If there are no covariates, the formula should take the form Y ~ 1, where Y is the outcome variable.
Y: variable name of the outcome. Ignored if formula is provided.
D: variable name of the treatment. Ignored if formula is provided.
X: variable names of the time-varying covariates. Ignored if formula is provided.
index: a two-element character vector specifying the names of the unit (group) and time indicators, in that order.
ignore.treat: a logical flag indicating whether to ignore the treatment variable; set it to TRUE when there is no treatment variable. Default is ignore.treat = FALSE.
type: a string that specifies the type of plot. Must be one of "treat" (default), which plots the treatment status of each unit at each time point; "missing", which plots missing data; "outcome", which plots the raw outcome data; or "bivariate", which plots the time series of the outcome and treatment in one graph.
outcome.type: a string that specifies the type of outcome variable. Must be either "continuous" (default) or "discrete". For a continuous outcome, time-series lines for the specified units are plotted. For a discrete outcome, jittered points at each time period are plotted.
treat.type: a string that specifies the type of treatment variable. Must be either "continuous" or "discrete". The default is NULL. In that case the type is chosen from the number of unique treatment values: if there are more than 10, it is set to "continuous"; otherwise, it is set to "discrete".
by.group: a logical flag indicating whether, in the outcome plot, the data should be plotted in separate groups, arranged in a column, based on changes in treatment status.
by.group.side: a logical flag indicating whether to arrange the subfigures produced by by.group = TRUE in a row rather than in a column.
by.timing: a logical flag indicating whether the units should be sorted by the timing of receiving the treatment in the treat plot.
theme.bw: a logical flag specifying whether to use a black-and-white theme.
xlim: a two-element numeric vector specifying the range of the x-axis. When the time variable is a string, the range of time labels to be shown must be given by position, e.g., xlim = c(1, 30).
ylim: a two-element numeric vector specifying the range of the y-axis.
xlab: a string indicating the label of the x-axis.
ylab: a string indicating the label of the y-axis.
gridOff: a logical flag controlling whether to turn off (hide) the grid lines in the treat plot.
legendOff: a logical flag controlling whether to turn off (hide) the legend.
legend.labs: a vector specifying the legend labels. Ignored when legendOff = TRUE.
main: a string that controls the title of the plot.
pre.post: a logical flag indicating whether to distinguish the control status of treated units from that of control units. Only used for staggered data in the treat and outcome plots.
id: a vector specifying the units to be shown in the plot. Useful when the number of units is very large.
show.id: a numeric vector or sequence specifying which units, by their position in the sorted order, are shown in the "treat" plot. Useful when the number of units is very large. Ignored if id is not NULL.
color: a string vector specifying the color settings for the plot.
axis.adjust: a logical flag indicating whether to adjust the labels on the x-axis. Useful when the time variable is a string and there are many time periods.
axis.lab: a string indicating which axis labels are shown. There are four options: "both" (default), labels on both axes are shown; "unit", only labels on the y-axis are shown; "time", only labels on the x-axis are shown; "none", no labels are shown.
axis.lab.gap: a numeric vector setting the gaps between labels on the x- and y-axes. Default is axis.lab.gap = c(0, 0), which means that all labels are shown. Useful for datasets with large N or T.
axis.lab.angle: a numeric value setting the angle (in degrees) of the labels shown on the x-axis. Must be between 0 and 90.
shade.post: a logical flag controlling whether to shade the post-treatment periods. Ignored if type = "treat" or no treatment variable is supplied.
cex.main: a numeric value (pt) specifying the font size of the main title.
cex.main.sub: a numeric value (pt) specifying the font size of the subtitles. Ignored if type = "treat" or by.group = FALSE.
cex.axis: a numeric value (pt) specifying the font size of the text on the axes; overridden by cex.axis.x or cex.axis.y.
cex.axis.x: a numeric value (pt) specifying the font size of the text on the x-axis.
cex.axis.y: a numeric value (pt) specifying the font size of the text on the y-axis.
cex.lab: a numeric value (pt) specifying the font size of the axis titles.
cex.legend: a numeric value (pt) specifying the font size of the legend.
background: a string specifying the background color.
by.unit: a logical flag indicating whether to plot each specified unit separately or to plot the means of D and Y against time in the same graph.
style: a string vector setting the line, connected-line, or bar styles for the outcome and treatment variables.
lwd: a numeric value (pt) specifying the line width when plotting the time series of the treatment and outcome variables.
leave.gap: a logical flag indicating whether to keep time gaps as white bars if time periods are not evenly spaced (possibly due to missing data). Default is leave.gap = FALSE.
display.all: a logical flag indicating whether to show all units when there are more than 500. Otherwise, 500 randomly selected units are shown.
by.cohort: a logical flag indicating whether to plot the average outcome lines based on unique treatment histories in an "outcome" plot.
collapse.history: a logical flag indicating whether to collapse units by treatment history in a "treat" plot.
report.missing: a logical flag indicating whether to report missingness in the included variables.
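The default rule for treat.type can be illustrated with a small base-R helper. guess_treat_type() is hypothetical. It only mirrors the documented rule (more than 10 unique treatment values means "continuous") and is not panelView's internal code. Not counting missing values is an additional assumption.

```r
# Hypothetical helper mirroring the documented default for `treat.type`.
guess_treat_type <- function(d) {
  # Assumption: missing values are not counted as a distinct treatment value.
  n_unique <- length(unique(d[!is.na(d)]))
  if (n_unique > 10) "continuous" else "discrete"
}

binary_d <- c(0, 0, 1, 1, NA, 1)          # two unique values
dose_d   <- seq(0, 1, length.out = 25)    # 25 unique values

guess_treat_type(binary_d)  # returns "discrete"
guess_treat_type(dose_d)    # returns "continuous"
```

To override the automatic choice, pass treat.type = "discrete" or treat.type = "continuous" to panelview() directly.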
Hongyu Mou <hongyumou@g.ucla.edu>
Licheng Liu <liulch@mit.edu>
Yiqing Xu <yiqingxu@stanford.edu>
panelview visualizes the treatment status, missing values, and raw outcome data of a time-series cross-sectional dataset.
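Each value of the type argument can be tried on the bundled turnout data. This sketch reuses the variables from the package example. For "bivariate" it keeps a single treatment variable, policy_edr. That choice is made here for readability and is not a documented requirement.

```r
library(panelView)
data(panelView)

# Treatment status of each unit at each time point (the default type).
panelview(turnout ~ policy_edr + policy_mail_in + policy_motor,
          data = turnout, index = c("abb", "year"), type = "treat")

# Missing-data pattern of the variables in the formula.
panelview(turnout ~ policy_edr + policy_mail_in + policy_motor,
          data = turnout, index = c("abb", "year"), type = "missing")

# Raw outcome data; a continuous outcome is drawn as one line per unit.
panelview(turnout ~ policy_edr + policy_mail_in + policy_motor,
          data = turnout, index = c("abb", "year"), type = "outcome")

# Outcome and treatment in one graph; with the default by.unit = FALSE,
# the means of D and Y are plotted against time.
panelview(turnout ~ policy_edr, data = turnout,
          index = c("abb", "year"), type = "bivariate")
```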
Hongyu Mou, Licheng Liu and Yiqing Xu (2023). "Panel Data Visualization in R (panelView) and Stata (panelview)." Journal of Statistical Software, 107(7), pp. 1--20. <doi:10.18637/jss.v107.i07>
library(panelView)
data(panelView)
panelview(turnout ~ policy_edr + policy_mail_in + policy_motor,
data = turnout, index = c("abb","year"))
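The next calls apply a few of the display options documented above to the turnout data. The five units passed to id are simply the first five values of abb, chosen for illustration. Reading axis.lab.gap as c(x-axis gap, y-axis gap) is an assumption based on the argument description.

```r
library(panelView)
data(panelView)

# Treat plot with units sorted by the timing of treatment adoption.
panelview(turnout ~ policy_edr + policy_mail_in + policy_motor,
          data = turnout, index = c("abb", "year"), by.timing = TRUE)

# Outcome plot split into groups by treatment-status change,
# with the subfigures arranged in a row instead of a column.
panelview(turnout ~ policy_edr + policy_mail_in + policy_motor,
          data = turnout, index = c("abb", "year"), type = "outcome",
          by.group = TRUE, by.group.side = TRUE)

# Average outcome lines by unique treatment history.
panelview(turnout ~ policy_edr, data = turnout,
          index = c("abb", "year"), type = "outcome", by.cohort = TRUE)

# Only a handful of units, with thinned x-axis labels
# (assumption: the first element of axis.lab.gap applies to the x-axis).
some_units <- as.character(unique(turnout$abb))[1:5]
panelview(turnout ~ policy_edr + policy_mail_in + policy_motor,
          data = turnout, index = c("abb", "year"),
          id = some_units, axis.lab.gap = c(4, 0))
```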