Plots continuous data per group on a y- (or x-) axis using customizable data representations
yPlot(
data_frame,
var,
group.by,
color.by = group.by,
shape.by = NULL,
split.by = NULL,
rows.use = NULL,
plots = c("vlnplot", "boxplot", "jitter"),
multivar.aes = c("split", "group", "color"),
multivar.split.dir = c("col", "row"),
var.adjustment = NULL,
var.adj.fxn = NULL,
do.hover = FALSE,
hover.data = unique(c(var, paste0(var, ".adj"), "var.multi", "var.which", group.by,
color.by, shape.by, split.by)),
hover.round.digits = 5,
color.panel = dittoColors(),
colors = seq_along(color.panel),
shape.panel = c(16, 15, 17, 23, 25, 8),
theme = theme_classic(),
main = "make",
sub = NULL,
ylab = "make",
y.breaks = NULL,
min = NA,
max = NA,
xlab = "make",
x.labels = NULL,
x.labels.rotate = NA,
x.reorder = NULL,
split.nrow = NULL,
split.ncol = NULL,
split.adjust = list(),
do.raster = FALSE,
raster.dpi = 300,
jitter.size = 1,
jitter.width = 0.2,
jitter.color = "black",
jitter.shape.legend.size = 5,
jitter.shape.legend.show = TRUE,
jitter.position.dodge = boxplot.position.dodge,
boxplot.width = 0.2,
boxplot.color = "black",
boxplot.show.outliers = NA,
boxplot.outlier.size = 1.5,
boxplot.fill = TRUE,
boxplot.position.dodge = vlnplot.width,
boxplot.lineweight = 1,
vlnplot.lineweight = 1,
vlnplot.width = 1,
vlnplot.scaling = "area",
vlnplot.quantiles = NULL,
ridgeplot.lineweight = 1,
ridgeplot.scale = 1.25,
ridgeplot.ymax.expansion = NA,
ridgeplot.shape = c("smooth", "hist"),
ridgeplot.bins = 30,
ridgeplot.binwidth = NULL,
add.line = NULL,
line.linetype = "dashed",
line.color = "black",
line.linewidth = 0.5,
line.opacity = 1,
legend.show = TRUE,
legend.title = "make",
data.out = FALSE
)
ridgePlot(..., plots = c("ridgeplot"))
ridgeJitter(..., plots = c("ridgeplot", "jitter"))
boxPlot(..., plots = c("boxplot", "jitter"))
A ggplot where continuous data, grouped by sample, age, cluster, etc., is shown either on the y-axis by a violin plot, boxplot, and/or jittered points, or on the x-axis by a ridgeplot with or without jittered points.
Alternatively, when data.out = TRUE, a list containing:
the plot ("p"),
the underlying data as a data frame ("data"),
and the ultimately used mapping of columns to given aesthetic sets ("cols_used"), because modification of newly made columns is required for many features.
Alternatively, when do.hover = TRUE, a plotly-converted version of the ggplot where additional data will be displayed when the cursor is hovered over jitter points.
A data_frame where columns are features and rows are observations you might wish to visualize.
Single string representing the name of a column of data_frame to be used as the primary, y-axis, data.
Alternatively, a string vector naming multiple such columns of data to plot at once.
See the input multivar.aes to understand or tweak how multiple var-data will be shown.
Single string representing the name of a column of data_frame containing discrete data to use for separating the data points into groups.
Single string representing the name of a column of data_frame containing discrete data to use for setting data representation color fills.
This data does not need to be the same as group.by, which is great for highlighting supersets or subgroups when wanted, but it defaults to group.by so the input can often be skipped.
Single string representing the name of a column of data_frame containing discrete data to use for setting shapes of the jitter points.
When not provided, all jitter points will be dots.
1 or 2 strings denoting the name(s) of column(s) of data_frame containing discrete data to use for faceting / separating data points into separate plots.
When 2 columns are named, c(row,col), the first is used as rows and the second is used for columns of the resulting facet grid.
When 1 column is named, shape control can be achieved with split.nrow and split.ncol.
String vector of rownames of data_frame OR an integer vector specifying the row-indices of data points which should be plotted.
Alternatively, a logical vector, the same length as the number of rows in data_frame, where TRUE values indicate which rows to plot.
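The three accepted forms of rows.use can be compared with a small base R sketch. The data frame `df`, its row names, and its values are made up for illustration, and plain bracket subsetting is only an assumption about what the internal subsetting amounts to.

```r
# Hypothetical data frame with named rows, for illustration only
df <- data.frame(
    gene1 = c(1.2, 3.4, 0.5, 2.2),
    timepoint = c("d0", "d0", "d7", "d7"),
    row.names = c("s1", "s2", "s3", "s4")
)

# Three ways of describing the same two rows
by_name    <- c("s3", "s4")            # row names
by_index   <- c(3L, 4L)                # row indices
by_logical <- df$timepoint == "d7"     # logical, one value per row

# Each form picks out the same subset of rows
sub_name    <- df[by_name, ]
sub_index   <- df[by_index, ]
sub_logical <- df[by_logical, ]
```

Any one of `by_name`, `by_index`, or `by_logical` could then be passed as rows.use.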
String vector which sets the types of plots to include: possibilities = "jitter", "boxplot", "vlnplot", "ridgeplot".
Order matters: c("vlnplot", "boxplot", "jitter") will put a violin plot in the back, boxplot in the middle, and then individual dots in the front.
See details section for more info.
"split", "group", or "color", the plot feature to utilize for displaying 'var' value when var is given multiple column names.
When set to "split" (the default), note that displaying the var-identity of the data will be prioritized so the split.by input becomes limited to receiving a single usable element.
"row" or "col", sets the direction of faceting used for 'var' values when:
var is given multiple column names,
multivar.aes = "split" (default),
AND split.by is used to provide an additional feature to facet by.
A recognized string indicating whether numeric var data should be used directly (default) or should be adjusted to be:
"z-score": scaled with the scale() function to produce a relative-to-mean z-score representation
"relative.to.max": divided by the maximum value to give fraction-of-max values within [0,1]
Ignored if the var data is not numeric as these known adjustments target numeric data only.
In order to leave the unedited data available for use in other features, the adjusted data are put in a new column and that new column is used for plotting.
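What the two named adjustments compute can be sketched in base R, assuming a plain numeric vector with no missing values. The vector `x` is illustrative, and how dittoViz handles edge cases internally is not shown here.

```r
x <- c(2, 4, 6, 8)

# "z-score": center on the mean and divide by the standard deviation.
# scale() returns a one-column matrix, so flatten it back to a vector.
z <- as.numeric(scale(x))

# "relative.to.max": divide every value by the maximum
rel <- x / max(x)
```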
If you wish to apply a function to edit the var data before use, in a way not possible with the var.adjustment input, this input can be given a function which takes in a vector of values as input and returns a vector of values of the same length as output.
For example, function(x) {log2(x)} or as.factor.
In order to leave the unedited data available for use in other features, the adjusted data are put in a new column and that new column is used for plotting.
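A function suitable for var.adj.fxn only needs to map a vector to a vector of the same length. The sketch below uses a hypothetical helper, `adj_log`, with a pseudocount of 1 chosen purely for illustration.

```r
# Hypothetical adjustment function: log2 with a pseudocount of 1
adj_log <- function(x) log2(x + 1)

x <- c(0, 1, 3, 7)
adjusted <- adj_log(x)

# The output must have one value per input value
same_length <- length(adjusted) == length(x)
```

Such a function could then be supplied as `var.adj.fxn = adj_log`.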
Logical which controls whether the ggplot output will be converted to a plotly object so that data about individual points can be displayed when you hover your cursor over them.
The hover.data argument is used to determine what data to show upon hover.
String vector which denotes what data to show for each jitter data point, upon hover, when do.hover is set to TRUE.
Defaults to all data expected to be useful.
Only values present in the plotting data are actually used.
These can be column names of data_frame and any column names which will be created to accommodate multivar and data adjustment functionality.
You can run the function with data.out = TRUE and inspect the $data output's columns to view your available options.
Integer number specifying the number of decimal digits to round displayed numeric values to, when do.hover is set to TRUE.
String vector which sets the colors to draw from for data representation fills.
Default = dittoColors().
A named vector can be used if names are matched to the distinct values of the color.by data.
Integer vector, the indexes / order, of colors from color.panel to actually use.
Useful for quickly swapping around colors of the default set (when not using names for color matching).
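The relationship between color.panel, colors, and named matching can be sketched in base R. The hex codes and group names are placeholders, and treating colors as a simple index into color.panel is an assumption drawn from the descriptions above rather than a statement about dittoViz internals.

```r
# Placeholder panel of three colors
panel <- c("#E69F00", "#56B4E9", "#009E73")

# Swapping the first two colors by index, as the 'colors' input describes
swap_idx <- c(2, 1, 3)
used <- panel[swap_idx]

# Named panel: names should cover the distinct values of the color.by data
named_panel <- c(ctrl = "#999999", trtA = "#E69F00", trtB = "#56B4E9")
color_by_values <- c("ctrl", "trtA", "trtB", "ctrl")
all_matched <- all(unique(color_by_values) %in% names(named_panel))
```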
Vector of integers corresponding to ggplot shapes which sets what shapes to use.
When discrete groupings are supplied by shape.by, this sets the panel of shapes which will be used.
When nothing is supplied to shape.by, only the first value is used.
Default is a set of 6, c(16,15,17,23,25,8), the first being a simple, solid, circle.
A ggplot theme which will be applied before internal adjustments.
Default = theme_classic().
See https://ggplot2.tidyverse.org/reference/ggtheme.html for other options and ideas.
String, sets the plot title. Default = "make" and if left as "make", a title will be automatically generated. To remove, set to NULL.
String, sets the plot subtitle.
String, sets the continuous-axis label (=y-axis for box and violin plots, x-axis for ridgeplots).
Defaults to "var".
Numeric vector, a set of breaks that should be used as major grid lines. c(break1,break2,break3,etc.).
Scalars which control the zoom on the continuous axis of the plot.
String which sets the grouping-axis label (=x-axis for box and violin plots, y-axis for ridgeplots).
Set to NULL to remove.
String vector, c("label1","label2","label3",...) which overrides the names of groupings.
Logical which sets whether the labels should be rotated.
Default: TRUE for violin and box plots, but FALSE for ridgeplots.
Integer vector. A sequence of numbers, from 1 to the number of groupings, for rearranging the order of x-axis groupings.
Method: Make a first plot without this input. Then, treat the leftmost grouping as index 1 and the rightmost as index n. Values of x.reorder should be these indices, but in the order that you would like them rearranged to be.
Recommendation for advanced users: If you find yourself coming back to this input too many times, an alternative solution that can be easier long-term is to make the target data into a factor, and to put its levels in the desired order: factor(data, levels = c("level1", "level2", ...)).
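The index logic of x.reorder and the factor-based alternative can be compared in base R. The group labels are made up, and the sketch assumes the initial plotting order follows the alphabetical order that a character column gets when converted to a factor.

```r
groups <- c("d7", "d0", "d14", "d0", "d7", "d14")

# Assumed initial left-to-right order: alphabetical factor levels
default_order <- levels(factor(groups))

# x.reorder-style indices: keep position 1, then take position 3, then 2
reorder_idx <- c(1, 3, 2)
new_order <- default_order[reorder_idx]

# Factor alternative: fix the desired order in the data itself
groups_f <- factor(groups, levels = c("d0", "d7", "d14"))
```

Both routes arrive at the same ordering; the factor route has the advantage of persisting across plots.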
Integers which set the dimensions of faceting/splitting when faceting by a single feature.
A named list which allows extra parameters to be pushed through to the faceting function call. List elements should be valid inputs to the faceting functions, e.g. `list(scales = "free")`.
For options, when giving 1 column to split.by, see facet_wrap, OR when giving 2 columns to split.by, see facet_grid.
Logical. When set to TRUE, rasterizes the jitter plot layer, changing it from individually encoded points to a flattened set of pixels.
This can be useful for editing in external programs (e.g. Illustrator) when there are many thousands of data points.
Number indicating dots/pixels per inch (dpi) to use for rasterization. Default = 300.
Scalar which sets the size of the jitter shapes.
Scalar that sets the width/spread of the jitter in the x direction. Ignored in ridgeplots.
Note for when color.by is used to split x-axis groupings into additional bins: ggplot does not shrink jitter widths accordingly, so be sure to do so yourself!
Ideally, this should be 0.5/num_subgroups.
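The 0.5/num_subgroups guideline can be computed from the data before plotting. In this sketch the data frame and its column names are hypothetical, and num_subgroups is taken to be the largest number of distinct color.by values found within any single group.by grouping.

```r
# Hypothetical grouping (timepoint) and coloring (condition) columns
df <- data.frame(
    timepoint = rep(c("d0", "d7"), each = 6),
    condition = rep(c("ctrl", "trtA", "trtB"), times = 4)
)

# Count distinct subgroups inside each major grouping
subgroups_per_group <- tapply(
    df$condition, df$timepoint,
    function(v) length(unique(v))
)
num_subgroups <- max(subgroups_per_group)

# Suggested jitter.width following the guideline above
jitter_width <- 0.5 / num_subgroups
```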
String which sets the color of the jitter shapes.
Scalar which changes the size of the shape key in the legend.
If set to NA, jitter.size is used.
Logical which sets whether the shapes legend will be shown when its shape is determined by shape.by.
Scalar which adjusts the relative distance between jitter widths when multiple subgroups exist per group.by grouping (a.k.a. when group.by and color.by are not equal).
Similar to the boxplot.position.dodge input, and defaults to the value of that input so that BOTH will actually be adjusted when only, say, boxplot.position.dodge = 0.3 is given.
Scalar which sets the width/spread of the boxplot in the x direction.
String which sets the color of the lines of the boxplot.
Logical, whether outliers should be included in the boxplot.
Default is FALSE when there is a jitter plotted, TRUE if there is no jitter.
Scalar which adjusts the size of points used to mark outliers.
Logical, whether the boxplot should be filled in or not. Known bug: when boxplot fill is turned off, outliers do not render.
Scalar which adjusts the relative distance between boxplots when multiple are drawn per grouping (a.k.a. when group.by and color.by are not equal).
By default, this input actually controls the value of jitter.position.dodge unless the jitter version is provided separately.
Scalar which adjusts the thickness of boxplot lines.
Scalar which sets the thickness of the line that outlines the violin plots.
Scalar which sets the width/spread of violin plots in the x direction.
String which sets how the widths of the violin plots are set in relation to each other.
Options are "area", "count", and "width". If the default is not right for your data, I recommend trying "width".
For an explanation of each, see geom_violin.
Single number or numeric vector of values in [0,1] naming quantiles at which to draw a horizontal line within each violin plot. Example: c(0.1, 0.5, 0.9)
Scalar which sets the thickness of the ridgeplot outline.
Scalar which sets the distance/overlap between ridgeplots. A value of 1 means the tallest density curve just touches the baseline of the next higher one. Higher numbers lead to greater overlap. Default = 1.25
Scalar which adjusts the minimal space between the topmost grouping and the top of the plot in order to ensure the curve is not cut off by the plotting grid. The larger the value, the greater the space requested. When left as NA, dittoViz will attempt to determine an ideal value itself based on the number of groups & linear interpolation between these goal posts: #groups of 3 or fewer: 0.6; #groups=12: 0.1; #groups of 34 or greater: 0.05.
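The goal-post interpolation described above can be approximated with base R's approx(). This is a sketch of the stated rule only: the helper name `expansion_for` is hypothetical, and the actual dittoViz calculation may differ in detail.

```r
# Hypothetical helper: piecewise-linear interpolation between the
# documented goal posts, clamped at both ends (rule = 2)
expansion_for <- function(n_groups) {
    approx(
        x = c(3, 12, 34),
        y = c(0.6, 0.1, 0.05),
        xout = n_groups,
        rule = 2
    )$y
}

few   <- expansion_for(2)     # at or below 3 groups: clamped to 0.6
mid   <- expansion_for(7.5)   # halfway between the first two goal posts
many  <- expansion_for(50)    # at or above 34 groups: clamped to 0.05
```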
Either "smooth" or "hist", sets whether ridges will be smoothed (the typical, and default) versus rectangular like a histogram.
(Note: as of the time shape "hist" was added, combination with jittered points is not supported by the stat_binline that dittoViz relies on.)
Integer which sets how many chunks to break the x-axis into when ridgeplot.shape = "hist".
Overridden by ridgeplot.binwidth when that input is provided.
Integer which sets the width of chunks to break the x-axis into when ridgeplot.shape = "hist".
Takes precedence over ridgeplot.bins when provided.
Numeric value(s), denoting y-axis value(s), where one or multiple horizontal line(s) should be added.
String which sets the type of line for add.line.
Defaults to "dashed", but any ggplot linetype will work.
String that sets the color(s) of the add.line line(s). Default = "black".
Alternatively, a vector of strings of the same length as add.line can be given to set the color of each line individually.
Number that sets the linewidth of the add.line line(s). Default = 0.5.
Alternatively, a vector of numbers of the same length as add.line can be given to set the linewidth of each line individually.
Number that sets the opacity of the add.line line(s). Default = 1.
Alternatively, a vector of numbers of the same length as add.line can be given to set the opacity of each line individually.
Logical. Whether the legend should be displayed. Default = TRUE.
String or NULL, sets the title for the main legend which includes colors and data representations.
Logical. When set to TRUE, changes the output, from the plot alone, to a list containing the plot (p), its underlying data (data), and the ultimately used mapping of columns to given aesthetic sets, because modification of newly made columns is required for many features ("cols_used").
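Inspecting the data.out = TRUE output might look like the sketch below. The data frame is made up, the call is guarded so that the snippet still runs where dittoViz is not installed, and the element names are those given in the description above.

```r
# Hypothetical input data, for illustration only
df <- data.frame(
    gene1 = c(1.2, 3.4, 0.5, 2.2, 4.1, 0.9),
    timepoint = rep(c("d0", "d7"), each = 3)
)

# Only attempt the call when dittoViz is available
if (requireNamespace("dittoViz", quietly = TRUE)) {
    out <- dittoViz::yPlot(df, var = "gene1", group.by = "timepoint",
                           data.out = TRUE)
    names(out)           # per the description: "p", "data", "cols_used"
    colnames(out$data)   # columns available to inputs such as hover.data
}
```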
arguments passed to yPlot by ridgePlot, ridgeJitter, and boxPlot wrappers. Options are all the ones above.
ridgePlot(): simple yPlot wrapper with distinct plots input defaults
ridgeJitter(): simple yPlot wrapper with distinct plots input defaults
boxPlot(): simple yPlot wrapper with distinct plots input defaults
The plots argument determines the types of data representation that will be generated, as well as their order from back to front.
Options are "jitter", "boxplot", "vlnplot", and "ridgeplot".
Each plot type has specific associated options which are controlled by variables that start with their associated string.
For example, all jitter adjustments start with "jitter.", such as jitter.size and jitter.width.
Inclusion of "ridgeplot" overrides "boxplot" and "vlnplot" presence and changes the plot to be horizontal.
Additionally:
Colors can be adjusted with color.panel.
Subgroupings: color.by can be utilized to split major group.by groupings into subgroups.
When this is done in y-axis plotting, dittoViz automatically ensures the centers of all geoms will align, but users will need to manually adjust jitter.width to less than 0.5/num_subgroups to avoid overlaps.
There are also three inputs through which one can control geom-center placement, but the easiest way to adjust all of them at once is to just adjust vlnplot.width!
The other two: boxplot.position.dodge and jitter.position.dodge.
Line(s) can be added at single or multiple value(s) by providing these values to add.line.
Linetype and color are set with line.linetype, which is "dashed" by default, and line.color, which is "black" by default.
Titles and axes labels can be adjusted with main, sub, xlab, ylab, and legend.title arguments.
The legend can be hidden by setting legend.show = FALSE.
y-axis zoom and tick marks can be adjusted using min, max, and y.breaks.
x-axis labels and groupings can be changed / reordered using x.labels and x.reorder, and rotation of these labels can be turned on/off with x.labels.rotate = TRUE/FALSE.
Shapes used in conjunction with shape.by can be adjusted with shape.panel.
This can be very useful for making manual additional alterations after dittoViz plot generation.
Daniel Bunis, Jared Andrews
The function plots the targeted var data of data_frame, grouped by the columns of data given to group.by and color.by, using data representations given by plots.
Data representations will also be colored (filled) based on color.by.
If a subset of data points to use is indicated with the rows.use input, the data_frame is internally subset to include only those indicated rows before plotting.
The plots argument determines the types of data representation that will be generated, as well as their order from back to front.
Options are "jitter", "boxplot", "vlnplot", and "ridgeplot".
Inclusion of "ridgeplot" overrides "boxplot" and "vlnplot" presence and changes the plot to be horizontal.
When split.by is provided a column name of data_frame, separate plots will be produced representing each of the distinct groupings of the split.by data using ggplot's faceting functionality.
ridgePlot, ridgeJitter, and boxPlot are included as wrappers of the basic yPlot function that simply change the default for the plots input to be "ridgeplot", c("ridgeplot","jitter"), or c("boxplot","jitter"), to make such plots even easier to produce.
ridgePlot, ridgeJitter, and boxPlot as shortcuts for a few common 'plots' input choices
example("dittoExampleData", echo = FALSE)
# Basic yPlot, with the default violin, box, and jitter layers (looks better with more points)
yPlot(data_frame = example_df, var = "gene1", group.by = "timepoint")
yPlot(data_frame = example_df, var = c("gene1", "gene2"), group.by = "timepoint")
# Color distinctly from the grouping variable using 'color.by'
yPlot(data_frame = example_df, var = "gene1", group.by = "timepoint",
color.by = "conditions")
# Update the 'plots' input to change / reorder the data representations
yPlot(example_df, "gene1", "timepoint",
plots = c("vlnplot", "boxplot", "jitter"))
yPlot(example_df, "gene1", "timepoint",
plots = c("ridgeplot", "jitter"))
# Provided wrappers enable certain easy adjustments of the 'plots' parameter.
# Quickly make a Boxplot
boxPlot(example_df, "gene1", "timepoint")
# Quickly make a Ridgeplot, with or without jitter
ridgePlot(example_df, "gene1", "timepoint")
ridgeJitter(example_df, "gene1", "timepoint")
# Modify the look with intuitive inputs
yPlot(example_df, "gene1", "timepoint",
plots = c("vlnplot", "boxplot", "jitter"),
boxplot.color = "white",
main = "CD3E",
legend.show = FALSE)
if (FALSE) {
# (Due to unfortunate CRAN submission constraints)
# Data can also be split in other ways with 'shape.by' or 'split.by'
yPlot(data_frame = example_df, var = "gene1", group.by = "timepoint",
plots = c("vlnplot", "boxplot", "jitter"),
shape.by = "clustering",
split.by = "SNP") # single split.by element
yPlot(data_frame = example_df, var = "gene1", group.by = "timepoint",
plots = c("vlnplot", "boxplot", "jitter"),
split.by = c("groups","SNP")) # row and col split.by elements
# Multiple features can also be plotted at once by giving them as a vector to
# the 'var' input. One aesthetic of the plot will then be used to display the
# 'var'-info, and you can control which (faceting / "split", x-axis grouping
# / "group", or color / "color") with 'multivar.aes':
yPlot(data_frame = example_df, group.by = "timepoint",
var = c("gene1", "gene2"))
yPlot(data_frame = example_df, group.by = "timepoint",
var = c("gene1", "gene2"),
multivar.aes = "group")
yPlot(data_frame = example_df, group.by = "timepoint",
var = c("gene1", "gene2"),
multivar.aes = "color")
}