plotdist: Visualising Densities using Quantiles

Description

The function estimates and visualizes the density curves of one-group or two-group studies presenting quantile summary measures with the sample size (\(n\)). The quantile summaries can fall into one of the following categories:

\(S_1\): { minimum, median, maximum }
\(S_2\): { first quartile, median, third quartile }
\(S_3\): { minimum, first quartile, median, third quartile, maximum }

The plotdist function uses the following quantile-based distributions to visualise densities from quantiles (De Livera et al., 2024):

Generalised Lambda Distribution (GLD) when 5-number summaries are present (\(S_3\)).
Skew Logistic Distribution (SLD) when 3-number summaries are present (\(S_1\) and \(S_2\)).

Usage

plotdist(
   data, 
   xmin = NULL, 
   xmax = NULL, 
   ymax = NULL,
   length.out = 1000, 
   title = "", 
   xlab = "x", 
   ylab = "Density",
   line.size = 0.5,
   title.size = 12,
   lab.size = 10,
   color.g1 = "pink",
   color.g2 = "skyblue",
   color.g1.pooled = "red",
   color.g2.pooled = "blue",
   label.g1 = NULL, 
   label.g2 = NULL,
   display.index = FALSE,
   display.legend = FALSE,
   pooled.dist = FALSE, 
   pooled.only = FALSE, 
   opt = TRUE
)

Value

An interactive plotly object visualizing the estimated density curve(s) for one or two groups.

Arguments

data

data frame containing the quantile summary data. For one-group studies, the input may contain the following columns depending on the quantile scenario:

'study.index'

study index or name

'min.g1'

minimum value

'q1.g1'

first quartile

'med.g1'

median

'q3.g1'

third quartile

'max.g1'

maximum value

'n.g1'

sample size

For two-group studies, the data frame may also contain the following columns for the second group: min.g2, q1.g2, med.g2, q3.g2, max.g2 and n.g2. Note that, for three-point summaries (\(S_1\) and \(S_2\)), only the relevant columns should be included.

xmin

numeric value for the lower limit of the x-axis for density calculation. It is recommended to set this to a value smaller than the smallest value across the quantile summaries to ensure the density curve is fully captured. If xmin is not provided, the minimum value of the 'min.' columns will be used for scenario \(S_1\) or \(S_3\). Note that for scenario \(S_2\), no default calculation is performed for xmin.

xmax

numeric value for the upper limit of the x-axis for density calculation. It is recommended to set this to a value larger than the largest value across the quantile summaries to ensure the density curve is fully captured. If xmax is not provided, the maximum value of the 'max.' columns will be used for scenario \(S_1\) or \(S_3\). Similarly, for scenario \(S_2\), no default calculation is performed for xmax.

ymax

numeric value for the upper limit of the y-axis. If NULL, the highest density value will be used.

length.out

integer specifying the number of points along the x-axis for density calculation. Default is 1000.

title

character string for the plot title. Default is an empty string.

xlab

character string for the x-axis label. Default is "x".

ylab

character string for the y-axis label. Default is "Density".

line.size

numeric. Thickness of the density curve lines. Default is 0.5.

title.size

numeric. Font size for the plot title. Default is 12.

lab.size

numeric. Font size for axis labels. Default is 10.

color.g1

character string specifying the color for individual density curves of group 1 for each study (row). Default is "pink".

color.g2

character string specifying the color for individual density curves of group 2 for each study (row). Default is "skyblue".

color.g1.pooled

character string specifying the color for pooled density curve of group 1. Default is "red".

color.g2.pooled

character string specifying the color for pooled density curve of group 2. Default is "blue".

label.g1

character string indicating label or name for group 1 (e.g., 'Treatment').

label.g2

character string indicating label or name for group 2 (e.g., 'Control').

If 'label.g1' and 'label.g2' are not provided, the function will assign labels as 'Group 1' and 'Group 2'.

display.index

logical. If TRUE, the 'study.index' of each quantile set (row) will be displayed alongside the corresponding density curve. The default is FALSE, meaning no labels will be shown. The label text size is controlled by the lab.size parameter.

display.legend

logical. If TRUE, legends ('label.g1' and/or 'label.g2') will be displayed on the right side of the plot. The default is FALSE. The legend text size is controlled by the lab.size parameter.

pooled.dist

logical. If TRUE, pooled density curves for group 1 and/or group 2 will be plotted along with the individual density curves. The default is FALSE.

pooled.only

logical. If TRUE, only the pooled density curves of group 1 and/or group 2 will be plotted, excluding the individual density curves. The default is FALSE.

opt

logical value indicating whether to apply the optimization step when estimating GLD or SLD parameters. The default value is TRUE.
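As a minimal sketch of the data layout described under the data, xmin and xmax arguments, the following builds a one-group data frame for scenario \(S_2\) (quartiles only) and another for scenario \(S_1\), then derives an x-axis range from the 'min.' and 'max.' columns. All values and the object names (data_s2_1g, data_s1_1g, xmin_default, xmax_default) are made up for illustration, and the range computation is only an assumption about how the documented default behaves, not the function's actual code.

```r
# Hypothetical one-group S2 data: only quartile columns and the sample size
data_s2_1g <- data.frame(
  study.index = c("Study A", "Study B"),
  q1.g1  = c(40, 42),
  med.g1 = c(50, 55),
  q3.g1  = c(62, 66),
  n.g1   = c(120, 95)
)

# Hypothetical one-group S1 data: minimum, median, maximum and the sample size
data_s1_1g <- data.frame(
  study.index = c("Study A", "Study B"),
  min.g1 = c(15, 13),
  med.g1 = c(66, 63),
  max.g1 = c(108, 100),
  n.g1   = c(226, 200)
)

# Assumed form of the documented default: smallest 'min.' and largest 'max.' value
min_cols <- grep("^min\\.", names(data_s1_1g), value = TRUE)
max_cols <- grep("^max\\.", names(data_s1_1g), value = TRUE)
xmin_default <- min(unlist(data_s1_1g[min_cols]))
xmax_default <- max(unlist(data_s1_1g[max_cols]))

# S2 data has no 'min.' or 'max.' columns, so xmin and xmax must be supplied, e.g.
# plotdist(data_s2_1g, xmin = 20, xmax = 90)
```

Because no default range is computed for \(S_2\), the commented call passes xmin and xmax explicitly; the limits 20 and 90 are arbitrary choices that bracket the quartiles above.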

Details

The generalised lambda distribution (GLD) is a four-parameter family of distributions defined by its quantile function under the FKML parameterisation (Freimer et al., 1988). De Livera et al. (2024) propose using the GLD quantile function to approximate a sample's distribution from 5-point summaries. The GLD quantile function has four parameters: a location parameter (\(\lambda_1\)), an inverse scale parameter (\(\lambda_2\)>0), and two shape parameters (\(\lambda_3\) and \(\lambda_4\)).
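For reference, the FKML form of the GLD quantile function is usually written as follows, where the symbol \(u\) for the probability level is introduced here for illustration:

\[
Q(u) = \lambda_1 + \frac{1}{\lambda_2}\left[\frac{u^{\lambda_3} - 1}{\lambda_3} - \frac{(1-u)^{\lambda_4} - 1}{\lambda_4}\right], \qquad 0 < u < 1 .
\]

When a shape parameter equals zero, the corresponding term is taken as its limit, so \((u^{\lambda_3} - 1)/\lambda_3\) becomes \(\ln u\) and \(((1-u)^{\lambda_4} - 1)/\lambda_4\) becomes \(\ln(1-u)\).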

The quantile-based skew logistic distribution (SLD), introduced by Gilchrist (2000) and further modified by van Staden and King (2015), is used to approximate the sample's distribution from 3-point summaries. The SLD quantile function is defined by three parameters: a location parameter (\(\lambda\)), a scale parameter (\(\eta\)), and a skewing parameter (\(\delta\)).
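For reference, the SLD quantile function in the form given by van Staden and King (2015) can be written as follows, where the symbol \(u\) for the probability level is introduced here for illustration:

\[
Q(u) = \lambda + \eta\left[(1-\delta)\ln u - \delta \ln(1-u)\right], \qquad 0 < u < 1,\ \eta > 0,\ 0 \le \delta \le 1 .
\]

Setting \(\delta = 1/2\) gives the symmetric logistic distribution, \(Q(u) = \lambda + (\eta/2)\ln\{u/(1-u)\}\).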

The GLD and SLD parameters are estimated by formulating and solving a set of simultaneous equations that relate the reported sample quantiles to their population counterparts under the respective distribution (GLD or SLD). plotdist then uses the estimated parameters to compute the density values, using the dgl function from the gld package and the dsl function from the sld package.
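To illustrate the idea of matching quantiles, here is a minimal base R sketch for the simplest case: an \(S_2\) summary fitted with the SLD quantile function \(Q(u) = \lambda + \eta[(1-\delta)\ln u - \delta\ln(1-u)]\). It is not the package's estimation code; the helper qsld, the input values and the direct linear solve are assumptions made for illustration, and neither the \(S_1\) case, the role of the sample size, nor the opt optimisation step is reproduced.

```r
# SLD quantile function; u is a probability in (0, 1)
qsld <- function(u, lambda, eta, delta) {
  lambda + eta * ((1 - delta) * log(u) - delta * log(1 - u))
}

# Hypothetical S2 summary: first quartile, median, third quartile
probs  <- c(0.25, 0.50, 0.75)
quants <- c(40, 50, 62)

# Q(u) = lambda + a * log(u) - b * log(1 - u), with a = eta * (1 - delta) and
# b = eta * delta, is linear in (lambda, a, b): three quantiles give a 3 x 3 system
A   <- cbind(1, log(probs), -log(1 - probs))
sol <- solve(A, quants)

lambda <- sol[1]
eta    <- sol[2] + sol[3]
delta  <- sol[3] / eta

# Density along the fitted curve, using f(Q(u)) = 1 / Q'(u)
u    <- seq(0.001, 0.999, length.out = 1000)
x    <- qsld(u, lambda, eta, delta)
dens <- 1 / (eta * ((1 - delta) / u + delta / (1 - u)))
```

Plotting dens against x traces the fitted density. In plotdist itself, the density values come from dgl and dsl evaluated on an x grid rather than from this quantile-density shortcut.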

To generate pooled density plots, use the pooled.dist or pooled.only arguments as described in the Arguments section. The pooled density curves represent a weighted average of the individual study densities, with weights determined by the sample sizes. The method is similar to obtaining pooled estimates of effects in a standard meta-analysis, and it serves as a way to visualize the combined estimated distributional information across studies.
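The weighted average described above can be sketched in base R as follows. The per-study densities here are stand-in logistic curves with made-up parameters, and weights proportional to sample size are an assumption about the exact weighting; the names dens_by_study, w and pooled are hypothetical.

```r
# Hypothetical per-study densities evaluated on a common x grid
x <- seq(-4, 6, length.out = 1000)
dens_by_study <- cbind(
  study1 = dlogis(x, location = 0.0, scale = 1.0),
  study2 = dlogis(x, location = 0.5, scale = 1.2),
  study3 = dlogis(x, location = 1.0, scale = 0.8)
)
n <- c(226, 230, 200)

# Assumed weighting: each study contributes in proportion to its sample size
w <- n / sum(n)

# Pooled density at each grid point is the weighted average across studies
pooled <- as.vector(dens_by_study %*% w)
```

Since the weights sum to one, the pooled curve is itself a density and lies between the smallest and largest study density at every x.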

References

De Livera, A. M., Prendergast, L., & Kumaranathunga, U. (2024). A novel density-based approach for estimating unknown means, distribution visualisations and meta-analyses of quantiles. arXiv preprint arXiv:2411.10971. https://arxiv.org/abs/2411.10971.

Freimer, M., Kollia, G., Mudholkar, G. S., & Lin, C. T. (1988). A study of the generalized Tukey lambda family. Communications in Statistics—Theory and Methods, 17(10), 3547–3567.

Gilchrist, W. (2000). Statistical modelling with quantile functions. Chapman & Hall/CRC.

van Staden, P. J., & King, R. A. R. (2015). The quantile-based skew logistic distribution. Statistics & Probability Letters, 96, 109–116.

Examples

#Example dataset of 3-point summaries (min, med, max) for 2 groups
data_3num_2g <- data.frame(
  study.index = c("Study 1", "Study 2", "Study 3"),
  min.g1 = c(15, 15, 13),
  med.g1 = c(66, 68, 63),
  max.g1 = c(108, 101, 100),
  n.g1 = c(226, 230, 200),
  min.g2 = c(18, 19, 15),
  med.g2 = c(73, 82, 81),
  max.g2 = c(110, 115, 100),
  n.g2 = c(226, 230, 200)
)
print(data_3num_2g)

#Density plots of two groups along with the pooled plots
plot_2g <- plotdist(
  data_3num_2g,
  xmin = 10,
  xmax = 125,
  title = "Example Density Plots of Two Groups",
  xlab = "x data",
  color.g1 = "skyblue",
  color.g2 = "pink",
  color.g1.pooled = "blue",
  color.g2.pooled = "red",
  label.g1 = "Treatment", 
  label.g2 = "Control",
  display.legend = TRUE,
  pooled.dist = TRUE
)
print(plot_2g)
