est.mean: Estimating Sample Mean using Quantiles

Description

This function estimates the sample mean from a study that reports quantile summary measures together with the sample size (\(n\)). The quantile summaries must fall into one of the following categories:

\(S_1\): { minimum, median, maximum }
\(S_2\): { first quartile, median, third quartile }
\(S_3\): { minimum, first quartile, median, third quartile, maximum }
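
As a rough illustration of how the three categories map onto the arguments of est.mean, the sketch below uses a hypothetical helper, classify_summary (not part of the package), that reports which scenario a set of supplied summaries corresponds to. How est.mean itself treats other combinations of inputs is not stated here, so the NA return value is an assumption made for the example only.

```r
# Hypothetical helper (not part of est.mean): identify the summary scenario
# from whichever of the five quantile summaries were supplied.
classify_summary <- function(min = NULL, q1 = NULL, med = NULL,
                             q3 = NULL, max = NULL) {
  # TRUE for each summary that was supplied
  given <- c(min = !is.null(min), q1 = !is.null(q1), med = !is.null(med),
             q3 = !is.null(q3), max = !is.null(max))
  scenarios <- list(
    S1 = c("min", "med", "max"),
    S2 = c("q1", "med", "q3"),
    S3 = c("min", "q1", "med", "q3", "max")
  )
  for (s in names(scenarios)) {
    if (setequal(names(given)[given], scenarios[[s]])) return(s)
  }
  NA_character_  # assumption: any other combination is treated as unsupported
}

classify_summary(min = 1, med = 5, max = 12)  # "S1"
classify_summary(q1 = 3, med = 5, q3 = 8)     # "S2"
```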

The est.mean function implements the flexible quantile-based distribution methods proposed by De Livera et al. (2024) for estimating the sample mean. It also incorporates existing methods for estimating the sample mean, as described by Luo et al. (2018) and McGrath et al. (2020).

Usage

est.mean(
   min = NULL, 
   q1 = NULL, 
   med = NULL, 
   q3 = NULL, 
   max = NULL, 
   n = NULL, 
   method = "gld/sld", 
   opt = TRUE
   )

Value

mean: numeric value representing the estimated mean of the sample.

Arguments

min

numeric value representing the sample minimum.

q1

numeric value representing the first quartile of the sample.

med

numeric value representing the median of the sample.

q3

numeric value representing the third quartile of the sample.

max

numeric value representing the sample maximum.

n

numeric value specifying the sample size.

method

character string specifying the approach used to estimate the sample mean. The options are the following:

'gld/sld'

The default option. The method proposed by De Livera et al. (2024): estimation using the generalised lambda distribution (GLD) for 5-number summaries (\(S_3\)) and the skew logistic distribution (SLD) for 3-number summaries (\(S_1\) and \(S_2\)).

'luo'

Method of Luo et al. (2018).

'hozo/wan/bland'

The approach described by Wan et al. (2014), i.e., the method of Hozo et al. (2005) for \(S_1\), the method of Wan et al. (2014) for \(S_2\), and the method of Bland (2015) for \(S_3\).

'bc'

Box-Cox method proposed by McGrath et al. (2020).

'qe'

Quantile Matching Estimation method proposed by McGrath et al. (2020).
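
For orientation, the closed-form estimators usually quoted for the 'hozo/wan/bland' option can be sketched as below. The function names mean_s1, mean_s2 and mean_s3 are hypothetical, and the formulas are the commonly cited versions of the Hozo, Wan and Bland mean estimators; whether est.mean applies exactly these expressions internally is an assumption.

```r
# Commonly cited closed-form mean estimators (illustrative only;
# the internal implementation of est.mean may differ).

# S1 (Hozo et al.): min, median, max
mean_s1 <- function(min, med, max) (min + 2 * med + max) / 4

# S2 (Wan et al.): first quartile, median, third quartile
mean_s2 <- function(q1, med, q3) (q1 + med + q3) / 3

# S3 (Bland): min, first quartile, median, third quartile, max
mean_s3 <- function(min, q1, med, q3, max) {
  (min + 2 * q1 + 2 * med + 2 * q3 + max) / 8
}

mean_s1(2, 5, 12)         # (2 + 10 + 12) / 4 = 6
mean_s2(3, 5, 8)          # 16 / 3
mean_s3(2, 3, 5, 8, 12)   # 46 / 8 = 5.75
```

None of these three estimators uses the sample size, which is one reason the distribution-based options can behave differently for skewed data.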

opt

logical value indicating whether to apply the optimisation step of the 'gld/sld' method when estimating the distribution parameters from the theoretical quantiles. The default value is TRUE.

Details

The 'gld/sld' method of est.mean (i.e., the method of De Livera et al., 2024) uses the following quantile-based distributions:

Generalised Lambda Distribution (GLD) for estimating the sample mean using 5-number summaries (\(S_3\)).
Skew Logistic Distribution (SLD) for estimating the sample mean using 3-number summaries (\(S_1\) and \(S_2\)).

The generalised lambda distribution (GLD) is a four-parameter family of distributions defined by its quantile function under the FKML parameterisation (Freimer et al., 1988). De Livera et al. (2024) propose that the GLD quantile function can be used to approximate a sample's distribution using 5-point summaries. The four parameters of the GLD quantile function are a location parameter (\(\lambda_1\)), an inverse scale parameter (\(\lambda_2 > 0\)), and two shape parameters (\(\lambda_3\) and \(\lambda_4\)).
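
To make the role of the four parameters concrete, the sketch below writes out the FKML quantile function in base R and checks that integrating it over (0, 1) gives the distribution mean. The function name qgld_fkml and the parameter values are made up for illustration, and the code assumes \(\lambda_3\) and \(\lambda_4\) are non-zero and greater than -1 so that the mean exists; it is not the code used inside est.mean.

```r
# FKML quantile function of the GLD (assumes l3 != 0 and l4 != 0)
qgld_fkml <- function(u, l1, l2, l3, l4) {
  l1 + ((u^l3 - 1) / l3 - ((1 - u)^l4 - 1) / l4) / l2
}

# Illustrative parameter values, not estimates from any data set
l1 <- 50; l2 <- 0.1; l3 <- 0.2; l4 <- 0.05

# The mean equals the integral of the quantile function over (0, 1)
mean_numeric <- stats::integrate(
  function(u) qgld_fkml(u, l1, l2, l3, l4),
  lower = 0, upper = 1, rel.tol = 1e-8
)$value

# Closed form of the same integral, valid for l3 > -1 and l4 > -1
mean_closed <- l1 + (1 / (l4 + 1) - 1 / (l3 + 1)) / l2

all.equal(mean_numeric, mean_closed, tolerance = 1e-6)
```

The median of this distribution is simply qgld_fkml(0.5, l1, l2, l3, l4), which is what makes a quantile-defined family convenient when only quantile summaries are reported.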

The quantile-based skew logistic distribution (SLD), introduced by Gilchrist (2000) and further modified by van Staden and King (2015), is used to approximate the sample's distribution using 3-point summaries. The SLD quantile function is defined using three parameters: a location parameter (\(\lambda\)), a scale parameter (\(\eta\)), and a skewing parameter (\(\delta\)).
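
For reference, and assuming the parameterisation of van Staden and King (2015) with the symbols used above, the SLD quantile function can be written as

\[ Q(u) = \lambda + \eta \left[ (1 - \delta) \log(u) - \delta \log(1 - u) \right], \quad 0 < u < 1, \; \eta > 0, \; 0 \le \delta \le 1 . \]

Because \(\int_0^1 \log(u)\,du = \int_0^1 \log(1 - u)\,du = -1\), integrating this form over \((0, 1)\) gives the mean

\[ \int_0^1 Q(u)\,du = \lambda + \eta (2\delta - 1) . \]

Under this form, \(\delta = 0.5\) corresponds to the symmetric logistic distribution, and values above 0.5 give a longer right tail.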

For the 'gld/sld' method, the parameters of the GLD and SLD are estimated by formulating and solving a set of simultaneous equations. These equations relate the estimated sample quantiles to their theoretical counterparts under the respective distribution (GLD or SLD). Finally, the mean for each scenario is calculated by integrating functions of the estimated quantile function.
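
The idea of matching reported quantiles to theoretical ones can be sketched for the \(S_2\) scenario, where the three equations for the SLD happen to have a closed-form solution. The helpers qsld and fit_sld_s2 are hypothetical, the quantile function is the van Staden and King (2015) form as assumed here, and the quartiles are matched at probabilities 0.25, 0.5 and 0.75 without any sample-size adjustment or optimisation step, so this is a simplified illustration rather than the procedure est.mean follows.

```r
# SLD quantile function (form assumed from van Staden and King, 2015)
qsld <- function(u, lambda, eta, delta) {
  lambda + eta * ((1 - delta) * log(u) - delta * log(1 - u))
}

# Hypothetical helper: solve Q(0.25) = q1, Q(0.5) = med, Q(0.75) = q3
fit_sld_s2 <- function(q1, med, q3) {
  eta <- (q3 - q1) / log(3)                         # Q(0.75) - Q(0.25) = eta * log(3)
  skew <- (q1 + q3 - 2 * med) / (eta * log(3 / 4))  # equals 1 - 2 * delta
  delta <- (1 - skew) / 2
  lambda <- med + eta * skew * log(2)               # from Q(0.5) = med
  c(lambda = lambda, eta = eta, delta = delta)
}

# Made-up quartiles from a right-skewed sample
est <- fit_sld_s2(q1 = 45, med = 54, q3 = 66)

# Mean of the fitted SLD: integral of Q(u) over (0, 1)
est_mean <- unname(est["lambda"] + est["eta"] * (2 * est["delta"] - 1))
est_mean
```

With very skewed quartiles the solved delta can fall outside [0, 1], in which case the equations have no valid SLD solution and some constrained or optimisation-based fit would be needed instead.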

References

De Livera, A. M., Prendergast, L., & Kumaranathunga, U. (2024). A novel density-based approach for estimating unknown means, distribution visualisations and meta-analyses of quantiles. arXiv preprint arXiv:2411.10971. https://arxiv.org/abs/2411.10971.

Luo, D., Wan, X., Liu, J., & Tong, T. (2018). Optimally estimating the sample mean from the sample size, median, mid-range, and/or mid-quartile range. Statistical Methods in Medical Research, 27(6), 1785–1805.

Wan, X., Wang, W., Liu, J., & Tong, T. (2014). Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range. BMC Medical Research Methodology, 14, 1–13.

McGrath, S., Zhao, X., Steele, R., Thombs, B. D., Benedetti, A., & the DEPRESSD Collaboration. (2020). Estimating the sample mean and standard deviation from commonly reported quantiles in meta-analysis. Statistical Methods in Medical Research, 29(9), 2520–2537.

Freimer, M., Kollia, G., Mudholkar, G. S., & Lin, C. T. (1988). A study of the generalized Tukey lambda family. Communications in Statistics—Theory and Methods, 17(10), 3547–3567.

Gilchrist, W. (2000). Statistical modelling with quantile functions. Chapman & Hall/CRC.

van Staden, P. J., & King, R. A. R. (2015). The quantile-based skew logistic distribution. Statistics & Probability Letters, 96, 109–116.

King, R., Dean, B., Klinke, S., & van Staden, P. (2025). gld: Estimation and use of the Generalised (Tukey) Lambda Distribution (R package Version 2.6.7). Comprehensive R Archive Network (CRAN). https://doi.org/10.32614/CRAN.package.gld. https://CRAN.R-project.org/package=gld.

King, R., & van Staden, P. (2022). sld: Estimation and use of the Quantile-Based Skew Logistic Distribution (R package Version 1.0.1). Comprehensive R Archive Network (CRAN). https://doi.org/10.32614/CRAN.package.sld. https://CRAN.R-project.org/package=sld.

Examples

#Generate 5-point summary data
set.seed(123)
n <- 1000
x <- stats::rlnorm(n, 4, 0.3)
quants <- c(min(x), stats::quantile(x, probs = c(0.25, 0.5, 0.75)), max(x))
obs_mean <- mean(x)

#Estimate sample mean using s3 (5 number summary)
est_mean_s3 <- est.mean(min = quants[1], q1 = quants[2], med = quants[3], q3 = quants[4], 
                        max = quants[5], n=n, method = "gld/sld")
est_mean_s3

#Estimate sample mean using s1 (min, median, max)
est_mean_s1 <- est.mean(min = quants[1], med = quants[3], max = quants[5],
                        n=n, method = "gld/sld")
est_mean_s1

#Estimate sample mean using s2 (q1, median, q3)
est_mean_s2 <- est.mean(q1 = quants[2], med = quants[3], q3 = quants[4],
                        n=n, method = "gld/sld")
est_mean_s2