mean_var_logwinf: Expectation and Variance of Log Winsorized F Distribution

Description

mean_var_logwinf calculates the expectation and variance of a log Winsorized F distribution by appealing to methods for numerical integration.

Usage

mean_var_logwinf(
  df1,
  df2,
  p_low = 0.01,
  p_up = 0.1,
  nw = gauss.quad(128, kind = "legendre")
)

Value

A list consisting of the following components:

mu: Vector of expectations.
v: Vector of variances.

Arguments

df1, df2

Vectors of numbers of numerator and denominator degrees of freedom. Inf is allowed.

p_low, p_up

Vectors of lower- and upper-tail probabilities for Winsorizing. Each element must be strictly larger than 0, and each pair of p_low and p_up must have a sum strictly smaller than 1.

Note that df1, df2, p_low, and p_up are recycled to align with the longest of them.

nw

A list containing nodes and weights variables for calculating the definite integral of a function f over the interval [-1, 1], which is approximated by sum(nw$weights * f(nw$nodes)). By default, mean_var_logwinf uses a set of Gauss-Legendre nodes along with the corresponding weights calculated by gauss.quad.

Details

The function implements exactly the same method described in Phipson et al., 2016 (see "References").

References

Phipson, B., et al., Robust Hyperparameter Estimation Protects against Hypervariable Genes and Improves Power to Detect Differential Expression. Annals of Applied Statistics, 2016. 10(2): p. 946-963.

Examples

Run this code

# Derive the expectation and variance of a log Winsorized F distribution by
# simulation.
random_logwinf <- function(n, df1, df2, p_low, p_up) {
    x <- rf(n, df1, df2)
    q_low <- qf(p_low, df1, df2, lower.tail = TRUE)
    q_up <- qf(p_up, df1, df2, lower.tail = FALSE)
    x[x < q_low] <- q_low
    x[x > q_up] <- q_up
    x <- log(x)
    c(mean(x), var(x))
}

# Set parameters.
n <- 10000
df1 <- 2
df2 <- 2 ^ (0:10)
p_low <- 0.01
p_up <- 0.1

# Compare simulation results with those from numerical integration.
set.seed(100)
res1 <- vapply(df2, function(x) random_logwinf(n, df1, x, p_low, p_up),
               numeric(2))
res2 <- mean_var_logwinf(df1, df2, p_low, p_up)

# Compare mean.
plot(0:10, res1[1, ], type = "l", lwd = 2, col = "red", xlab = "Log2(df2)",
     ylab = "Mean")
lines(0:10, res2$mu, lty = 5, lwd = 2, col = "blue")
legend("topright", c("Simulation", "Numerical integration"), lty = c(1, 5),
       lwd = 2, col = c("red", "blue"))

# Compare variance.
plot(0:10, res1[2, ], type = "l", lwd = 2, col = "red", xlab = "Log2(df2)",
     ylab = "Var")
lines(0:10, res2$v, lty = 5, lwd = 2, col = "blue")
legend("topright", c("Simulation", "Numerical integration"), lty = c(1, 5),
       lwd = 2, col = c("red", "blue"))

# When df2 is Inf.
random_logwinf(n, df1, Inf, p_low, p_up)
mean_var_logwinf(df1, Inf, p_low, p_up)

Run the code above in your browser using DataLab