depower (version 2025.1.20)

sim_log_lognormal: Simulate data from a normal distribution

Description

Simulate data from the log transformed lognormal distribution (i.e. a normal distribution). This function handles all three cases:

  1. One-sample data

  2. Dependent two-sample data

  3. Independent two-sample data

Usage

sim_log_lognormal(
  n1,
  n2 = NULL,
  ratio,
  cv1,
  cv2 = NULL,
  cor = 0,
  nsims = 1L,
  return_type = "list",
  ncores = 1L,
  messages = TRUE
)

Value

If nsims = 1 and the number of unique parameter combinations is one, the following objects are returned:

  • If one-sample data with return_type = "list", a list:

    Slot  Name  Description
    1           One sample of simulated normal values.

  • If one-sample data with return_type = "data.frame", a data frame:

    Column  Name   Description
    1       item   Pair/subject/item indicator.
    2       value  Simulated normal values.

  • If two-sample data with return_type = "list", a list:

    Slot  Name  Description
    1           Simulated normal values from sample 1.
    2           Simulated normal values from sample 2.

  • If two-sample data with return_type = "data.frame", a data frame:

    Column  Name       Description
    1       item       Pair/subject/item indicator.
    2       condition  Time/group/condition indicator.
    3       value      Simulated normal values.

If nsims > 1 or the number of unique parameter combinations is greater than one, each object described above is returned in a data frame, located in a list-column named data.

  • If one-sample data, a data frame:

    Column  Name          Description
    1       n1            The sample size.
    2       ratio         Geometric mean [GM(sample 1)].
    3       cv1           Coefficient of variation for sample 1.
    4       nsims         Number of data simulations.
    5       distribution  Distribution sampled from.
    6       data          List-column of simulated data.

  • If two-sample data, a data frame:

    Column  Name          Description
    1       n1            Sample size of sample 1.
    2       n2            Sample size of sample 2.
    3       ratio         Ratio of geometric means [GM(sample 2) / GM(sample 1)] or geometric mean ratio [GM(sample 2 / sample 1)].
    4       cv1           Coefficient of variation for sample 1.
    5       cv2           Coefficient of variation for sample 2.
    6       cor           Correlation between samples.
    7       nsims         Number of data simulations.
    8       distribution  Distribution sampled from.
    9       data          List-column of simulated data.

Arguments

n1

(integer: [2, Inf))
The sample size(s) of sample 1.

n2

(integer: NULL; [2, Inf))
The sample size(s) of sample 2. Set as NULL if you want to simulate for the one-sample case.

ratio

(numeric: (0, Inf))
The assumed population fold change(s) of sample 2 with respect to sample 1.

  • For one-sample data, ratio is defined as the geometric mean (GM) of the original lognormal population distribution.

  • For dependent two-sample data, ratio is defined by GM(sample 2 / sample 1) of the original lognormal population distributions.

    • e.g. ratio = 2 assumes that the geometric mean of all paired ratios (sample 2 / sample 1) is 2.

  • For independent two-sample data, ratio is defined by GM(sample 2) / GM(sample 1) of the original lognormal population distributions.

    • e.g. ratio = 2 assumes that the geometric mean of sample 2 is 2 times larger than the geometric mean of sample 1.

See 'Details' for additional information.

cv1

(numeric: (0, Inf))
The coefficient(s) of variation of sample 1 in the original lognormal data.

cv2

(numeric: NULL; (0, Inf))
The coefficient(s) of variation of sample 2 in the original lognormal data. Set as NULL if you want to simulate for the one-sample case.

cor

(numeric: 0; [-1, 1])
The correlation(s) between sample 1 and sample 2 in the original lognormal data. Not used for the one-sample case.

nsims

(Scalar integer: 1L; [1,Inf))
The number of simulated data sets. If nsims > 1, the data is returned in a list-column of a depower simulation data frame.

return_type

(string: "list"; c("list", "data.frame"))
The data structure of the simulated data. If "list" (default), a list object is returned. If "data.frame", a data frame in tall format is returned. The list object provides computational efficiency, and the data frame object is convenient for formulas. See 'Value'.

ncores

(Scalar integer: 1L; [1,Inf))
The number of cores (number of worker processes) to use. Do not set greater than the value returned by parallel::detectCores(). May be helpful when the number of parameter combinations is large and nsims is large.
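
As a rough illustration of the advice above, the following sketch picks a worker count that never exceeds what parallel::detectCores() reports. The cap of 2 workers and the variable names are assumptions chosen for the example, not package defaults.

```r
# Number of cores reported by the operating system; this can be NA on
# platforms where the count cannot be determined.
available <- parallel::detectCores()

# Hypothetical cap of 2 workers, falling back to 1 if detection fails.
ncores <- if (is.na(available)) 1L else max(1L, min(2L, available))
```

The resulting value could then be passed as the ncores argument.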

messages

(Scalar logical: TRUE)
Whether or not to display messages for pathological simulation cases.

Details

Based on assumed characteristics of the original lognormal distribution, data is simulated from the corresponding log-transformed (normal) distribution. This simulated data is suitable for assessing the power of a hypothesis test for the geometric mean or ratio of geometric means from the original lognormal data.
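
The idea can be sketched in base R for the one-sample case. This is a minimal illustration built on the relationships given later in this section, not the package's implementation; the seed and the variable names are assumptions made for the example.

```r
set.seed(1234)  # arbitrary seed, used only for reproducibility

n <- 10
gm <- 1.3   # assumed geometric mean on the original lognormal scale
cv <- 0.35  # assumed coefficient of variation on the original lognormal scale

# Log-scale parameters implied by the lognormal assumptions:
# AM(ln X) = ln(GM(X)) and Var(ln X) = ln(CV(X)^2 + 1).
mean_log <- log(gm)
sd_log <- sqrt(log(cv^2 + 1))

# Simulated values on the log (normal) scale.
log_values <- rnorm(n, mean = mean_log, sd = sd_log)

# Back-transforming gives positive values on the original scale.
original_values <- exp(log_values)
```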

This method can also be useful for other population distributions which are positive and where it makes sense to describe the ratio of geometric means. However, the lognormal distribution is theoretically correct in the sense that you can log transform to a normal distribution, compute the summary statistic, then apply the inverse transformation to summarize on the original lognormal scale.

Let \(GM(\cdot)\) be the geometric mean and \(AM(\cdot)\) be the arithmetic mean. For independent lognormal samples \(X_1\) and \(X_2\)

$$\text{Fold Change} = \frac{GM(X_2)}{GM(X_1)}$$

For dependent lognormal samples \(X_1\) and \(X_2\)

$$\text{Fold Change} = GM\left( \frac{X_2}{X_1} \right)$$

Unlike the arithmetic mean, for which the ratio of means generally differs from the mean of ratios, the geometric mean satisfies the following for equal sample sizes of \(X_1\) and \(X_2\): \(\frac{GM(X_2)}{GM(X_1)} = GM \left( \frac{X_2}{X_1} \right) = e^{AM(\ln X_2) - AM(\ln X_1)} = e^{AM(\ln X_2 - \ln X_1)}\).
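
The identity between the ratio of geometric means and the geometric mean of ratios can be checked numerically. The sketch below uses two made-up vectors of equal length and a hypothetical helper gm(); neither comes from the package.

```r
# Geometric mean computed through the log scale: exp(AM(ln x)).
gm <- function(x) exp(mean(log(x)))

# Illustrative positive data with equal sample sizes.
x1 <- c(1.2, 0.8, 2.5, 1.9, 3.1)
x2 <- c(1.5, 1.1, 2.9, 2.6, 3.0)

# For the geometric mean, these two quantities agree.
ratio_of_gms <- gm(x2) / gm(x1)
gm_of_ratios <- gm(x2 / x1)

# For the arithmetic mean, the analogous quantities generally differ.
ratio_of_ams <- mean(x2) / mean(x1)
am_of_ratios <- mean(x2 / x1)
```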

The coefficient of variation (CV) for \(X\) is defined as

$$CV = \frac{SD(X)}{AM(X)}$$

The relationships between sample statistics for the original lognormal data (\(X\)) and the natural log-transformed data (\(\ln{X}\)) are

$$ \begin{aligned} AM(X) &= e^{AM(\ln{X}) + \frac{Var(\ln{X})}{2}} \\ GM(X) &= e^{AM(\ln{X})} \\ Var(X) &= AM(X)^2 \left( e^{Var(\ln{X})} - 1 \right) \\ CV(X) &= \frac{\sqrt{AM(X)^2 \left( e^{Var(\ln{X})} - 1 \right)}}{AM(X)} \\ &= \sqrt{e^{Var(\ln{X})} - 1} \end{aligned} $$

and

$$ \begin{aligned} AM(\ln{X}) &= \ln \left( \frac{AM(X)}{\sqrt{CV(X)^2 + 1}} \right) \\ Var(\ln{X}) &= \ln(CV(X)^2 + 1) \\ Cor(\ln{X_1}, \ln{X_2}) &= \frac{\ln \left( Cor(X_1, X_2)CV(X_1)CV(X_2) + 1 \right)}{SD(\ln{X_1})SD(\ln{X_2})} \end{aligned} $$
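
These conversions are straightforward to express in base R. The helpers below, lognormal_to_normal() and cor_to_log_scale(), are hypothetical names introduced for illustration and are not functions exported by depower; the sketch only restates the formulas above.

```r
# Hypothetical helper: log-scale mean and SD from the geometric mean and CV
# of the original lognormal data.
lognormal_to_normal <- function(gm, cv) {
  list(mean = log(gm), sd = sqrt(log(cv^2 + 1)))
}

# Hypothetical helper: correlation of the logged data from the correlation
# (rho) and CVs of the original lognormal data.
cor_to_log_scale <- function(rho, cv1, cv2) {
  sd1 <- sqrt(log(cv1^2 + 1))
  sd2 <- sqrt(log(cv2^2 + 1))
  log(rho * cv1 * cv2 + 1) / (sd1 * sd2)
}

p <- lognormal_to_normal(gm = 1.3, cv = 0.35)

# Round trips through the forward relationships.
cv_back <- sqrt(exp(p$sd^2) - 1)           # CV(X) from Var(ln X)
am <- exp(p$mean + p$sd^2 / 2)             # AM(X) from AM(ln X) and Var(ln X)
mean_log_from_am <- log(am / sqrt(0.35^2 + 1))

rho_log <- cor_to_log_scale(rho = 0.4, cv1 = 0.35, cv2 = 0.35)
```

With these inputs the log-scale correlation comes out slightly larger than the original-scale value of 0.4.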

Based on the properties of correlation and variance,

$$ \begin{aligned} Var(X_2 - X_1) &= Var(X_1) + Var(X_2) - 2Cov(X_1, X_2) \\ &= Var(X_1) + Var(X_2) - 2Cor(X_1, X_2)SD(X_1)SD(X_2) \\ SD(X_2 - X_1) &= \sqrt{Var(X_2 - X_1)} \end{aligned} $$

The standard deviation of the differences gets smaller the more positive the correlation and conversely gets larger the more negative the correlation. For the special case where the two samples are uncorrelated and each has the same variance, it follows that

$$ \begin{aligned} Var(X_2 - X_1) &= \sigma^2 + \sigma^2 \\ SD(X_2 - X_1) &= \sqrt{2}\sigma \end{aligned} $$
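
The effect of correlation on the standard deviation of the differences can be seen with a short calculation. The function sd_diff() and the value of sigma are assumptions made for this sketch.

```r
# SD of the difference of two variables with the given SDs and correlation:
# sqrt(Var(X1) + Var(X2) - 2 * Cor(X1, X2) * SD(X1) * SD(X2)).
sd_diff <- function(sd1, sd2, rho) {
  sqrt(sd1^2 + sd2^2 - 2 * rho * sd1 * sd2)
}

sigma <- 0.34  # illustrative common standard deviation

sd_uncorrelated <- sd_diff(sigma, sigma, rho = 0)     # equals sqrt(2) * sigma
sd_positive     <- sd_diff(sigma, sigma, rho = 0.4)   # smaller than uncorrelated
sd_negative     <- sd_diff(sigma, sigma, rho = -0.4)  # larger than uncorrelated
```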

See Also

stats::rnorm(), mvnfast::rmvn()

Examples

#----------------------------------------------------------------------------
# sim_log_lognormal() examples
#----------------------------------------------------------------------------
library(depower)

# Independent two-sample data returned in a data frame
sim_log_lognormal(
  n1 = 10,
  n2 = 10,
  ratio = 1.3,
  cv1 = 0.35,
  cv2 = 0.35,
  cor = 0,
  nsims = 1,
  return_type = "data.frame"
)

# Independent two-sample data returned in a list
sim_log_lognormal(
  n1 = 10,
  n2 = 10,
  ratio = 1.3,
  cv1 = 0.35,
  cv2 = 0.35,
  cor = 0,
  nsims = 1,
  return_type = "list"
)

# Dependent two-sample data returned in a data frame
sim_log_lognormal(
  n1 = 10,
  n2 = 10,
  ratio = 1.3,
  cv1 = 0.35,
  cv2 = 0.35,
  cor = 0.4,
  nsims = 1,
  return_type = "data.frame"
)

# Dependent two-sample data returned in a list
sim_log_lognormal(
  n1 = 10,
  n2 = 10,
  ratio = 1.3,
  cv1 = 0.35,
  cv2 = 0.35,
  cor = 0.4,
  nsims = 1,
  return_type = "list"
)

# One-sample data returned in a data frame
sim_log_lognormal(
  n1 = 10,
  ratio = 1.3,
  cv1 = 0.35,
  nsims = 1,
  return_type = "data.frame"
)

# One-sample data returned in a list
sim_log_lognormal(
  n1 = 10,
  ratio = 1.3,
  cv1 = 0.35,
  nsims = 1,
  return_type = "list"
)

# Independent two-sample data: two simulations for four parameter combinations.
# Returned as a list-column of lists within a data frame
sim_log_lognormal(
  n1 = c(10, 20),
  n2 = c(10, 20),
  ratio = 1.3,
  cv1 = 0.35,
  cv2 = 0.35,
  cor = 0,
  nsims = 2,
  return_type = "list"
)

# Dependent two-sample data: two simulations for two parameter combinations.
# Returned as a list-column of lists within a data frame
sim_log_lognormal(
  n1 = c(10, 20),
  n2 = c(10, 20),
  ratio = 1.3,
  cv1 = 0.35,
  cv2 = 0.35,
  cor = 0.4,
  nsims = 2,
  return_type = "list"
)

# One-sample data: two simulations for two parameter combinations
# Returned as a list-column of lists within a data frame
sim_log_lognormal(
  n1 = c(10, 20),
  ratio = 1.3,
  cv1 = 0.35,
  nsims = 2,
  return_type = "list"
)