simulateVector: Simulate a Vector of Random Numbers From a Specified Theoretical or Empirical Probability Distribution

Description

Simulate a vector of random numbers from a specified theoretical probability distribution or empirical probability distribution, using either Latin Hypercube sampling or simple random sampling.

Usage

simulateVector(n, distribution = "norm", param.list = list(mean = 0, sd = 1), 
    sample.method = "SRS", seed = NULL, sorted = FALSE, 
    left.tail.cutoff = ifelse(is.finite(supp.min), 0, .Machine$double.eps), 
    right.tail.cutoff = ifelse(is.finite(supp.max), 0, .Machine$double.eps))

Arguments

n

a positive integer indicating the number of random numbers to generate.

distribution

a character string denoting the distribution abbreviation. The default value is distribution="norm". See the help file for Distribution.df for a list of possible distribution abbreviations.

Alternatively, the character string "emp" may be used to denote sampling from an empirical distribution based on a set of observations. The vector containing the observations is specified in the argument param.list.

param.list

a list with values for the parameters of the distribution. The default value is param.list=list(mean=0, sd=1). See the help file for Distribution.df for the names and possible values of the parameters associated with each distribution.

Alternatively, if you specify an empirical distribution by setting distribution="emp", then param.list must be a list of the form list(obs=name), where name denotes the name of the vector containing the observations to use for the empirical distribution. In this case, you may also supply arguments to the qemp function through param.list. For example, you may set param.list=list(obs=name, discrete=TRUE) to specify an empirical distribution based on a discrete random variable.

sample.method

a character string indicating whether to use simple random sampling (sample.method="SRS", the default) or Latin Hypercube sampling (sample.method="LHS").

seed

an integer to supply to the R function set.seed. The default value is seed=NULL, in which case the random seed is not set and the random numbers are instead based on the current value of .Random.seed.

sorted

logical scalar indicating whether to return the random numbers in sorted (ascending) order. The default value is sorted=FALSE.

left.tail.cutoff

a scalar between 0 and 1 indicating what proportion of the left tail of the probability distribution to omit for Latin Hypercube sampling. For densities with a finite support minimum (e.g., Lognormal or Empirical) the default value is left.tail.cutoff=0; for densities with a support minimum of $-\infty$, the default value is left.tail.cutoff=.Machine$double.eps. This argument is ignored if sample.method="SRS".

right.tail.cutoff

a scalar between 0 and 1 indicating what proportion of the right tail of the probability distribution to omit for Latin Hypercube sampling. For densities with a finite support maximum (e.g., Beta or Empirical) the default value is right.tail.cutoff=0; for densities with a support maximum of $\infty$, the default value is right.tail.cutoff=.Machine$double.eps. This argument is ignored if sample.method="SRS".

Value

a numeric vector of random numbers from the specified distribution.

Details

Simple Random Sampling (sample.method="SRS")

When sample.method="SRS", the function simulateVector simply calls the function rabb, where abb denotes the abbreviation of the specified distribution (e.g., rlnorm or remp).
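The dispatch to an r-prefixed generator can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the EnvStats source: the helper name srs_sketch is hypothetical, and the sketch assumes that a generator named "r" followed by the distribution abbreviation is available on the search path.

```r
# Hypothetical helper (not part of EnvStats): build the generator name
# "r" + abbreviation and call it with n plus the distribution parameters.
srs_sketch <- function(n, distribution = "norm",
                       param.list = list(mean = 0, sd = 1), seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  r.fcn <- paste0("r", distribution)   # e.g., "rnorm", "rlnorm"
  do.call(r.fcn, c(list(n = n), param.list))
}

x <- srs_sketch(5, "lnorm", list(meanlog = 0, sdlog = 1), seed = 47)

# Calling the generator directly with the same seed gives the same numbers
set.seed(47)
y <- rlnorm(5, meanlog = 0, sdlog = 1)
identical(x, y)
```

Building the call with do.call lets the parameter names in param.list be passed through unchanged to whichever generator is selected.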

Latin Hypercube Sampling (sample.method="LHS")

When sample.method="LHS", the function simulateVector generates n random numbers using Latin Hypercube sampling. The distribution is divided into n intervals of equal probability $1/n$ and simple random sampling is performed once within each interval; i.e., Latin Hypercube sampling is simply stratified sampling without replacement, where the strata are defined by the 0'th, 100(1/n)'th, 100(2/n)'th, ..., and 100'th percentiles of the distribution.
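The stratified scheme described above can be sketched in base R by drawing one uniform probability inside each stratum and mapping it through the quantile function. This is only an illustration: the helper name lhs_sketch and its q.fcn argument are hypothetical, and the way the tail cutoffs trim the probability range here is an assumption, not a statement about how simulateVector is implemented.

```r
# Hypothetical helper (not part of EnvStats): Latin Hypercube sampling
# by inverse-transform sampling within n equal-probability strata.
lhs_sketch <- function(n, q.fcn = qnorm, ...,
                       left.tail.cutoff = 0, right.tail.cutoff = 0) {
  # Stratum boundaries on the probability scale (assumed cutoff handling)
  breaks <- seq(left.tail.cutoff, 1 - right.tail.cutoff, length.out = n + 1)
  # One uniform draw inside each stratum
  u <- runif(n, min = breaks[-(n + 1)], max = breaks[-1])
  # Map probabilities to the distribution via its quantile function,
  # then shuffle so the result is not returned in stratum order
  q.fcn(u, ...)[sample.int(n)]
}

set.seed(47)
x <- lhs_sketch(10, qlnorm, meanlog = 0, sdlog = 1)

# Each decile of the distribution should contain exactly one value
counts <- table(cut(plnorm(x), breaks = seq(0, 1, by = 0.1)))
counts
```

With the default cutoffs of 0, every stratum has probability exactly 1/n. A small positive cutoff keeps the quantile function away from probabilities 0 and 1, where it returns infinite values for distributions with unbounded support.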

Latin Hypercube sampling, sometimes abbreviated LHS, is a method of sampling from a probability distribution that ensures all portions of the probability distribution are represented in the sample. It was introduced in the published literature by McKay et al. (1979) to overcome the following problem in Monte Carlo simulation based on simple random sampling (SRS). Suppose we want to generate random numbers from a specified distribution. If we use simple random sampling, we are unlikely to get many observations in regions of the distribution that have low probability. For example, if we generate $n$ observations from the distribution, the probability that none of these observations falls at or above the 98'th percentile of the distribution is $0.98^{n}$. So, for example, there is a 13% chance that out of 100 random numbers, none will fall at or above the 98'th percentile. If we are interested in reproducing the shape of the distribution, we will need a very large number of observations to ensure that we can adequately characterize the tails of the distribution (Vose, 2008, pp. 59--62).
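The arithmetic behind the 13% figure can be checked directly. The short sketch below uses base R only; the variable names are illustrative.

```r
# Probability that none of n simple random draws lands at or above
# the 98th percentile: each draw misses with probability 0.98.
n <- c(10, 50, 100, 200, 500)
p.miss <- 0.98^n
round(data.frame(n = n, p.miss = p.miss), 4)
```

For n = 100 the result is approximately 0.1326. By contrast, under Latin Hypercube sampling with n = 100 and no tail cutoff, the top two strata together cover exactly the region at or above the 98'th percentile, so two of the values always fall there.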

See Millard (2013) for a visual explanation of Latin Hypercube sampling.

References

Iman, R.L., and W.J. Conover. (1980). Small Sample Sensitivity Analysis Techniques for Computer Models, With an Application to Risk Assessment (with Comments). Communications in Statistics--Volume A, Theory and Methods, 9(17), 1749--1874.

Iman, R.L., and J.C. Helton. (1988). An Investigation of Uncertainty and Sensitivity Analysis Techniques for Computer Models. Risk Analysis 8(1), 71--90.

Iman, R.L., and J.C. Helton. (1991). The Repeatability of Uncertainty and Sensitivity Analyses for Complex Probabilistic Risk Assessments. Risk Analysis 11(4), 591--606.

McKay, M.D., R.J. Beckman, and W.J. Conover. (1979). A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output From a Computer Code. Technometrics 21(2), 239--245.

Millard, S.P. (2013). EnvStats: an R Package for Environmental Statistics. Springer, New York. http://www.springer.com/book/9781461484554.

Vose, D. (2008). Risk Analysis: A Quantitative Guide. Third Edition. John Wiley & Sons, West Sussex, UK, 752 pp.

Examples

# NOT RUN {
  # Generate 10 observations from a lognormal distribution with 
  # parameters mean=10 and cv=1 using simple random sampling:

  simulateVector(10, distribution = "lnormAlt", 
    param.list = list(mean = 10, cv = 1), seed = 47, 
    sorted = TRUE) 
  # [1]  2.086931  2.863589  3.112866  5.592502  5.732602  7.160707
  # [7]  7.741327  8.251306 12.782493 37.214748

  #----------

  # Repeat the above example by calling rlnormAlt directly:

  set.seed(47) 
  sort(rlnormAlt(10, mean = 10, cv = 1))
  # [1]  2.086931  2.863589  3.112866  5.592502  5.732602  7.160707
  # [7]  7.741327  8.251306 12.782493 37.214748

  #----------

  # Now generate 10 observations from the same lognormal distribution 
  # but use Latin Hypercube sampling.  Note that the largest value
  # is larger than for simple random sampling:

  simulateVector(10, distribution = "lnormAlt", 
    param.list = list(mean = 10, cv = 1), seed = 47,
    sample.method = "LHS", sorted = TRUE) 
  # [1]  2.406149  2.848428  4.311175  5.510171  6.467852  8.174608
  # [7]  9.506874 12.298185 17.022151 53.552699

  #==========

  # Generate 50 observations from a Pareto distribution with parameters 
  # location=10 and shape=2, then use this resulting vector of 
  # observations as the basis for generating 3 observations from an 
  # empirical distribution using Latin Hypercube sampling:

  set.seed(321) 
  pareto.rns <- rpareto(50, location = 10, shape = 2)
 
  simulateVector(3, distribution = "emp", 
    param.list = list(obs = pareto.rns), sample.method = "LHS") 
  # [1] 11.50685 13.50962 17.47335

  #==========

  # Clean up
  #---------
  rm(pareto.rns)
# }
