mc_grid: Monte Carlo simulations parameter grid

Description

WARNING: This function is not yet complete and is not intended for the average user.

Usage

mc_grid(
  M,
  n,
  seed,
  parameters,
  formula,
  ref_dist,
  sign_level,
  initial_est,
  iterations,
  convergence_criterion = NULL,
  max_iter = NULL,
  shuffle = FALSE,
  shuffle_seed = 10,
  split = 0.5,
  path = FALSE,
  verbose = FALSE
)

Value

mc_grid returns a data frame with the results of the Monte Carlo experiments. Each row corresponds to a specific simulation setup. The columns record the simulation setup and its results. Currently, the average proportion of detected outliers ("mean_gauge") and its variance ("var_gauge") are recorded. In addition, the theoretical asymptotic variance ("avar") and the ratio of simulated to theoretical variance, adjusted by the sample size, ("var_ratio") are calculated. Tentative results on the size and power of the tests are also calculated.
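As a rough illustration of how these summary columns relate to each other, the following base R sketch computes them from placeholder data. The binomial draws, the placeholder value of avar, and the scaling n * var_gauge / avar are assumptions made for illustration; the page does not state the formulas mc_grid uses.

```r
# Placeholder gauges (proportion of detected outliers) from M replications.
# Binomial proportions stand in for simulated gauges; this is an assumption.
set.seed(1)
M <- 1000
n <- 100
gamma <- 0.05  # nominal significance level
gauges <- rbinom(M, size = n, prob = gamma) / n

mean_gauge <- mean(gauges)  # average proportion of detected outliers
var_gauge <- var(gauges)    # variance of that proportion across replications

# Placeholder asymptotic variance: for binomial proportions it is
# gamma * (1 - gamma). The avar of the outlier detection algorithm differs.
avar <- gamma * (1 - gamma)

# Assumed sample-size adjustment: Var(gauge) is roughly avar / n, so the
# ratio should be close to 1 when simulation and theory agree.
var_ratio <- n * var_gauge / avar
```

Under this scaling, a var_ratio far from 1 would point to a discrepancy between the simulated and the theoretical variance.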

Arguments

M: Number of replications.
n: Sample size for each replication.
seed: Random seed for the iterations.
parameters: A list as created by generate_param that specifies the true model.
formula: A formula that specifies the 2SLS model to be estimated. The format has to follow y ~ x1 + x2 | x1 + z2, where y is the dependent variable, x1 are the exogenous regressors, x2 the endogenous regressors, and z2 the outside instruments.
ref_dist: A character vector that specifies the reference distribution against which observations are classified as outliers. "normal" refers to the normal distribution.
sign_level: A numeric value between 0 and 1 that determines the cutoff in the reference distribution against which observations are judged as outliers or not.
initial_est: A character vector that specifies the initial estimator for the outlier detection algorithm. "robustified" means that the full sample 2SLS is used as initial estimator. "saturated" splits the sample into two parts and estimates a 2SLS on each subsample. The coefficients of one subsample are used to calculate residuals and determine outliers in the other subsample. "user" allows the user to specify a model based on which observations are classified as outliers.
iterations: An integer >= 0 that specifies how often the outlier detection algorithm is iterated and for which summary statistics will be calculated. The value 0 means that outliers are classified based on the initial estimator only. Alternatively, the character "convergence" can be supplied to iterate until convergence.
convergence_criterion: A numeric value that determines whether the algorithm has converged as measured by the L2 norm of the difference in coefficients between the current and the previous iteration. Only used when argument iterations is set to "convergence".
max_iter: A numeric value >= 1 or NULL. If iterations = "convergence" is chosen, then the algorithm is stopped after at most max_iter iterations. If a convergence_criterion is also chosen, then the algorithm stops when either the criterion is fulfilled or the maximum number of iterations is reached.
shuffle: A logical value or NULL. Only used if initial_est == "saturated". If TRUE, the sample is shuffled before creating the subsamples.
shuffle_seed: An integer value that will set the seed for shuffling the sample or NULL. Only used if initial_est == "saturated" and shuffle == TRUE.
split: A numeric value strictly between 0 and 1 that determines in which proportions the sample will be split.
path: A character string or FALSE. The simulation grid can save the individual results of each entry in the grid to this location. Individual results are not saved if the argument is set to FALSE.
verbose: A logical value indicating whether messages should be printed.
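The interplay of iterations = "convergence", convergence_criterion, and max_iter can be sketched in base R as follows. The helper iterate_until_converged and the toy update step are hypothetical and not part of the package. Whether the package compares the norm with <= or <, and at which point it checks max_iter, are assumptions.

```r
# Hypothetical helper: repeat an update step until the L2 norm of the
# coefficient change is at most the criterion, or until max_iter is reached.
iterate_until_converged <- function(update, beta0,
                                    convergence_criterion = NULL,
                                    max_iter = NULL) {
  # At least one stopping rule is needed, otherwise the loop may not end.
  stopifnot(!is.null(convergence_criterion) || !is.null(max_iter))
  beta <- beta0
  iter <- 0
  repeat {
    beta_new <- update(beta)
    iter <- iter + 1
    diff_norm <- sqrt(sum((beta_new - beta)^2))  # L2 norm of the change
    beta <- beta_new
    converged <- !is.null(convergence_criterion) &&
      diff_norm <= convergence_criterion
    exhausted <- !is.null(max_iter) && iter >= max_iter
    if (converged || exhausted) break
  }
  list(coefficients = beta, iterations = iter, converged = converged)
}

# Toy update: a contraction towards c(1, 2), standing in for re-estimating
# the model on the observations not classified as outliers.
toy_update <- function(beta) beta + 0.5 * (c(1, 2) - beta)

res <- iterate_until_converged(toy_update, beta0 = c(0, 0),
                               convergence_criterion = 1e-6, max_iter = 100)
capped <- iterate_until_converged(toy_update, beta0 = c(0, 0),
                                  convergence_criterion = 1e-6, max_iter = 3)
```

In the first call the criterion is met before the cap, so res$converged is TRUE. In the second call the loop stops at max_iter without having converged.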

Details

Requires the package doRNG to be installed, which has been orphaned as of 2022-12-09.

The following arguments can also be supplied as a vector of their type: n, sign_level, initial_est, and split. The function then runs a Monte Carlo experiment for every possible combination of these arguments. Note that the initial estimator "robustified" is not affected by the argument split, so split is not varied for this estimator.

For example, specifying n = c(100, 1000) and sign_level = c(0.01, 0.05) runs four Monte Carlo experiments, one for each of the four possible combinations of the two arguments.
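The combinations can be enumerated with base R's expand.grid, which gives a feel for how many experiments a call will run. This is only a sketch of the counting logic. How mc_grid builds its grid internally, and the use of NA for an unused split, are assumptions.

```r
# The four combinations from the example above.
small_grid <- expand.grid(n = c(100, 1000), sign_level = c(0.01, 0.05))

# A larger grid that also varies the initial estimator and the split.
full_grid <- expand.grid(
  n = c(100, 1000),
  sign_level = c(0.01, 0.05),
  initial_est = c("robustified", "saturated"),
  split = c(0.3, 0.5),
  stringsAsFactors = FALSE
)

# "robustified" does not use split, so blank it out and drop the
# duplicate rows this creates.
full_grid$split[full_grid$initial_est == "robustified"] <- NA
full_grid <- unique(full_grid)

nrow(small_grid)  # 4
nrow(full_grid)   # 12: 4 "robustified" setups plus 8 "saturated" setups
```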

The path argument allows users to store the M replication results for all of the individual Monte Carlo simulations that are part of the grid. The results are saved both as .Rds and .csv files. The file name is indicative of the simulation setting.
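The following base R sketch shows what storing one set of replication results in both formats, and reading it back, could look like. The directory, the file name n100_sign0.05_robustified, and the columns of the data frame are hypothetical. The page does not document the naming scheme or the stored columns.

```r
# Hypothetical output directory; a temporary folder keeps the sketch tidy.
out_dir <- file.path(tempdir(), "mc_results")
dir.create(out_dir, showWarnings = FALSE)

# Placeholder replication results for one simulation setup.
results <- data.frame(replication = 1:3, gauge = c(0.04, 0.06, 0.05))

# Hypothetical file name that encodes the simulation setting.
base_name <- "n100_sign0.05_robustified"
rds_file <- file.path(out_dir, paste0(base_name, ".Rds"))
csv_file <- file.path(out_dir, paste0(base_name, ".csv"))

saveRDS(results, rds_file)                        # exact R object
write.csv(results, csv_file, row.names = FALSE)   # portable text format

reloaded <- readRDS(rds_file)
```

The .Rds file restores the R object exactly, while the .csv file is convenient for inspecting results outside of R.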