WARNING: Not intended for the average user. This function is not yet complete.
mc_grid(
M,
n,
seed,
parameters,
formula,
ref_dist,
sign_level,
initial_est,
iterations,
convergence_criterion = NULL,
max_iter = NULL,
shuffle = FALSE,
shuffle_seed = 10,
split = 0.5,
path = FALSE,
verbose = FALSE
)
mc_grid returns a data frame with the results of the Monte Carlo
experiments. Each row corresponds to a specific simulation setup. The
columns record the simulation setup and its results. Currently, the average
proportion of detected outliers ("mean_gauge") and its variance
("var_gauge") are recorded. In addition, the theoretical asymptotic
variance ("avar") and the ratio of simulated to theoretical variance,
adjusted by the sample size, are calculated ("var_ratio"). Tentative
results for the size and power of the tests are also calculated.
M: Number of replications.
n: Sample size for each replication.
seed: Random seed for the iterations.
parameters: A list as created by generate_param that specifies the true model.
formula: A formula that specifies the 2SLS model to be estimated. The format has to follow y ~ x1 + x2 | x1 + z2, where y is the dependent variable, x1 are the exogenous regressors, x2 the endogenous regressors, and z2 the outside instruments.
ref_dist: A character vector that specifies the reference distribution against which observations are classified as outliers. "normal" refers to the normal distribution.
sign_level: A numeric value between 0 and 1 that determines the cutoff in the reference distribution against which observations are judged as outliers or not.
initial_est: A character vector that specifies the initial estimator for the outlier detection algorithm. "robustified" means that the full sample 2SLS is used as initial estimator. "saturated" splits the sample into two parts and estimates a 2SLS on each subsample. The coefficients of one subsample are used to calculate residuals and determine outliers in the other subsample. "user" allows the user to specify a model based on which observations are classified as outliers.
iterations: An integer >= 0 that specifies how often the outlier detection algorithm is iterated and for which summary statistics will be calculated. The value 0 means that outlier classification based on the initial estimator is done. Alternatively, the character "convergence" iterates until convergence.
convergence_criterion: A numeric value that determines whether the algorithm has converged as measured by the L2 norm of the difference in coefficients between the current and the previous iteration. Only used when argument iterations is set to "convergence".
max_iter: A numeric value >= 1 or NULL. If iterations = "convergence" is chosen, then the algorithm is stopped after at most max_iter iterations. If a convergence_criterion is also chosen, then the algorithm stops when either the criterion is fulfilled or the maximum number of iterations is reached.
shuffle: A logical value or NULL. Only used if initial_est == "saturated". If TRUE then the sample is shuffled before creating the subsamples.
shuffle_seed: An integer value that will set the seed for shuffling the sample or NULL. Only used if initial_est == "saturated" and shuffle == TRUE.
split: A numeric value strictly between 0 and 1 that determines in which proportions the sample will be split.
path: A character string or FALSE. The simulation grid can save the individual results of each different entry in the grid to this location. Individual results are not saved if the argument is set to FALSE.
verbose: A logical value indicating whether any messages should be printed.
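The interplay between the convergence criterion and the iteration cap described in the arguments can be sketched in base R as follows. The helper should_stop is hypothetical and not part of the package, and the use of <= for the comparison is an assumption. The sketch only illustrates the rule "stop when either the L2 norm of the coefficient change meets the criterion or the maximum number of iterations is reached".

```r
# Hypothetical helper (not a package function): decide whether to stop iterating.
should_stop <- function(beta_new, beta_old, iter,
                        convergence_criterion = NULL, max_iter = NULL) {
  # L2 norm of the change in coefficients between two consecutive iterations
  l2_diff <- sqrt(sum((beta_new - beta_old)^2))
  # Assumption: "fulfilled" means the norm is at or below the criterion
  converged <- !is.null(convergence_criterion) && l2_diff <= convergence_criterion
  # The cap applies only if max_iter was supplied
  exhausted <- !is.null(max_iter) && iter >= max_iter
  converged || exhausted
}

# Small coefficient change: stops because the criterion is met
stop_by_criterion <- should_stop(c(1.00, 2.00), c(1.00, 2.01), iter = 3,
                                 convergence_criterion = 0.05)

# Large coefficient change, but the iteration cap has been reached
stop_by_cap <- should_stop(c(1.0, 2.0), c(1.5, 2.5), iter = 10,
                           convergence_criterion = 0.05, max_iter = 10)

# Large change and cap not yet reached: keep iterating
keep_going <- should_stop(c(1.0, 2.0), c(1.5, 2.5), iter = 2,
                          convergence_criterion = 0.05, max_iter = 10)
```

With both arguments left as NULL the helper never signals a stop, so in practice at least one of the two should be supplied when iterating until convergence.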
Requires the package doRNG to be installed, which has been orphaned as of 2022-12-09.
The following arguments can also be supplied as a vector of their type:
n, sign_level, initial_est, and split. This makes the function estimate all
possible combinations of the arguments. Note that the initial estimator
"robustified" is not affected by the argument split and hence is not varied
in this case.
For example, specifying n = c(100, 1000) and sign_level = c(0.01, 0.05)
estimates four Monte Carlo experiments with the four possible combinations
of the parameters.
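The way vector arguments expand into a grid can be illustrated with base R's expand.grid. This is a sketch of the idea only, not the package's internal code. The object names grid, full, and setups are illustrative, and collapsing the "robustified" rows by setting split to NA is an assumption made here to mimic "not varied in this case".

```r
# Two sample sizes and two significance levels give four setups
grid <- expand.grid(n = c(100, 1000), sign_level = c(0.01, 0.05))
n_setups <- nrow(grid)

# Assumed illustration: split is irrelevant for "robustified", so those rows
# are collapsed into a single setup instead of being repeated per split value
full <- expand.grid(initial_est = c("robustified", "saturated"),
                    split = c(0.3, 0.5),
                    stringsAsFactors = FALSE)
full$split[full$initial_est == "robustified"] <- NA
setups <- unique(full)
n_unique_setups <- nrow(setups)
```

Here grid has one row per combination of n and sign_level, while setups keeps one "robustified" row and one "saturated" row per split value.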
The path argument allows users to store the M replication results for all
of the individual Monte Carlo simulations that are part of the grid. The
results are saved both as .Rds and .csv files. The file name is indicative
of the simulation setting.
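As a rough illustration of storing one grid entry in both formats, the base R sketch below writes a small data frame as .Rds and .csv. The file-name pattern, the column names, and the values are all made-up placeholders. The naming scheme that mc_grid actually uses is not specified here.

```r
# Placeholder replication results for a single grid entry (values are invented)
results <- data.frame(replication = 1:3, gauge = c(0.04, 0.06, 0.05))

# Hypothetical file stem that encodes the simulation setting
out_dir <- tempdir()
stem <- sprintf("n%d_sign%s", 100L, format(0.05))

rds_file <- file.path(out_dir, paste0(stem, ".Rds"))
csv_file <- file.path(out_dir, paste0(stem, ".csv"))

# Save the same results in both formats
saveRDS(results, rds_file)
write.csv(results, csv_file, row.names = FALSE)

# Reading the .Rds file back restores the original data frame
restored <- readRDS(rds_file)
```

The .Rds file preserves R types exactly, while the .csv copy is convenient for inspection outside of R.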