saturated_init: Saturated 2SLS (split-sample initial estimator)

Description

saturated_init splits the sample into two sub-samples and estimates the 2SLS model separately on each. The estimates from one sub-sample are then used to calculate the residuals, and hence to determine the outliers, in the other sub-sample.
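
The cross-estimation idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the data are simulated, the helper names tsls_coef and sigma_hat are hypothetical, the 2SLS coefficients are computed by hand rather than with ivreg, and scaling the residuals by the residual standard error of the estimating sub-sample is an assumption about how standardisation might be done.

```r
# Illustrative sketch only; all names and the simulated data are assumptions.
set.seed(1)
n  <- 200
z2 <- rnorm(n)                       # outside instrument
x1 <- rnorm(n)                       # exogenous regressor
u  <- rnorm(n)                       # structural error
x2 <- 0.8 * z2 + 0.5 * u + rnorm(n)  # endogenous regressor (correlated with u)
y  <- 1 + 2 * x1 - x2 + u

X <- cbind(1, x1, x2)                # regressors, including the intercept
Z <- cbind(1, x1, z2)                # instruments, including the intercept

# Hand-rolled 2SLS: project X on Z, then solve (Xhat'X) b = Xhat'y
tsls_coef <- function(y, X, Z) {
  Xhat <- Z %*% solve(crossprod(Z), crossprod(Z, X))
  solve(crossprod(Xhat, X), crossprod(Xhat, y))
}

# Residual standard error of a fit on its own sub-sample (assumed scaling)
sigma_hat <- function(y, X, b) {
  sqrt(sum((y - X %*% b)^2) / (nrow(X) - ncol(X)))
}

idx1 <- 1:(n / 2)                    # first sub-sample (original order)
idx2 <- (n / 2 + 1):n                # second sub-sample

b1 <- tsls_coef(y[idx1], X[idx1, ], Z[idx1, ])
b2 <- tsls_coef(y[idx2], X[idx2, ], Z[idx2, ])
s1 <- sigma_hat(y[idx1], X[idx1, ], b1)
s2 <- sigma_hat(y[idx2], X[idx2, ], b2)

# Residuals in each half are computed from the estimates of the other half
res <- numeric(n)
res[idx1] <- y[idx1] - X[idx1, ] %*% b2
res[idx2] <- y[idx2] - X[idx2, ] %*% b1

stdres <- res
stdres[idx1] <- res[idx1] / s2
stdres[idx2] <- res[idx2] / s1

cutoff <- 1.96                       # illustrative value
non_outlier <- abs(stdres) <= cutoff # TRUE if judged not outlying
```

Because each observation is judged with estimates that were computed without it, an outlier does not influence the fit against which it is assessed.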

Usage

saturated_init(data, formula, cutoff, shuffle, shuffle_seed, split = 0.5)

Value

saturated_init returns a list with five elements. The first four are vectors whose length equals the number of observations in the data set. Unlike the residuals stored in a model object (usually accessible via model$residuals), these vectors do not drop observations where any of y, x or z are missing. Instead, the values for such observations are set to NA.

The first element is a double vector containing the residuals for each observation based on the model estimates. The second element contains the standardised residuals. The third element is a logical vector that is TRUE if the observation is judged as not outlying, FALSE if it is an outlier, and NA if any of y, x, or z are missing. The fourth element is an integer vector with three possible values: 0 if the observation is judged to be an outlier, 1 if not, and -1 if it is missing. The fifth and last element is a list with the two initial ivreg model objects based on the two different sub-samples.
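
The relationship between the logical vector and the integer vector can be illustrated with a small base-R sketch. The variable names sel and type and the example values are assumptions made for illustration; they are not taken from the function's actual return value.

```r
# Illustrative values; NA marks an observation with missing y, x or z
stdres <- c(0.3, -2.7, NA, 1.1)
cutoff <- 1.96

# Logical coding: TRUE = not outlying, FALSE = outlier, NA = missing
sel <- abs(stdres) <= cutoff

# Integer coding: 1 = not outlying, 0 = outlier, -1 = missing
type <- ifelse(is.na(sel), -1L, as.integer(sel))

sel   # TRUE FALSE NA TRUE
type  # 1 0 -1 1
```

The integer coding avoids NA values, which can be convenient when tabulating the classification with table().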

Arguments

data: A dataframe.
formula: A formula in the format y ~ x1 + x2 | x1 + z2 where y is the dependent variable, x1 are the exogenous regressors, x2 the endogenous regressors, and z2 the outside instruments.
cutoff: A numeric cutoff value used to judge whether an observation is an outlier or not. If the absolute value of its standardised residual is larger than the cutoff value, the observation is classified as an outlier.
shuffle: A logical value (TRUE or FALSE) indicating whether the sample should be split into sub-samples randomly. If FALSE, the sample is simply cut into two parts using the original order of the supplied data set.
shuffle_seed: A numeric value that sets the seed for shuffling the data set before splitting it. Only used if shuffle == TRUE.
split: A numeric value strictly between 0 and 1 that determines the proportions in which the sample is split. The default of 0.5 gives two sub-samples of (approximately) equal size.
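
How shuffle, shuffle_seed and split interact can be sketched with a small helper. The function split_indices below is hypothetical and not part of the package. Rounding the size of the first sub-sample down, and assigning the proportion split to the first sub-sample, are both assumptions made for illustration.

```r
# Hypothetical helper, for illustration only
split_indices <- function(n, shuffle, shuffle_seed, split = 0.5) {
  stopifnot(split > 0, split < 1)
  idx <- seq_len(n)
  if (shuffle) {
    set.seed(shuffle_seed)  # note: this changes the global RNG state
    idx <- sample(idx)      # random permutation of the row indices
  }
  n1 <- floor(n * split)    # assumption: first sub-sample size is rounded down
  pos <- seq_len(n)
  list(split1 = idx[pos <= n1], split2 = idx[pos > n1])
}

# Without shuffling, the data are cut in their original order
ordered <- split_indices(8, shuffle = FALSE, shuffle_seed = 42, split = 0.25)
ordered$split1  # 1 2
ordered$split2  # 3 4 5 6 7 8

# With shuffling, the same seed reproduces the same split
a <- split_indices(8, shuffle = TRUE, shuffle_seed = 42, split = 0.25)
b <- split_indices(8, shuffle = TRUE, shuffle_seed = 42, split = 0.25)
```

With shuffle = FALSE the seed has no effect, which matches the description of shuffle_seed above.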

Warning

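The estimator may have poor properties if the split is very unequal and the sample size is not large enough, because the smaller sub-sample then contains only a few observations from which to estimate the model.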