saturated_init splits the sample into two sub-samples. The 2SLS model is
estimated on each sub-sample, and the estimates from one sub-sample are
used to calculate the residuals, and hence to identify the outliers, in the
other sub-sample.
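The cross-estimation idea can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper tsls_coef, the simulated data and the fixed ordered split are all assumptions made for the example, and the 2SLS estimator is written out by hand instead of calling ivreg.

```r
# Hypothetical helper (not part of the package): 2SLS coefficients for
# y = X b + u, with instrument matrix Z.
tsls_coef <- function(y, X, Z) {
  # First stage: project the regressors onto the instruments.
  Xhat <- Z %*% solve(crossprod(Z), crossprod(Z, X))
  # Second stage: b = (Xhat'X)^{-1} Xhat'y.
  drop(solve(crossprod(Xhat, X), crossprod(Xhat, y)))
}

# Simulated data, assumed purely for illustration.
set.seed(1)
n  <- 200
z2 <- rnorm(n)                          # outside instrument
v  <- rnorm(n)
x2 <- 0.8 * z2 + v                      # endogenous regressor
y  <- 1 + 2 * x2 + 0.5 * v + rnorm(n)   # error is correlated with x2 through v
X  <- cbind(1, x2)                      # regressors (the intercept is exogenous)
Z  <- cbind(1, z2)                      # instruments

# Ordered split into two equal parts (corresponds to split = 0.5, no shuffling).
first  <- 1:100
second <- 101:200

# Estimate the model separately on each sub-sample.
b1 <- tsls_coef(y[first],  X[first, ],  Z[first, ])
b2 <- tsls_coef(y[second], X[second, ], Z[second, ])

# Residuals of each sub-sample are computed from the OTHER sub-sample's estimates.
res <- numeric(n)
res[first]  <- y[first]  - drop(X[first, ]  %*% b2)
res[second] <- y[second] - drop(X[second, ] %*% b1)
```

Computing each observation's residual from estimates that did not use that observation keeps an outlier from masking itself by pulling the fit towards it.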
saturated_init(data, formula, cutoff, shuffle, shuffle_seed, split = 0.5)
saturated_init returns a list with five elements. The first four are vectors
whose length equals the number of observations in the data set. Unlike the
residuals stored in a model object (usually accessible via model$residuals),
these vectors do not drop observations where any of y, x or z are missing.
Instead, the entries for such observations are set to NA.
The first element is a double vector containing the residuals for each
observation based on the model estimates. The second element contains the
standardised residuals. The third element is a logical vector that is TRUE if
the observation is judged as not outlying, FALSE if it is an outlier, and NA
if any of y, x, or z are missing. The fourth element is an integer vector with
three values: 0 if the observation is judged to be an outlier, 1 if not, and
-1 if missing. The fifth and last element is a list with the two initial ivreg
model objects based on the two different sub-samples.
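The relationship between the first four elements can be sketched in base R. The variable names (res, stdres, sel, type), the input values and the scale estimate sigma are assumptions chosen for illustration; they are not taken from the package.

```r
# Hypothetical inputs: raw residuals (NA where y, x or z is missing),
# an assumed residual scale estimate, and a cutoff.
res    <- c(0.2, -3.1, NA, 1.4)
sigma  <- 1.0
cutoff <- 1.96

# Standardised residuals; NA entries stay NA.
stdres <- res / sigma

# TRUE = not outlying, FALSE = outlier, NA = missing.
# An observation is an outlier only if |stdres| is strictly larger than the cutoff.
sel <- abs(stdres) <= cutoff

# Integer coding: 1 = not outlying, 0 = outlier, -1 = missing.
type <- ifelse(is.na(sel), -1L, as.integer(sel))
```

All four vectors keep the full length of the data, so they can be attached to the original data set row by row.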
data: A data frame.
formula: A formula in the format y ~ x1 + x2 | x1 + z2, where y is the dependent variable, x1 are the exogenous regressors, x2 the endogenous regressors, and z2 the outside instruments.
cutoff: A numeric cutoff value used to judge whether an observation is an outlier. If the absolute value of its standardised residual is larger than the cutoff value, the observation is classified as an outlier.
shuffle: A logical value (TRUE or FALSE) indicating whether the sample should be split into sub-samples randomly. If FALSE, the sample is simply cut into two parts using the original order of the supplied data set.
shuffle_seed: A numeric value that sets the seed for shuffling the data set before splitting it. Only used if shuffle == TRUE.
split: A numeric value strictly between 0 and 1 that determines the proportions in which the sample is split.
The estimator may have bad properties if the split is too unequal and the
sample size is not large enough.
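How shuffle, shuffle_seed and split could interact is sketched below in base R. The helper split_indices is hypothetical, and the rounding rule (the first part receives floor(n * split) rows) is an assumption; the package may allocate rows differently.

```r
# Hypothetical helper (not part of the package): returns the row indices
# of the two sub-samples.
split_indices <- function(n, shuffle = FALSE, shuffle_seed = 42, split = 0.5) {
  stopifnot(split > 0, split < 1)
  idx <- seq_len(n)
  if (shuffle) {
    set.seed(shuffle_seed)  # makes the random split reproducible
    idx <- sample(idx)      # random permutation of the row indices
  }
  n1 <- floor(n * split)    # assumption: size of the first sub-sample
  pos <- seq_len(n)
  list(first = idx[pos <= n1], second = idx[pos > n1])
}

# Ordered split: the first quarter of the rows versus the rest.
ordered_parts <- split_indices(8, shuffle = FALSE, split = 0.25)

# Random split: the same seed always yields the same partition.
random_parts <- split_indices(10, shuffle = TRUE, shuffle_seed = 1, split = 0.5)
```

With a small n and a split far from 0.5, one of the sub-samples can hold very few rows, which is the situation the warning above refers to.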