devianLM (version 1.0.7)

devianlm_stats: Identify outliers using devianLM method

Description

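Identifies outliers in a linear model of y on x using the devianLM method, which flags observations whose studentized residual exceeds a threshold in absolute value.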

Usage

devianlm_stats(
  y,
  x,
  threshold = NULL,
  n_sims = 50000,
  nthreads = detectCores() - 1,
  quant = 0.95,
  ...
)
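
The default `nthreads = detectCores() - 1` in the signature above depends on what `parallel::detectCores()` reports, which can be `NA` when the core count is unknown and `1` on single-core machines. A minimal sketch of a guard you might use when passing `nthreads` explicitly; `safe_threads()` is a hypothetical helper written for this illustration, not part of devianLM:

```r
library(parallel)

# Hypothetical helper (not part of devianLM): return a usable thread count,
# keeping `reserve` cores free but never going below 1.
safe_threads <- function(reserve = 1L) {
  n <- detectCores()
  if (is.na(n)) {
    return(1L)  # core count unknown: fall back to a single thread
  }
  max(1L, as.integer(n) - as.integer(reserve))
}

n_threads <- safe_threads()
```

The result could then be supplied as `nthreads = n_threads` in a call to `devianlm_stats()`.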

Value

devianlm_stats returns a list with the following components:

reg_residuals

Numeric vector. The studentized residuals from the linear model.

outliers

Integer vector. The indices (positions in the original data) of observations identified as outliers based on the threshold.

threshold

Numeric value. The cutoff applied to the absolute value of the studentized residuals to flag outliers. If not provided, it is estimated using get_devianlm_threshold().

is_outliers

Integer vector. A binary vector (0 or 1) of the same length as reg_residuals, indicating whether each observation is considered an outlier (1) or not (0).
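
To make the relationship between the four components concrete, here is a minimal base-R sketch that builds a list of the same shape. It is an illustration under stated assumptions, not the package's internal code: it uses `rstudent()` (externally studentized residuals), a hand-picked cutoff of 3.5 instead of a simulated one, and a strict `>` comparison, none of which this page confirms for devianLM.

```r
# Illustration with base R only; not the devianLM implementation.
set.seed(1)
n  <- 50
x1 <- rnorm(n)
y  <- 2 + 3 * x1 + rnorm(n)
y[10] <- y[10] + 15  # plant one gross outlier at position 10

fit <- lm(y ~ x1)

# Studentized residuals (assumption: the externally studentized variant).
reg_residuals <- unname(rstudent(fit))

# Hypothetical fixed cutoff, chosen for illustration only.
threshold <- 3.5

# 0/1 flag per observation, same length as reg_residuals.
is_outliers <- as.integer(abs(reg_residuals) > threshold)

# Positions of the flagged observations in the original data.
outliers <- which(is_outliers == 1L)

res <- list(
  reg_residuals = reg_residuals,
  outliers      = outliers,
  threshold     = threshold,
  is_outliers   = is_outliers
)
```

By construction, `which(res$is_outliers == 1L)` and `res$outliers` carry the same information in two forms: a per-observation flag and a vector of positions.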

Arguments

y

a numeric variable (the response)

x

either a numeric variable or several numeric variables (explanatory variables) concatenated in a data frame. **Note:** `devianLM` does not add an intercept automatically; include a column of ones in `x` if an intercept is desired.

threshold

numeric or NULL; if NULL, computed using devianlm_cpp()

n_sims

optional; the number of simulations. Defaults to 50000.

nthreads

optional; the number of CPU cores to use. Defaults to the number of available CPU cores minus one.

quant

quantile of interest. Defaults to 0.95, which corresponds to a risk level of 0.05.

...

additional arguments for get_devianlm_threshold()
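
The roles of `n_sims` and `quant` can be illustrated with a small Monte Carlo sketch. It assumes the threshold is the `quant`-quantile of the maximum absolute studentized residual simulated under a Gaussian linear model with design matrix `x`; this page does not spell out the procedure, so treat the code as one plausible reading. `sim_threshold()` is a hypothetical base-R helper, not the package's `get_devianlm_threshold()`, and it assumes `x` has full column rank.

```r
# Hypothetical helper (not part of devianLM): Monte Carlo cutoff for the
# maximum absolute studentized residual under a Gaussian linear model.
sim_threshold <- function(X, n_sims = 1000, quant = 0.95) {
  X <- as.matrix(X)
  n <- nrow(X)
  p <- ncol(X)
  Q <- qr.Q(qr(X))   # orthonormal basis of the column space of X
  h <- rowSums(Q^2)  # leverages (diagonal of the hat matrix)
  max_abs <- numeric(n_sims)
  for (i in seq_len(n_sims)) {
    # Pure noise suffices: studentized residuals do not depend on the
    # regression coefficients or on the error scale.
    y_sim <- rnorm(n)
    e  <- drop(y_sim - Q %*% crossprod(Q, y_sim))  # residuals
    s2 <- sum(e^2) / (n - p)                       # residual variance
    r  <- e / sqrt(s2 * (1 - h))                   # internally studentized
    t_ext <- r * sqrt((n - p - 1) / (n - p - r^2)) # externally studentized
    max_abs[i] <- max(abs(t_ext))
  }
  unname(quantile(max_abs, probs = quant))
}

set.seed(42)
X <- cbind(1, rnorm(30), rnorm(30))  # explicit intercept column
thr_95 <- sim_threshold(X, n_sims = 2000, quant = 0.95)
thr_99 <- sim_threshold(X, n_sims = 2000, quant = 0.99)
```

In this sketch a larger `n_sims` reduces the Monte Carlo noise in the estimated quantile, while a larger `quant` moves the cutoff further into the tail, so fewer observations are flagged.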

Examples

set.seed(123)
y <- salary$hourly_earnings_log
x <- cbind(1, salary$age, salary$educational_attainment, salary$children_number)

test_salary <- devianlm_stats(y, x, n_sims = 100, quant = 0.95)

plot(test_salary$reg_residuals,
  pch = 16, cex = .8,
  ylim = c(-1 * max(abs(test_salary$reg_residuals)), max(abs(test_salary$reg_residuals))),
  xlab = "", ylab = "Studentized residuals",
  col = ifelse(test_salary$is_outliers, "red", "black"))

# Add the threshold lines:
abline(h = c(-test_salary$threshold, test_salary$threshold), col = "chartreuse2", lwd = 2)