
quickOutlier (version 0.1.5)

diagnose_influence: Diagnose Influential Points in Linear Models (Cook's Distance)

Description

Fits a simple linear model of one variable on another and calculates Cook's distance for each observation to identify influential points. An influential point is an observation whose removal substantially changes the fitted regression line, for example its slope.
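The following base-R illustration shows what "influential" means in practice. It is a sketch that uses only stats::lm() and does not call quickOutlier. It moves the first car in mtcars to wt = 10, mpg = 50 and compares the fitted slope of mpg on wt with and without that row.

```r
# Copy mtcars and move one car far from the rest of the data
df <- mtcars
df[1, "wt"] <- 10
df[1, "mpg"] <- 50

# Slope of mpg on wt with the moved row included
slope_all <- coef(lm(mpg ~ wt, data = df))[["wt"]]
# Slope after deleting that single row
slope_drop1 <- coef(lm(mpg ~ wt, data = df[-1, ]))[["wt"]]

c(with_row1 = slope_all, without_row1 = slope_drop1)
```

With the moved row included, the slope turns positive. Dropping that one row restores the familiar negative slope of mpg on wt, which is exactly the kind of change Cook's distance is designed to detect.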

Usage

diagnose_influence(data, target, predictor, cutoff = NULL)

Value

A data frame with the original data plus:

Cooks_Dist

The calculated Cook's distance for each point.

Is_Influential

Logical flag. TRUE if Cooks_Dist > cutoff.

Arguments

data

A data frame containing the variables.

target

Character. The name of the dependent variable (Y).

predictor

Character. The name of the independent variable (X).

cutoff

Numeric (optional). A custom threshold for Cook's distance. If NULL, it defaults to 4/n, where n is the number of observations.
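For readers who want to see how the arguments map onto the returned columns, here is a minimal base-R sketch of the documented behaviour. The function name sketch_influence is hypothetical and this is not the quickOutlier source. It assumes an ordinary least-squares fit via lm(), that n is nrow(data), and that the target and predictor columns contain no missing values.

```r
# Hypothetical re-implementation for illustration only (not quickOutlier's code).
# Assumes no missing values in the target and predictor columns.
sketch_influence <- function(data, target, predictor, cutoff = NULL) {
  # Build the formula target ~ predictor from the column names
  fit <- lm(reformulate(predictor, response = target), data = data)
  d <- unname(cooks.distance(fit))
  # Default rule of thumb described on this page: 4 / n
  if (is.null(cutoff)) cutoff <- 4 / nrow(data)
  data$Cooks_Dist <- d
  data$Is_Influential <- d > cutoff
  data
}

res <- sketch_influence(mtcars, "mpg", "wt")
head(res[, c("mpg", "wt", "Cooks_Dist", "Is_Influential")])
```

Passing an explicit cutoff, for example cutoff = 1, replaces the 4/n default and usually flags fewer rows.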

Details

Cook's distance (\(D_i\)) measures how much the fitted values change when observation \(i\) is deleted. It combines the size of the point's residual with its leverage, so a large \(D_i\) marks an influential point; high leverage alone is not enough.
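For an ordinary least-squares fit, Cook's distance has the closed form $$D_i = \frac{e_i^2}{p\,s^2} \cdot \frac{h_{ii}}{(1 - h_{ii})^2}$$ where \(e_i\) is the residual, \(h_{ii}\) the leverage (hat value), \(p\) the number of estimated coefficients (2 for one predictor plus an intercept), and \(s^2\) the residual mean square. The sketch below computes this by hand on mtcars and checks it against stats::cooks.distance(); it does not use quickOutlier.

```r
fit <- lm(mpg ~ wt, data = mtcars)

e  <- residuals(fit)                 # e_i
h  <- hatvalues(fit)                 # h_ii, the leverage of each point
p  <- length(coef(fit))              # 2: intercept and slope
s2 <- sum(e^2) / df.residual(fit)    # residual mean square

# Residual size and leverage combined, as in the formula above
d_manual <- (e^2 / (p * s2)) * h / (1 - h)^2

all.equal(unname(d_manual), unname(cooks.distance(fit)))
```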

The default detection threshold is $$\text{Threshold} = \frac{4}{n},$$ where \(n\) is the number of observations. This is a common rule of thumb in regression diagnostics; other conventions, such as \(D_i > 1\), are also used.
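As a worked instance of the threshold, mtcars has 32 rows, so the default cutoff is 4/32 = 0.125. This base-R sketch (independent of quickOutlier) compares how many points that rule flags against the stricter \(D_i > 1\) convention for the model mpg ~ wt.

```r
fit <- lm(mpg ~ wt, data = mtcars)
d <- cooks.distance(fit)

n <- nobs(fit)            # 32 cars in mtcars
cutoff_4n <- 4 / n        # 4 / 32 = 0.125
flag_4n <- d > cutoff_4n  # default rule of thumb
flag_1  <- d > 1          # stricter alternative convention

c(rule_4n = sum(flag_4n), rule_1 = sum(flag_1))
```

Because 4/n is below 1 whenever n > 4, the 4/n rule always flags at least as many points as the \(D_i > 1\) rule.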

Examples

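library(quickOutlier)

# Example: a point that pulls the regression line
df <- mtcars
# Artificially create a high-leverage point that also lies far off the trend
df[1, "wt"] <- 10
df[1, "mpg"] <- 50
result <- diagnose_influence(df, "mpg", "wt")
head(result)
# Show only the rows flagged as influential
result[result$Is_Influential, ]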
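To see why the modified row stands out, the sketch below recomputes Cook's distance on the same data with base R. It assumes, as the Description states, that diagnose_influence fits an ordinary least-squares model of the target on the predictor; it does not call quickOutlier.

```r
# Same modification as the example above
df <- mtcars
df[1, "wt"] <- 10
df[1, "mpg"] <- 50

fit <- lm(mpg ~ wt, data = df)
d <- cooks.distance(fit)

# The three most influential rows, largest first
head(sort(d, decreasing = TRUE), 3)
# Leverage of the moved row; the average hat value is p / n = 2 / 32
hatvalues(fit)[1]
```

The moved row (Mazda RX4) combines very high leverage with a large residual, so its Cook's distance should dominate every other car's and exceed both the 4/n and the \(D_i > 1\) cutoffs.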
