Learn R Programming

hidetify (version 0.0.1)

shidetify: Single detection asymmetric influential measure for high dimensional linear regression.

Description

The function computes the asymmetric influential measure to identify influential observations in high dimensional linear regression using the single detection approach.

Usage

shidetify(
  x, 
  y, 
  asymvec, 
  alpha
  )

Arguments

x

Matrix of the predictors.

y

Numeric vector of the response variable.

asymvec

Numeric vector of the asymmetric values. It is suggested to choose 3 asymetric points within the quartile.

alpha

Significance level.

Value

A dataframe with two variables.

ind

Index of the observations

outlier_ind

Influential observations indicator: 1 if influential and 0 otherwise

References

Barry, A., Bhagwat, N., Misic, B., Poline, J.-B., and Greenwood, C. M. T. (2020). Asymmetric influence measure for high dimensional regression. Communications in Statistics - Theory and Methods.

Barry, A., Bhagwat, N., Misic, B., Poline, J.-B., and Greenwood, C. M. T. (2021). An algorithm-based multiple detection influence measure for high dimensional regression using expectile. arXiv: 2105.12286 [stat]. arXiv: 2105.12286.

Examples

Run this code
# NOT RUN {
## Simulate a dataset where the first 10 observations are influentials
require("MASS")
# the vector of asymetric point
asymvec  <- c(0.25,0.5,0.75)

# the parameter of interest
beta_param <- c(3,1.5,0,0,2,rep(0,1000-5))

# the contamination parameter 
gama_param <- c(0,0,1,1,0,rep(1,1000-5))

# Covariance matrice for the predictors distribution 
sigmain <- diag(rep(1,1000))
for (i in 1:1000)
{
  for (j in i:1000) 
  {
    sigmain[i,j] <- 0.5^(abs(j-i))
    sigmain[j,i] <- sigmain[i,j]
  }
}

# set the seed
set.seed(13)

# the predictor matrix
x  <- mvrnorm(100, rep(0, 1000), sigmain)

# the error variable
error_var <- rnorm(100)

# the response variable
y  <- x %*% beta_param + error_var
y <- as.numeric(y)

### Generate influential observations
# the contaminated response variable
youtlier <- y
youtlier[1:10] <- x[1:10,] %*% (beta_param +  1.2*gama_param)  + error_var[1:10]
youtlier <- as.numeric(youtlier)

# the significance level 
alpha <- 0.05

df_single_influential <- 
  shidetify(
    x, 
    youtlier, 
    asymvec, 
    alpha)
# }

Run the code above in your browser using DataLab