bayesProb: Mean posterior probability for each observation in Baysian quantile regression model

Description

Mean posterior probability for each observation in Baysian quantile regression model

Usage

bayesProb(y, x, tau, M, burn)

Arguments

vector, dependent variable in quantile regression

matrix, design matrix in quantile regression

tau

quantile

MCMC draws

burn

burned MCMC draws

Details

If we define the variable O_i, which takes value equal to 1 when ith observation is an outlier, and 0 otherwise, then we propose to calculate the probability of an observation being an outlier as:

$$P(O_{i} = 1) = \frac{1}{n-1}\sum{P(v_{i}>v_{j}|data)} \quad (1)$$ We believe that for points, which are not outliers, this probability should be small, possibly close to zero. Given the natrual ordering of the residuals, it is expected that some observations present greater values for this probability in comparison to others. What we think that should be deemed as an outlier, ought to be those observations with a higher $P(O_{i} = 1)$, and possibly one that is particularly distant from the others.

The probability in the equation can be approximated given the MCMC draws, as follows

$$P(O_{i}=1)=\frac{1}{M}\sum{I(v^{(l)}_{i}>max v^{k}_{j})}$$

where $M$ is the size of the chain of $v_{i}$ after the burn-in period and $v^{(l)}_{j}$ is the $l$th draw of chain.

More details please refer to the paper in references

References

Santos B, Bolfarine H.(2016)``On Baysian quantile regression and outliers,arXiv:1601.07344

Examples

Run this code

# NOT RUN {
ais_female <- subset(ais, Sex == 1)
y <- ais_female$BMI
x <- cbind(1, ais_female$LBM)
tau <- 0.5
M <- 5000
burn <- 1000
prob <- bayesProb(y, x, tau, M, burn)
case <-  1:100
dat <- data.frame(case, prob)
ggplot(dat, aes(case, prob))+
 geom_point() +
 geom_text(data = subset(dat, prob > mean(prob) + 2*sd(prob)),
# }
# NOT RUN {


# }