This function takes quality.scores, trims it, and fits it to the given distribution. It then iteratively tests the largest data point against a null distribution built from no.simulations simulated datasets. If the largest data point has a significant p-value (below alpha.significant), it tests the second largest, and so on. The function supports the following distributions:
'weibull'
'norm'
'gamma'
'exp'
'lnorm'
'cauchy'
'logis'
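The trim, fit, simulate, and test loop described above can be sketched in base R. This is a simplified stand-in under stated assumptions, not the package's implementation: `iterative.max.test` and its return names are hypothetical, only the 'lnorm' case is handled, trimming is assumed to be symmetric, and the test statistic is the simulated sample maximum rather than whatever cosine-similarity measure the function itself uses.

```r
# Hypothetical helper, not part of the package: iteratively tests the largest
# remaining value against maxima of datasets simulated from a fitted lognormal.
iterative.max.test <- function(scores, no.simulations = 1000,
                               trim.factor = 0.05, alpha.significant = 0.05) {
    stopifnot(all(scores > 0), trim.factor >= 0, trim.factor < 0.5)
    scores <- sort(scores, decreasing = TRUE)
    outliers <- integer(0)
    for (i in seq_along(scores)) {
        remaining <- scores[i:length(scores)]
        if (length(remaining) < 3) break
        # Assumption: trim the same number of values from both ends
        k <- floor(length(remaining) * trim.factor)
        trimmed <- sort(remaining)[(k + 1):(length(remaining) - k)]
        # Lognormal parameters estimated from the trimmed values
        meanlog <- mean(log(trimmed))
        sdlog <- sd(log(trimmed))
        # Null distribution: maxima of simulated datasets of the same size
        null.max <- replicate(
            no.simulations,
            max(rlnorm(length(remaining), meanlog, sdlog))
        )
        # Empirical p-value for the largest remaining value
        p.value <- (sum(null.max >= remaining[1]) + 1) / (no.simulations + 1)
        if (p.value >= alpha.significant) break
        outliers <- c(outliers, i)
    }
    list(no.outliers = length(outliers), outlier.ranks = outliers)
}

set.seed(1)
scores <- c(rlnorm(50, meanlog = 0, sdlog = 0.5), 100)  # one planted extreme
result <- iterative.max.test(scores, no.simulations = 200)
```

The loop stops at the first non-significant value, so a data point is only tested once every larger one has already been nominated.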
cosine.similarity.iterative(
quality.scores,
no.simulations,
distribution = c("lnorm", "weibull", "norm", "gamma", "exp", "cauchy", "logis"),
trim.factor = 0.05,
alpha.significant = 0.05
)
Returns results in the form of a named list containing:
Number of nominated outliers
Outlier IDs, corresponding to Sample column of quality.scores
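quality.scores: A dataframe with columns 'Sum' (of scores) and 'Sample', i.e. the output of accumulate.zscores
no.simulations: The number of datasets to simulate
distribution: The distribution to test; defaults to 'lnorm'
trim.factor: The fraction of values to trim from each end before fitting, so that the distribution parameters are estimated without the extremes
alpha.significant: The alpha value used as the significance threshold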
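To illustrate why trimming matters before parameter estimation, the sketch below compares the standard deviation of a sample with and without a single extreme value removed by trimming. `trim.both.ends` is a hypothetical helper that assumes symmetric trimming; the package's own trimming routine may differ.

```r
# Hypothetical helper illustrating symmetric trimming; not the package's code.
trim.both.ends <- function(x, trim.factor = 0.05) {
    stopifnot(trim.factor >= 0, trim.factor < 0.5)
    x <- sort(x)
    # Number of values dropped from each end
    k <- floor(length(x) * trim.factor)
    x[(k + 1):(length(x) - k)]
}

set.seed(42)
scores <- c(rnorm(99, mean = 10, sd = 1), 60)  # one extreme value
trimmed <- trim.both.ends(scores, trim.factor = 0.05)

untrimmed.sd <- sd(scores)   # inflated by the extreme value
trimmed.sd <- sd(trimmed)    # estimated without the extremes
```

With the extreme value included, the estimated spread is large enough that the same value would look unremarkable under the fitted distribution, which is the situation trim.factor is meant to avoid.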