cosine.similarity.iterative

This function takes quality.scores, trims it, and fits the trimmed data to the given distribution.
It then iteratively tests the largest data point against a null distribution of size
no.simulations. If the largest data point is significant (p-value below alpha.significant), the
function tests the second largest, and so on. The function supports the following distributions:<ul>
<li>'weibull'</li>
<li>'norm'</li>
<li>'gamma'</li>
<li>'exp'</li>
<li>'lnorm'</li>
<li>'cauchy'</li>
<li>'logis'</li>
</ul>
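The iterative procedure described above can be sketched in base R. This is a simplified illustration, not the OmicsQC implementation: it assumes a normal null distribution with parameters estimated from the trimmed data, and `cosine.sim` and `iterative.outlier.sketch` are hypothetical names introduced only for this example.

```r
# Hypothetical helper (not an OmicsQC function): cosine similarity of two vectors
cosine.sim <- function(x, y) {
    sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
    }

# Illustrative sketch only; assumes a normal null fitted to the trimmed scores
iterative.outlier.sketch <- function(
    scores,
    no.simulations = 200,
    trim.factor = 0.05,
    alpha.significant = 0.05
    ) {
    # Trim the same number of values from each end before estimating parameters
    n.trim <- floor(length(scores) * trim.factor)
    ascending <- sort(scores)
    trimmed <- ascending[(n.trim + 1):(length(ascending) - n.trim)]
    fit.mean <- mean(trimmed)
    fit.sd <- sd(trimmed)

    remaining <- sort(scores, decreasing = TRUE)
    outliers <- numeric(0)
    while (length(remaining) > 2) {
        n <- length(remaining)
        # Theoretical quantiles of the fitted distribution
        theoretical <- qnorm(ppoints(n), mean = fit.mean, sd = fit.sd)
        observed.similarity <- cosine.sim(sort(remaining), theoretical)
        # Null distribution: similarities of simulated datasets of the same size
        null.similarity <- replicate(
            no.simulations,
            cosine.sim(sort(rnorm(n, mean = fit.mean, sd = fit.sd)), theoretical)
            )
        # Fraction of simulated datasets that match the fit no better than the data
        p.value <- mean(null.similarity <= observed.similarity)
        if (p.value >= alpha.significant) break
        # Largest value is significant: record it, then test the next largest
        outliers <- c(outliers, remaining[1])
        remaining <- remaining[-1]
        }
    outliers
    }

set.seed(42)
scores <- c(rnorm(50), 25)  # 50 typical values plus one made-up extreme value
flagged <- iterative.outlier.sketch(scores)
```

The loop stops at the first non-significant test, so values are nominated strictly from the largest downward.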

A method that analyzes quality control metrics from multi-sample genomic sequencing studies and nominates poor quality samples for exclusion. Per-sample quality control data are transformed into z-scores and aggregated. The distribution of aggregated z-scores is modelled using parametric distributions. The parameters of the optimal model, selected either by goodness-of-fit statistics or user-designation, are used for outlier nomination. Two implementations of the Cosine Similarity Outlier Detection algorithm are provided with flexible parameters for dataset customization.

Paul Boutros

OmicsQC

Nominating Quality Control Outliers in Genomic Profiling Studies

Anders Hugo Frelin

Helen Zhu

Paul C. Boutros

cosine.similarity.iterative function

<dl><dt>quality.scores</dt>
<dd>A data frame with columns 'Sum' (of scores) and 'Sample', i.e. the output of accumulate.zscores</dd>
<dt>no.simulations</dt>
<dd>The number of datasets to simulate</dd>
<dt>distribution</dt>
<dd>The distribution to test; defaults to 'lnorm'</dd>
<dt>trim.factor</dt>
<dd>The fraction of values to trim from each end so that parameters are estimated without the extremes</dd>
<dt>alpha.significant</dt>
<dd>Alpha value for significance</dd></dl>
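A call using these arguments might look like the sketch below. It assumes the OmicsQC package is installed and that the function accepts the argument names listed above. The data frame is made up for illustration and only stands in for real accumulate.zscores output, and the values chosen for no.simulations and trim.factor are arbitrary.

```r
# Made-up scores standing in for the output of accumulate.zscores
set.seed(1)
quality.scores <- data.frame(
    Sample = paste0('S', 1:30),
    Sum = c(abs(rnorm(29)) + 1, 15)  # 29 typical values plus one extreme value
    )

# Run the call only when OmicsQC is available
if (requireNamespace('OmicsQC', quietly = TRUE)) {
    result <- OmicsQC::cosine.similarity.iterative(
        quality.scores = quality.scores,
        no.simulations = 100,
        distribution = 'lnorm',
        trim.factor = 0.05,
        alpha.significant = 0.05
        )
    # Inspect the structure of whatever the function returns
    str(result)
    }
```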

Arguments

Tests the accumulated quality scores for outliers using cosine similarity — cosine.similarity.iterative


