Filtering parameters should be adjusted according to the sample size
of the experiment data and the number of replicates per condition.
min_samps_gene_expr defines the minimal number of samples in which a gene
is required to be expressed at the minimal level of min_gene_expr in
order to be included in the downstream analysis. Ideally, genes would be
expressed at some minimal level in all samples, because this would
lead to good estimates of feature ratios.
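
The gene-level rule described above can be sketched in a few lines. This is
only an illustration of the logic, not the package's implementation: the
function name gene_passes and the toy counts are hypothetical, and the sketch
assumes that the expression of a gene in a sample is given as a single number.

```python
def gene_passes(gene_counts, min_samps_gene_expr, min_gene_expr):
    """Return True if the gene is expressed at min_gene_expr or more
    in at least min_samps_gene_expr samples.

    gene_counts: one expression value per sample (hypothetical toy input).
    """
    n_expressed = sum(1 for count in gene_counts if count >= min_gene_expr)
    return n_expressed >= min_samps_gene_expr


# Toy data for six samples (made-up numbers).
gene_a = [120, 95, 0, 210, 15, 180]   # expressed at >= 10 in five samples
gene_b = [3, 0, 12, 1, 0, 2]          # expressed at >= 10 in one sample

keep_a = gene_passes(gene_a, min_samps_gene_expr=5, min_gene_expr=10)
keep_b = gene_passes(gene_b, min_samps_gene_expr=5, min_gene_expr=10)
print(keep_a, keep_b)
```

With these toy values, gene_a is kept and gene_b is filtered out.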
Similarly, min_samps_feature_expr and min_samps_feature_prop define the
minimal number of samples in which features are required to be expressed at
the minimal levels of counts min_feature_expr or proportions
min_feature_prop. In differential splicing analysis, we suggest setting
min_samps_feature_expr and min_samps_feature_prop equal to the minimal
number of replicates in any of the conditions. For example, in an assay with
3 versus 5 replicates, we would set these parameters to 3. This allows a
situation where a feature is expressed in one condition but may not be
expressed at all in the other, which is an example of differential splicing.
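
The 3 versus 5 example can be made concrete with a small sketch. It is an
illustration under stated assumptions rather than the package's code: the
helper names min_replicates and feature_passes, the condition labels, the
toy counts, and the value min_feature_expr = 10 are all hypothetical.

```python
from collections import Counter


def min_replicates(conditions):
    """Smallest number of replicates found in any condition."""
    return min(Counter(conditions).values())


def feature_passes(feature_counts, min_samps_feature_expr, min_feature_expr):
    """Return True if the feature reaches min_feature_expr counts
    in at least min_samps_feature_expr samples."""
    n_expressed = sum(1 for count in feature_counts if count >= min_feature_expr)
    return n_expressed >= min_samps_feature_expr


# Assay with 3 versus 5 replicates (hypothetical labels).
conditions = ["A"] * 3 + ["B"] * 5
min_samps_feature_expr = min_replicates(conditions)

# Feature expressed only in the three samples of condition A.
switched = [25, 30, 18, 0, 0, 0, 0, 0]
# Feature expressed in only two samples overall.
sporadic = [25, 30, 0, 0, 0, 0, 0, 0]

keep_switched = feature_passes(switched, min_samps_feature_expr, min_feature_expr=10)
keep_sporadic = feature_passes(sporadic, min_samps_feature_expr, min_feature_expr=10)
print(min_samps_feature_expr, keep_switched, keep_sporadic)
```

Setting the threshold to the size of the smaller group keeps the feature that
is present in one condition and absent in the other, while a feature seen in
fewer samples than the smaller group is removed.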
By default, we do not use filtering based on feature proportions. Therefore,
min_samps_feature_prop and min_feature_prop are equal to 0.
In sQTL analysis, we usually deal with data that has many more replicates
than data from a standard differential splicing assay. Our example data set
consists of 91 samples. Requiring that genes are expressed in all samples may
be too stringent, especially since there may be missing values in the data
and for some genes you may not observe counts in all 91 samples. A slightly
lower threshold ensures that we do not eliminate such genes. For example, if
min_samps_gene_expr = 70 and min_gene_expr = 10, only genes with expression
of at least 10 in at least 70 samples are kept. Samples with expression lower
than 10 have NAs assigned and are skipped in the analysis of this gene.
minor_allele_freq indicates the minimal number of samples in which the minor
allele must be present. Usually, it is set to 5% of the total number of
samples.
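
The sQTL settings from this paragraph can be sketched as follows. This is a
rough illustration, not the package's implementation. Several things are
assumptions made for the example: None stands in for NA, genotypes are coded
as the number of minor alleles (0, 1 or 2), "minor allele presence" is read
as a genotype greater than 0, and the function names and toy data are
hypothetical.

```python
def mask_low_expression(gene_counts, min_gene_expr):
    """Replace values below min_gene_expr (or already missing) with None,
    which plays the role of NA in this sketch."""
    return [c if c is not None and c >= min_gene_expr else None
            for c in gene_counts]


def gene_passes_sqtl(gene_counts, min_samps_gene_expr, min_gene_expr):
    """Return (keep, masked): keep is True if enough samples remain
    after masking; masked holds the per-sample values with None for NA."""
    masked = mask_low_expression(gene_counts, min_gene_expr)
    n_kept = sum(1 for value in masked if value is not None)
    return n_kept >= min_samps_gene_expr, masked


def snp_passes(genotypes, minor_allele_freq):
    """Return True if the minor allele is present in at least
    minor_allele_freq samples (assumed coding: 0, 1 or 2 minor alleles)."""
    carriers = sum(1 for g in genotypes if g is not None and g > 0)
    return carriers >= minor_allele_freq


n_samples = 91
minor_allele_freq = 0.05 * n_samples  # about 4.55 samples

# Toy gene: 72 well-expressed samples, 10 low ones, 9 missing.
gene_counts = [50] * 72 + [4] * 10 + [None] * 9
keep_gene, masked = gene_passes_sqtl(gene_counts, 70, 10)

# Toy SNPs: minor allele carried by 11 samples and by 3 samples.
common_snp = [0] * 80 + [1] * 9 + [2] * 2
rare_snp = [0] * 88 + [1] * 3
keep_common = snp_passes(common_snp, minor_allele_freq)
keep_rare = snp_passes(rare_snp, minor_allele_freq)
print(keep_gene, masked.count(None), keep_common, keep_rare)
```

In this toy case the gene is kept because 72 of the 91 samples reach the
expression threshold, and the remaining 19 samples are marked as missing for
this gene only.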