A histogram is plotted using ggplot2 to visualize the distribution of EE
rates. The user can adjust the number of bins in the histogram using the
n_bins parameter.
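As a rough illustration of how the bin count affects such a plot, the sketch below builds a ggplot2 histogram from made-up EE rate values. The data frame `ee_df`, its column `ee_rate`, and the value of `n_bins` are assumptions for this example only and do not reflect the function's internal code.

```r
library(ggplot2)

# Made-up EE rate values, for illustration only
set.seed(42)
ee_df <- data.frame(ee_rate = rexp(1000, rate = 50))

n_bins <- 30  # illustrative bin count

# Histogram of the EE rate distribution with n_bins bins
p <- ggplot(ee_df, aes(x = ee_rate)) +
  geom_histogram(bins = n_bins) +
  labs(x = "EE rate", y = "Number of reads")
```

Printing `p` draws the plot; rerunning with a larger or smaller `n_bins` changes only the resolution of the histogram, not the underlying data.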
fastq_input can be either a file path to a FASTQ file or a FASTQ
object. FASTQ objects are tibbles that contain the columns Header,
Sequence, and Quality; see readFastq.
The EE rate is calculated as the sum of error probabilities per read, where
the error probability for each base is computed as \(10^{(-Q/10)}\) from
its Phred quality score \(Q\). A lower EE rate indicates higher sequence
quality, while a higher EE rate suggests lower confidence in the read.
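The calculation described above can be sketched in base R as follows. This is a minimal illustration, not the package's own code: `expected_errors` is a hypothetical helper, and the quality strings are assumed to use Phred+33 encoding.

```r
# Hypothetical helper: sum of per-base error probabilities for one read.
# Assumes Phred+33 encoding (ASCII code minus 33 gives the Phred score Q).
expected_errors <- function(quality) {
  q <- utf8ToInt(quality) - 33L  # per-base Phred scores
  sum(10^(-q / 10))              # per-base error probabilities, summed
}

# Three toy quality strings: Q = 40, Q = 20, and Q = 10 at every position
quality <- c("IIII", "5555", "++++")
ee <- vapply(quality, expected_errors, numeric(1), USE.NAMES = FALSE)
```

Here the first read has the lowest value and the third the highest, matching the interpretation that lower values indicate higher sequence quality.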
If fastq_input contains more than 10 000 reads, the function will
randomly select 10 000 rows for downstream calculations. This subsampling
reduces computation time on large datasets.
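One way such subsampling could look in base R is shown below. This is only a sketch under stated assumptions: `fastq_tbl` is a made-up stand-in (a plain data frame rather than a tibble) for a FASTQ object, `max_reads` is an illustrative name for the 10 000 read limit, and the function's actual sampling code may differ.

```r
set.seed(1)  # fixed seed only so this illustration is reproducible

# Made-up stand-in for a FASTQ object with 25 000 reads
n_reads <- 25000
fastq_tbl <- data.frame(
  Header   = paste0("read", seq_len(n_reads)),
  Sequence = rep("ACGT", n_reads),
  Quality  = rep("IIII", n_reads)
)

max_reads <- 10000  # illustrative name for the subsampling limit

# Keep a random subset of rows (without replacement) only when the limit is exceeded
if (nrow(fastq_tbl) > max_reads) {
  keep <- sample(nrow(fastq_tbl), max_reads)
  fastq_tbl <- fastq_tbl[keep, , drop = FALSE]
}
```

Because the rows are drawn at random, repeated runs on a large input can give slightly different histograms unless a seed is set beforehand.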