This function reports basic statistics for a genome assembly.
Usage
AnalyzeAssembly(genome, max_N = 25, plot = F)
Arguments
genome
A list in which each element is a single character string of the class "SeqFastadna".
max_N
Maximum number of consecutive N symbols allowed within a contig. A scaffold is
broken into contigs wherever a run of N symbols exceeds this number.
plot
When TRUE, an accumulation plot is returned in addition to the statistics.
Value
A data frame with the following rows:
Number of Scaffolds
Assembly Size Based on Scaffolds
Number of Scaffolds over 1MB
N50 Scaffold Size
Number of Contigs
Assembly Size Based on Contigs
N50 Contig Size
Minimum Contig Size
Percent GC
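The contig and N50 rows can be illustrated with a minimal base R sketch. This is not the function's actual implementation: the helpers split_contigs and n50 are hypothetical names, the sequences are toy data, and the sketch assumes lower-case input (so gaps appear as runs of "n"), that short N runs stay inside a contig and count toward its length, and the common definition of N50 as the length of the shortest sequence among the longest ones that together cover at least half of the total length.

```r
# Hypothetical helper (assumption, not part of the package): break one
# scaffold string into contigs wherever a run of more than max_N
# consecutive "n" symbols occurs. Assumes lower-case sequence.
split_contigs <- function(scaffold, max_N = 25) {
  gap <- sprintf("n{%d,}", max_N + 1)          # matches runs longer than max_N
  parts <- strsplit(scaffold, gap, perl = TRUE)[[1]]
  parts[nzchar(parts)]                         # drop empty pieces at the ends
}

# Hypothetical helper (assumption): N50 under the common definition.
n50 <- function(lengths) {
  sorted <- sort(lengths, decreasing = TRUE)
  sorted[which(cumsum(sorted) >= sum(sorted) / 2)[1]]
}

# Toy scaffolds: the first has a 30-N gap (split), the second a 10-N gap (kept)
scaffolds <- c(
  paste0(strrep("a", 40), strrep("n", 30), strrep("c", 20)),
  paste0(strrep("g", 50), strrep("n", 10), strrep("t", 15))
)

contigs <- unlist(lapply(scaffolds, split_contigs, max_N = 25))

nchar(contigs)          # 40 20 75
n50(nchar(scaffolds))   # 90
n50(nchar(contigs))     # 75
```

With max_N = 25, only the 30-N run breaks a scaffold, so two scaffolds yield three contigs and the contig N50 is smaller than the scaffold N50.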
Details
If a standard FASTA file is read in with the function read.fasta from the package seqinr, the argument as.string should be set to TRUE. The genome should also be entirely lower case, which is the default behavior of read.fasta.
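Examples
The following is a minimal sketch of preparing the genome argument as described above. It assumes the seqinr package is installed; the file contents and scaffold names are made-up toy data, and the final call is left commented out because it only runs once the package providing AnalyzeAssembly is loaded.

```r
library(seqinr)

# Write a tiny two-scaffold FASTA file to a temporary location (toy data)
fasta_path <- tempfile(fileext = ".fasta")
writeLines(c(">scaffold_1", "ACGTNNNNACGT",
             ">scaffold_2", "GGCCTTAA"), fasta_path)

# as.string = TRUE returns each sequence as a single string;
# forceDNAtolower = TRUE (the read.fasta default) converts it to lower case
genome <- read.fasta(fasta_path, seqtype = "DNA", as.string = TRUE,
                     forceDNAtolower = TRUE)

length(genome)        # one list element per FASTA record
class(genome[[1]])    # "SeqFastadna"

# Requires the package that provides AnalyzeAssembly to be loaded:
# stats <- AnalyzeAssembly(genome, max_N = 25, plot = FALSE)
```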