Generates genome kmers
kmers(
x,
k = 3L,
simplify = FALSE,
canonical = TRUE,
squeeze = FALSE,
anchor = TRUE,
clean_up = TRUE,
key_as_int = FALSE,
starting_index = 1L
)
A list of kmer values: either a list holding a single vector of counts (if simplify = TRUE), or a named list containing "kmer_string" and "kmer_value".
x: genome in string format
k: kmer length
simplify: returns a numeric vector of kmer counts, without the associated kmer strings. This saves memory, but should always be used with anchor = TRUE so that each position in the vector maps to a fixed kmer.
canonical: only record canonical kmers (i.e., the lexicographically smaller of a kmer and its reverse complement)
squeeze: remove non-canonical kmers
anchor: include unobserved kmers (with counts of 0). This is useful when generating a dense matrix in which the kmers of different genomes align.
clean_up: only include valid bases (ACTG) in kmer counts (excludes ambiguous or unknown bases such as N)
key_as_int: return the kmer index (as "kmer_index") rather than the full kmer string. Useful for index-coded data structures such as libsvm.
starting_index: the starting index, only used if key_as_int = TRUE.
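The following base-R sketch illustrates what the canonical, anchor and clean_up options describe. It is not the implementation behind kmers(). The helper names (rev_comp, all_kmers, sketch_counts) are hypothetical, and the lexicographic ordering of kmers is an assumption made for illustration.

```r
# Illustrative sketch only; NOT the implementation behind kmers().
# rev_comp, all_kmers and sketch_counts are hypothetical helper names.

# Reverse complement of a DNA string made of A, C, G, T.
rev_comp <- function(s) {
  paste(rev(strsplit(chartr("ACGT", "TGCA", s), "")[[1]]), collapse = "")
}

# All 4^k possible kmers in lexicographic (C-locale) order.
all_kmers <- function(k) {
  grid <- expand.grid(rep(list(c("A", "C", "G", "T")), k),
                      stringsAsFactors = FALSE)
  sort(do.call(paste0, unname(as.list(grid))), method = "radix")
}

# Count the kmers of x; assumes nchar(x) >= k.
sketch_counts <- function(x, k = 3L, canonical = TRUE, anchor = TRUE) {
  starts <- seq_len(nchar(x) - k + 1L)
  windows <- substring(x, starts, starts + k - 1L)
  # Keep only windows made entirely of A, C, G, T (drops e.g. N).
  windows <- windows[grepl("^[ACGT]+$", windows)]
  if (canonical) {
    # Replace each kmer by the smaller of itself and its reverse complement.
    windows <- vapply(windows, function(w) {
      sort(c(w, rev_comp(w)), method = "radix")[1]
    }, character(1), USE.NAMES = FALSE)
  }
  # With anchor = TRUE, fix the levels to all 4^k kmers so that
  # unobserved kmers are reported with a count of 0.
  lv <- if (anchor) all_kmers(k) else sort(unique(windows), method = "radix")
  tab <- table(factor(windows, levels = lv))
  setNames(as.integer(tab), names(tab))
}

# Dense counts: one entry per possible 2-mer, most of them 0.
dense <- sketch_counts("ACGTN", k = 2L, canonical = FALSE, anchor = TRUE)

# Canonical counts: "GT" is folded into its reverse complement "AC".
canon <- sketch_counts("ACGTN", k = 2L, canonical = TRUE, anchor = FALSE)
```

In this sketch, dense has 16 entries whatever the input, which is what lets vectors from different genomes be stacked into one matrix. canon contains only "AC" (count 2) and "CG" (count 1), because "GT" and "AC" are reverse complements of each other and the window containing N is dropped.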