
MIC (version 1.1.0)

kmers: Generates genome kmers

Description

Generates kmer counts from a genome sequence.

Usage

kmers(
  x,
  k = 3L,
  simplify = FALSE,
  canonical = TRUE,
  squeeze = FALSE,
  anchor = TRUE,
  clean_up = TRUE,
  key_as_int = FALSE,
  starting_index = 1L
)

Value

List of kmer counts. This is either a list holding a single numeric vector (if simplify = TRUE) or a named list containing "kmer_string" and "kmer_value" (with "kmer_index" in place of "kmer_string" if key_as_int = TRUE).
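The following is a minimal base-R sketch that makes the non-simplified return shape concrete. It is not MIC's implementation. The function name count_kmers_sketch is hypothetical, and the sketch only covers plain overlapping kmer counting, with no canonicalization, anchoring or clean-up.

```r
# Hypothetical sketch, not MIC's implementation: count overlapping kmers
# and return them in a list(kmer_string, kmer_value) shape.
count_kmers_sketch <- function(x, k = 3L) {
  n <- nchar(x)
  if (n < k) {
    return(list(kmer_string = character(0), kmer_value = integer(0)))
  }
  starts <- seq_len(n - k + 1L)
  # Every overlapping substring of length k, e.g. "ATC", "TCG", ...
  windows <- substring(x, starts, starts + k - 1L)
  counts <- table(windows)
  list(kmer_string = names(counts), kmer_value = as.integer(counts))
}

res <- count_kmers_sketch("ATCGCAGT", k = 3L)
```

An 8-base sequence yields 8 - 3 + 1 = 6 overlapping 3-mers. In this particular sequence each 3-mer occurs once.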

Arguments

x

genome sequence as a character string

k

kmer length

simplify

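returns a numeric vector of kmer counts without the associated kmer strings. This saves memory, but should always be used with anchor = TRUE. Without the strings, a kmer can only be identified by its position in the vector, and positions only line up across genomes when every possible kmer is included.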

canonical

only record canonical kmers (i.e., the lexicographically smaller of a kmer and its reverse complement)
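The canonical-kmer definition above can be illustrated with a small base-R sketch. The helper names reverse_complement and canonical_kmer are hypothetical and not part of MIC. The sketch assumes uppercase A/C/G/T input, for which R's string comparison gives the same alphabetical order in any locale.

```r
# Hypothetical helpers illustrating the canonical-kmer definition; not MIC's code.
reverse_complement <- function(kmer) {
  # Complement each base (A<->T, C<->G), then reverse the string
  comp <- chartr("ACGT", "TGCA", kmer)
  vapply(strsplit(comp, ""),
         function(chars) paste(rev(chars), collapse = ""),
         character(1))
}

canonical_kmer <- function(kmer) {
  rc <- reverse_complement(kmer)
  # Keep whichever of the kmer / reverse-complement pair sorts first
  ifelse(kmer <= rc, kmer, rc)
}

canonical_kmer(c("ATC", "GAT", "TCG"))
```

"ATC" and its reverse complement "GAT" collapse to the same canonical form "ATC". This is why canonical counting roughly halves the number of distinct kmers.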

squeeze

remove non-canonical kmers

anchor

includes unobserved kmers (with counts of 0), so that every possible kmer appears in the output. This is useful when building a dense matrix in which the kmer columns of different genomes align.
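A base-R sketch of the anchoring idea follows. It enumerates all 4^k kmers and reports zeros for the ones not observed, so vectors from different genomes share the same length and order. The functions all_kmers and anchored_counts are hypothetical illustrations, not MIC's implementation. The sketch also ignores canonicalization, which would shrink the kmer universe.

```r
# Hypothetical sketch of the idea behind anchor = TRUE; not MIC's code.
# Every possible kmer over A, C, G, T in alphabetical order (4^k entries).
all_kmers <- function(k) {
  bases <- c("A", "C", "G", "T")
  grid <- expand.grid(rep(list(bases), k), stringsAsFactors = FALSE)
  sort(do.call(paste0, as.list(grid)))
}

anchored_counts <- function(x, k = 3L) {
  n <- nchar(x)
  starts <- seq_len(max(n - k + 1L, 0L))
  observed <- if (n >= k) substring(x, starts, starts + k - 1L) else character(0)
  universe <- all_kmers(k)
  # Fixed factor levels make table() report 0 for unobserved kmers
  counts <- as.integer(table(factor(observed, levels = universe)))
  names(counts) <- universe
  counts
}

# Rows are genomes, columns are the same 16 dinucleotides in the same order
mat <- rbind(g1 = anchored_counts("ATCGCAGT", 2L),
             g2 = anchored_counts("GGGG", 2L))
```

Because the column order is fixed, the names could be dropped (as simplify = TRUE does) and the columns would still line up across genomes.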

clean_up

only include valid bases (A, C, G, T) in kmer counts (excludes ambiguous base calls such as N)
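One plausible reading of this filtering is to drop every kmer window that contains a character other than A, C, G or T. The documentation does not say whether MIC drops whole kmers or strips the offending characters, so the sketch below is an assumption. The function name kmers_without_ambiguity is hypothetical.

```r
# Hypothetical sketch of clean_up-style filtering; whether MIC drops whole
# kmers (as here) or removes individual characters is an assumption.
kmers_without_ambiguity <- function(x, k = 3L) {
  n <- nchar(x)
  if (n < k) return(character(0))
  starts <- seq_len(n - k + 1L)
  windows <- substring(x, starts, starts + k - 1L)
  # Keep only windows made entirely of A, C, G, T (case-sensitive)
  windows[grepl("^[ACGT]+$", windows)]
}

kmers_without_ambiguity("ACGNTACG", k = 3L)
```

Here the three windows overlapping the N ("CGN", "GNT", "NTA") are discarded.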

key_as_int

return the kmer index (as "kmer_index") rather than the full kmer string. Useful for index-coded formats such as libsvm.

starting_index

the starting index, only used if key_as_int = TRUE.
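How a kmer might be turned into an integer key, with starting_index as an offset, is illustrated below. The mapping MIC actually uses is not documented on this page. The sketch instead assumes a common scheme that reads A, C, G, T as base-4 digits 0 to 3. The function name kmer_to_index is hypothetical.

```r
# Hypothetical index scheme (not necessarily MIC's): read the kmer as a
# base-4 number with A=0, C=1, G=2, T=3, then add starting_index.
kmer_to_index <- function(kmer, starting_index = 1L) {
  digits <- match(strsplit(kmer, "")[[1]], c("A", "C", "G", "T")) - 1L
  if (anyNA(digits)) stop("kmer contains non-ACGT characters")
  # Horner's rule: accumulate the base-4 value from left to right
  value <- 0
  for (d in digits) value <- value * 4 + d
  as.integer(value + starting_index)
}

kmer_to_index("ACG")                      # 0*16 + 1*4 + 2 = 6, plus 1
kmer_to_index("ACG", starting_index = 0L)
```

Under this scheme, indices follow the alphabetical order of the kmers. The default of 1 suits libsvm-format files, whose feature indices conventionally start at 1. The as.integer() conversion overflows once 4^k approaches 2^31 (k of about 16), so larger k would need numeric keys.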

Examples

library(MIC)
kmers("ATCGCAGT")
