
MIC (version 1.1.0)

genome_to_libsvm: Converts a genome to kmer counts stored in libsvm format on disk

Description

This function converts a single genome to a libsvm file containing kmer counts. The file has the following libsvm layout:


  label 1:count 2:count 3:count ...

The label is optional and defaults to "0". Each kmer count is indexed by the kmer's index, i.e., its position in the lexicographically sorted list of kmers. Because libsvm is a sparse format, entries with a count of zero can be omitted.
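To make the layout concrete, here is a minimal base-R sketch of how one such line could be built from a sequence. It is not MIC's implementation: the helper name kmer_libsvm_line is hypothetical, and the sketch assumes an uppercase A/C/G/T alphabet, overlapping kmers, 1-based feature indices, omission of zero counts, and no canonical collapsing (so it is closest in spirit to canonical = FALSE, not the package default).

```r
# Hypothetical helper, NOT part of MIC: builds one libsvm line of kmer counts.
# Assumes uppercase A/C/G/T input, overlapping kmers, 1-based indices,
# and no canonical collapsing.
kmer_libsvm_line <- function(x, k = 3L, label = "0") {
  bases <- c("A", "C", "G", "T")
  # All 4^k possible kmers, sorted lexicographically (A < C < G < T)
  grid <- expand.grid(rep(list(bases), k), stringsAsFactors = FALSE)
  all_kmers <- sort(do.call(paste0, grid))
  # Extract every overlapping kmer of the input sequence
  n <- max(nchar(x) - k + 1L, 0L)
  starts <- seq_len(n)
  seen <- substring(x, starts, starts + k - 1L)
  # Count each kmer; kmers that never occur get a count of 0
  counts <- table(factor(seen, levels = all_kmers))
  # Sparse output: keep only non-zero counts, indexed by sorted position
  idx <- which(counts > 0)
  paste(c(label, paste0(idx, ":", as.integer(counts[idx]))), collapse = " ")
}

kmer_libsvm_line("ATCGCAGT")
# "0 12:1 14:1 19:1 26:1 37:1 55:1"
```

For example, ATC maps to index 14 because it is the 14th of the 64 sorted 3-mers (base-4 value 0*16 + 3*4 + 1 = 13, plus 1).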

Usage

genome_to_libsvm(
  x,
  target_path,
  label = as.character(c("0")),
  k = 3L,
  canonical = TRUE,
  squeeze = FALSE,
  overwrite = FALSE
)

Value

A logical value indicating success.

Arguments

x

genome sequence as a single character string

target_path

path at which to write the libsvm file (.txt)

label

label written at the start of the libsvm line

k

kmer length

canonical

only record canonical kmers (i.e., the lexicographically smaller of a kmer and its reverse complement)

squeeze

remove non-canonical kmers

overwrite

whether to overwrite an existing file at target_path
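The canonical argument refers to the lexicographically smaller of a kmer and its reverse complement. A minimal base-R sketch of that rule follows; reverse_complement and canonical_kmer are hypothetical helpers, not MIC functions, and they assume uppercase A/C/G/T input.

```r
# Hypothetical helpers, NOT part of MIC. Assume uppercase A/C/G/T kmers.
reverse_complement <- function(kmer) {
  # Complement each base, then reverse the order of the characters
  comp <- chartr("ACGT", "TGCA", kmer)
  vapply(strsplit(comp, ""), function(ch) paste(rev(ch), collapse = ""),
         character(1))
}

canonical_kmer <- function(kmer) {
  rc <- reverse_complement(kmer)
  # Keep whichever of the kmer and its reverse complement sorts first
  ifelse(kmer <= rc, kmer, rc)
}

canonical_kmer(c("ATC", "TCG", "GAT"))
# "ATC" "CGA" "ATC"
```

Note that ATC and GAT are reverse complements of each other, so under this rule both are counted as the same canonical kmer, ATC.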

See Also

To process multiple genomes in a directory in parallel, see genomes_to_kmer_libsvm().

For more details on the libsvm format, see https://xgboost.readthedocs.io/en/stable/tutorials/input_format.html

Examples

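# Write the kmer counts of a short sequence to a temporary file
temp_libsvm_path <- tempfile(fileext = ".txt")
genome_to_libsvm("ATCGCAGT", temp_libsvm_path)
# Read the resulting libsvm line back
readLines(temp_libsvm_path)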
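readLines() returns the file's contents as a character vector with one libsvm line per element. As a hedged sketch (parse_libsvm_line is a hypothetical helper, not part of MIC), such a line can be split back into its label, feature indices, and counts. It is applied here to a literal line rather than to the file written above, whose exact contents depend on the canonical and squeeze settings.

```r
# Hypothetical helper, NOT part of MIC: parse one "label idx:val ..." line.
parse_libsvm_line <- function(line) {
  # Fields are separated by whitespace; the first field is the label
  fields <- strsplit(trimws(line), "[[:space:]]+")[[1]]
  # Remaining fields are "index:value" pairs
  pairs <- strsplit(fields[-1], ":", fixed = TRUE)
  list(
    label = fields[1],
    index = vapply(pairs, function(p) as.integer(p[1]), integer(1)),
    count = vapply(pairs, function(p) as.numeric(p[2]), numeric(1))
  )
}

parsed <- parse_libsvm_line("0 12:1 14:1 19:2")
parsed$index  # 12 14 19
parsed$count  # 1 1 2
```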
