kmeRtone (version 1.0)

STUDY_ACROSS_POPULATIONS: Study k-mer composition of selected COSMIC causal cancer genes across human populations worldwide.

Description

Human populations are simulated from single nucleotide variants (SNVs).
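The description does not say how variants are assigned to simulated individuals. As one illustration only, assuming diploid genomes in Hardy-Weinberg equilibrium and a known allele frequency per SNV, the number of carriers of each variant in a population of a given size could be drawn as below. The function `simulate_snv_carriers()` is hypothetical and is not part of kmeRtone.

```r
# Hypothetical helper, not part of kmeRtone: counts simulated carriers of
# each SNV, assuming diploid individuals in Hardy-Weinberg equilibrium.
simulate_snv_carriers <- function(allele.freq, population.size, seed = 1) {
  set.seed(seed)  # make the simulation reproducible
  vapply(allele.freq, function(p) {
    # Alternate-allele copies per individual: 0, 1 or 2 ~ Binomial(2, p)
    copies <- rbinom(population.size, size = 2, prob = p)
    # An individual carries the SNV if it has at least one alternate copy
    sum(copies > 0)
  }, integer(1))
}

# Four SNVs with allele frequencies 0, 1%, 50% and 100% in 1000 individuals
carriers <- simulate_snv_carriers(c(0, 0.01, 0.5, 1), population.size = 1000)
print(carriers)
```

Under these assumptions the expected carrier fraction for allele frequency p is 1 - (1 - p)^2, so roughly 75% of individuals carry the SNV with p = 0.5.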

Usage

STUDY_ACROSS_POPULATIONS(
  kmer.table,
  kmer.cutoff = 5,
  genome.name,
  k,
  db = "refseq",
  central.pattern = NULL,
  population.size = 1e+06,
  selected.genes,
  add.to.existing.population = FALSE,
  output.dir = "study_across_populations/",
  population.snv.dt = NULL,
  loop.chr = TRUE,
  plot = FALSE,
  fasta.path
)

Value

An output directory containing plots.

Arguments

kmer.table

A k-mer table, as a data.table.

kmer.cutoff

Percentage of extreme k-mers to study. Default is 5.

genome.name

UCSC genome name.

k

K-mer size.

db

Database used by UCSC to generate gene predictions: "refseq" or "gencode". Default is "refseq".

central.pattern

Central pattern(s) of the k-mers. Default is NULL.

population.size

Size of population to simulate. Default is 1 million.

selected.genes

Set of genes to study, e.g. skin cancer genes.

add.to.existing.population

Whether to add the counts to an existing counts.csv. Default is FALSE.

output.dir

Directory for the outputs. Default is "study_across_populations/".

population.snv.dt

Population SNV table. Default is NULL.

loop.chr

Whether to loop over chromosomes. Default is TRUE. If FALSE, expect a memory spike, because the VCF contains zero counts for every population; in that case, supply a pre-computed, trimmed population.snv.dt.

plot

Logical. Default is FALSE. If TRUE, the results are plotted.

fasta.path

Path to a directory of user-provided genome FASTA files, or the destination directory for reference genome files downloaded from NCBI/UCSC.
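The kmer.cutoff argument does not define "extreme". As a minimal sketch, assuming the k-mer table has a numeric score column (the column names `kmer` and `z` below are hypothetical) and that the percentage applies to each tail, the lowest and highest kmer.cutoff percent of k-mers could be selected with base R quantiles. The helper `select_extreme_kmers()` is hypothetical and written against a plain data.frame.

```r
# Hypothetical helper, not part of kmeRtone: keeps the k-mers in the lowest
# and highest `cutoff` percent of a numeric score column.
select_extreme_kmers <- function(kmer.table, cutoff = 5, score.col = "z") {
  scores <- kmer.table[[score.col]]
  lower <- quantile(scores, probs = cutoff / 100, names = FALSE)
  upper <- quantile(scores, probs = 1 - cutoff / 100, names = FALSE)
  # A k-mer is "extreme" if its score falls in either tail
  extreme <- scores <= lower | scores >= upper
  kmer.table[extreme, ]
}

# Toy table: 100 k-mer labels with scores 1..100
kt <- data.frame(kmer = sprintf("k%03d", 1:100), z = 1:100)
res <- select_extreme_kmers(kt, cutoff = 5)
res$z  # 1 2 3 4 5 96 97 98 99 100
```

Treating the cutoff per tail means kmer.cutoff = 5 keeps about 10% of all k-mers; if the package instead splits the percentage across both tails, the probabilities would be cutoff / 200 and 1 - cutoff / 200.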
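For the central.pattern argument, one plausible reading is that a k-mer matches when its middle bases equal one of the given patterns. The sketch below assumes all k-mers share the same length k and all patterns share the same length, and centers the pattern with integer division (for mismatched parity it leans left). The helper `has_central_pattern()` is hypothetical, not a kmeRtone function.

```r
# Hypothetical helper, not part of kmeRtone: TRUE for k-mers whose middle
# bases match one of the central patterns. Assumes equal-length k-mers and
# equal-length patterns.
has_central_pattern <- function(kmers, central.pattern) {
  k <- nchar(kmers[1])
  m <- nchar(central.pattern[1])
  # First position of the central window, e.g. 4 for k = 8 and m = 2
  start <- (k - m) %/% 2 + 1
  substr(kmers, start, start + m - 1) %in% central.pattern
}

kmers <- c("ACGTTACG", "ACGCCACG", "AAAGGAAA")
print(has_central_pattern(kmers, c("TT", "CC")))  # TRUE TRUE FALSE
```

The same rule works for odd k with an odd-length pattern: for "AAGTCAA" and pattern "T", the window starts at position 4.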
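The loop.chr note mentions VCF content with zero counts for every population and a trimmed population.snv.dt. As an illustration under stated assumptions, the sketch below takes an SNV table with a `chr` column and one count column per population (all column names, including the population labels, are hypothetical), handles one chromosome at a time, and drops rows whose counts are zero in every population. Because the toy table is already in memory, the split only shows the per-chromosome structure; a real memory saving would have to come from reading one chromosome at a time.

```r
# Hypothetical helper, not part of kmeRtone: processes an SNV table one
# chromosome at a time and drops rows that are zero in every population.
trim_snv_by_chr <- function(snv, count.cols) {
  pieces <- lapply(split(snv, snv$chr), function(chr.snv) {
    # Keep a variant if at least one population has a non-zero count
    keep <- rowSums(chr.snv[, count.cols, drop = FALSE]) > 0
    chr.snv[keep, , drop = FALSE]
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL  # drop the "chr.row" names added by rbind
  out
}

# Toy table with two hypothetical population count columns
snv <- data.frame(chr = c("chr1", "chr1", "chr2", "chr2"), pos = 1:4,
                  AFR = c(0, 3, 0, 0), EUR = c(0, 0, 0, 2))
trimmed <- trim_snv_by_chr(snv, count.cols = c("AFR", "EUR"))
print(trimmed)
```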