This function serves as an all-in-one interface to the kmeRtone modules for k-mer-based genomic data analyses. The analysis to run is selected with the module argument.
kmeRtone(
  case.coor.path,
  genome.name,
  strand.sensitive,
  k,
  ctrl.rel.pos = c(80, 500),
  case.pattern,
  output.dir = "output/",
  case,
  genome,
  control,
  control.path,
  genome.path,
  rm.case.kmer.overlaps,
  single.case.len,
  merge.replicates,
  kmer.table,
  module = "score",
  rm.dup = TRUE,
  case.coor.1st.idx = 1,
  ctrl.coor.1st.idx = 1,
  coor.load.limit = 1,
  genome.load.limit = 1,
  genome.fasta.style = "UCSC",
  genome.ncbi.db = "refseq",
  use.UCSC.chr.name = FALSE,
  verbose = TRUE,
  kmer.cutoff = 5,
  selected.extremophiles,
  other.extremophiles,
  cosmic.username,
  cosmic.password,
  tumour.type.regex = NULL,
  tumour.type.exact = NULL,
  cell.type = "somatic",
  genic.elements.counts.dt,
  population.size = 1e+06,
  selected.genes,
  add.to.existing.population = FALSE,
  population.snv.dt = NULL,
  pop.plot = TRUE,
  pop.loop.chr = FALSE
)
The return value depends on the selected module.
case.coor.path: Path to a folder containing chromosome-separated coordinate files or BED files. Subfolders or multiple BED files are treated as replicates.
genome.name: Name of the genome (e.g., "hg19", "hg38"). Default is "unknown".
strand.sensitive: Logical value indicating whether strand polarity matters. Default is TRUE.
k: Length of the k-mers to investigate. Recommended values are 7 or 8.
ctrl.rel.pos: A vector of two integers specifying the range of positions, relative to the case regions, that defines the control regions. Default is c(80, 500).
case.pattern: Regular expression pattern for identifying case regions. Default is NULL.
output.dir: Directory path for saving output files. Default is "output/".
case: Optional pre-built Coordinate object for the case regions.
genome: Optional pre-built Genome object.
control: Optional pre-built control Coordinate object.
control.path: Path to a pre-built control Coordinate object.
genome.path: Path to a directory of user-provided genome FASTA files.
rm.case.kmer.overlaps: Logical indicating whether to remove overlapping k-mers in case regions. Default is FALSE.
single.case.len: Integer giving the uniform length of all case regions.
merge.replicates: Logical indicating whether to merge replicates. Default is TRUE.
kmer.table: Pre-calculated k-mer score table.
module: kmeRtone module to run. Possible values include "score", "explore", and "tune", among others. Default is "score".
rm.dup: Logical indicating whether to remove duplicate coordinates. Default is TRUE.
case.coor.1st.idx: Integer specifying the index of the first base in the case coordinates (1 for 1-based, 0 for 0-based). Default is 1.
ctrl.coor.1st.idx: Integer specifying the index of the first base in the control coordinates (1 for 1-based, 0 for 0-based). Default is 1.
coor.load.limit: Maximum number of coordinates to load. Default is 1.
genome.load.limit: Maximum number of genome sequences to load. Default is 1.
genome.fasta.style: String specifying the style of the genome FASTA files. Possible values are "UCSC" and "NCBI". Default is "UCSC".
genome.ncbi.db: String specifying the NCBI database to use. Possible values are "refseq" and "genbank". Default is "refseq".
use.UCSC.chr.name: Logical indicating whether to use UCSC chromosome names. Default is FALSE.
verbose: Logical indicating whether to display progress messages. Default is TRUE.
kmer.cutoff: Cutoff percentage for k-mer selection in case studies. Default is 5.
selected.extremophiles: Vector of selected extremophile species to study.
other.extremophiles: Vector of other extremophile species used as controls.
cosmic.username: COSMIC username for accessing the Cancer Gene Census.
cosmic.password: COSMIC password for accessing the Cancer Gene Census.
tumour.type.regex: Regular expression pattern for filtering tumour types. Default is NULL.
tumour.type.exact: Exact tumour type to include from the Cancer Gene Census. Default is NULL.
cell.type: Cell type to include from the Cancer Gene Census. Default is "somatic".
genic.elements.counts.dt: Data table of susceptible k-mer counts in genic elements.
population.size: Size of the simulated population for cross-population studies. Default is 1e+06 (1 million).
selected.genes: Genes selected for mutation in cross-population studies.
add.to.existing.population: Logical indicating whether to add to an existing simulated population. Default is FALSE.
population.snv.dt: Data table of single nucleotide variants (SNVs) used in population simulations. Default is NULL.
pop.plot: Logical indicating whether to plot the outcome of the cross-population study. Default is TRUE.
pop.loop.chr: Logical indicating whether to loop over chromosomes, by name, in cross-population studies. Default is FALSE.
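To illustrate what k and strand.sensitive control, the sketch below extracts a k-mer centred on a single case position for odd k (such as the recommended 7). The helpers revcomp() and centered_kmer() are hypothetical, not kmeRtone functions, and the strand-insensitive branch assumes that a k-mer is collapsed with its reverse complement by keeping the alphabetically smaller of the two, a common convention that kmeRtone may implement differently. Even k (such as 8) would centre on two bases and is not handled here.

```r
# Hypothetical helpers (not kmeRtone functions) illustrating k and
# strand.sensitive. ASSUMES strand-insensitive analysis collapses each k-mer
# with its reverse complement, keeping the alphabetically smaller one.
revcomp <- function(s) {
  # Complement each base, then reverse the string
  paste(rev(strsplit(chartr("ACGT", "TGCA", s), "")[[1]]), collapse = "")
}

centered_kmer <- function(dna, pos, k, strand.sensitive = TRUE) {
  stopifnot(k %% 2 == 1)  # odd k gives a single central base
  half <- (k - 1) %/% 2
  stopifnot(pos - half >= 1, pos + half <= nchar(dna))
  kmer <- substr(dna, pos - half, pos + half)
  if (strand.sensitive) kmer else min(kmer, revcomp(kmer))
}

dna <- "TTTGCAAAT"
centered_kmer(dna, 2, 3)                            # "TTT"
centered_kmer(dna, 2, 3, strand.sensitive = FALSE)  # "AAA"
centered_kmer(dna, 5, 7)                            # "TTGCAAA"
```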
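The ctrl.rel.pos argument can be pictured with the following sketch. It assumes that the two integers are the inner and outer distances, in bases, from each case region and that control regions are taken on both flanks; kmeRtone's exact interpretation may differ. The helper control_flanks() is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of kmeRtone): compute the two flanking
# control intervals for one case region, ASSUMING ctrl.rel.pos gives the
# inner and outer distances (in bases) from the case region on each side.
control_flanks <- function(start, end, ctrl.rel.pos = c(80, 500)) {
  stopifnot(length(ctrl.rel.pos) == 2, ctrl.rel.pos[1] < ctrl.rel.pos[2])
  inner <- ctrl.rel.pos[1]
  outer <- ctrl.rel.pos[2]
  data.frame(
    side  = c("upstream", "downstream"),
    start = c(start - outer, end + inner),
    end   = c(start - inner, end + outer)
  )
}

# A single-base case region at position 1000:
# upstream control 500-920, downstream control 1080-1500
control_flanks(1000, 1000)
```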
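The case.coor.1st.idx and ctrl.coor.1st.idx arguments concern whether coordinates start counting at 0 or 1. The sketch below converts interval starts to 1-based coordinates, assuming that 0-based input follows the BED convention (0-based start, exclusive end) so that only the start needs shifting. The helper to_one_based() is hypothetical, and kmeRtone's internal handling may differ.

```r
# Hypothetical helper (not part of kmeRtone): convert interval starts to
# 1-based coordinates. ASSUMES 0-based input follows the BED convention
# (0-based start, exclusive end), so only the start needs shifting.
to_one_based <- function(start, end, first.idx = 1) {
  stopifnot(first.idx %in% c(0, 1))
  if (first.idx == 0) start <- start + 1
  data.frame(start = start, end = end)
}

# The BED interval "chr1 999 1000" is the single base at 1-based position 1000
to_one_based(999, 1000, first.idx = 0)
```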
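One possible reading of kmer.cutoff is sketched below: keep the k-mers whose scores fall in the top kmer.cutoff percent. This interpretation is an assumption, and top_kmers() is a hypothetical helper rather than a kmeRtone function; the threshold is computed with base R's quantile().

```r
# Hypothetical helper (not part of kmeRtone): ASSUMES kmer.cutoff selects
# the k-mers whose scores fall in the top kmer.cutoff percent.
top_kmers <- function(scores, kmer.cutoff = 5) {
  stopifnot(!is.null(names(scores)), kmer.cutoff > 0, kmer.cutoff <= 100)
  # Score at the (100 - kmer.cutoff)th percentile
  threshold <- quantile(scores, probs = 1 - kmer.cutoff / 100, names = FALSE)
  names(scores)[scores >= threshold]
}

scores <- setNames(as.numeric(1:20), paste0("kmer", 1:20))
top_kmers(scores, 5)   # "kmer20"
top_kmers(scores, 10)  # "kmer19" "kmer20"
```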