kmeRtone (version 1.0)

SCORE: Calculate susceptibility scores for k-mers in case and control regions.

Description

Calculates susceptibility scores for k-mers by comparing their occurrence in case and control regions. Case regions are defined by genomic coordinates supplied either as a file or as a data.table. Control regions can be constructed relative to the case regions or supplied directly.
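
The description does not state the scoring formula. As a rough sketch of the general idea only, the base R snippet below counts k-mers in toy case and control sequences and contrasts their relative frequencies with a log2 ratio. The sequences, the helper `extract_kmers`, the pseudocount and the log-ratio itself are illustrative assumptions, not kmeRtone's actual method.

```r
# Hypothetical helper: return every overlapping k-mer found in a
# character vector of sequences.
extract_kmers <- function(seqs, k) {
  unlist(lapply(seqs, function(s) {
    n <- nchar(s)
    if (n < k) return(character(0))
    starts <- seq_len(n - k + 1)
    substring(s, starts, starts + k - 1)
  }))
}

# Toy sequences (assumed data, not from any real genome)
case_seqs <- c("ACGTT", "CGTTA")
ctrl_seqs <- c("ACGAC", "GACGA")
k <- 3

case_kmers <- extract_kmers(case_seqs, k)
ctrl_kmers <- extract_kmers(ctrl_seqs, k)

# Count both sets over the same k-mer universe so absent k-mers get 0
all_kmers <- sort(union(case_kmers, ctrl_kmers))
case_n <- as.integer(table(factor(case_kmers, levels = all_kmers)))
ctrl_n <- as.integer(table(factor(ctrl_kmers, levels = all_kmers)))

# Illustrative score: log2 ratio of pseudocounted relative frequencies
pseudo <- 1
score <- log2(((case_n + pseudo) / sum(case_n + pseudo)) /
              ((ctrl_n + pseudo) / sum(ctrl_n + pseudo)))

result <- data.frame(kmer = all_kmers, case = case_n,
                     control = ctrl_n, score = score)
print(result)
```

In this toy setup a positive score marks a k-mer that is relatively more frequent in case regions, and a negative score marks one that is more frequent in control regions.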

Usage

SCORE(
  case.coor.path,
  genome.name,
  strand.sensitive,
  k,
  ctrl.rel.pos,
  case.pattern,
  output.path,
  case,
  genome,
  control,
  control.path,
  genome.path,
  rm.case.kmer.overlaps,
  single.case.len,
  merge.replicates,
  rm.dup,
  case.coor.1st.idx,
  ctrl.coor.1st.idx,
  coor.load.limit,
  genome.load.limit,
  genome.fasta.style,
  genome.ncbi.db,
  use.UCSC.chr.name,
  verbose
)

Value

A data.table containing the susceptibility score of each k-mer.

Arguments

case.coor.path

Path to the file containing genomic coordinates of case regions.

genome.name

Name of the genome to be used.

strand.sensitive

Logical indicating whether strand information should be considered.
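
A common consequence of ignoring strand, sketched below in base R, is that a k-mer and its reverse complement are pooled under a single key. Whether SCORE collapses k-mers in exactly this way is an assumption; `rev_comp` and `canonical` are hypothetical helpers, not part of kmeRtone.

```r
# Hypothetical helper: reverse complement of each DNA string
rev_comp <- function(x) {
  vapply(strsplit(chartr("ACGT", "TGCA", x), ""),
         function(ch) paste(rev(ch), collapse = ""),
         character(1))
}

# Hypothetical helper: pick the alphabetically smaller of a k-mer and its
# reverse complement, so both strands map to one representative
canonical <- function(x) {
  rc <- rev_comp(x)
  ifelse(x <= rc, x, rc)
}

kmers <- c("AAC", "GTT", "ACG")

# Strand-sensitive counting keeps the three k-mers apart
table(kmers)

# Strand-insensitive counting pools "AAC" with its reverse complement "GTT"
table(canonical(kmers))
```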

k

Integer size of the expanded k-mer.
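
To illustrate what expanding a case region to a k-mer could look like, the sketch below pads a case region symmetrically until it spans k bases. The symmetric padding, the 1-based coordinates, the helper `expand_to_kmer` and the toy sequence are assumptions made for illustration.

```r
# Hypothetical helper: expand a case region starting at `start` with
# length `case_len` into a k-mer by adding equal flanks on both sides.
expand_to_kmer <- function(genome_seq, start, case_len, k) {
  stopifnot(k >= case_len, (k - case_len) %% 2 == 0)
  flank <- (k - case_len) / 2
  from <- start - flank
  to <- start + case_len - 1 + flank
  # Regions too close to a sequence end cannot be expanded
  if (from < 1 || to > nchar(genome_seq)) return(NA_character_)
  substr(genome_seq, from, to)
}

genome_seq <- "ACGTTGCAAT"  # toy sequence

# A 2-base case at positions 4-5 ("TT") expanded to k = 6
expand_to_kmer(genome_seq, start = 4, case_len = 2, k = 6)
```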

ctrl.rel.pos

Relative positions of control regions with respect to case regions. It should be a vector of two integers indicating the upstream and downstream distances from the case regions.
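
The sketch below assumes one possible reading of this argument: the two integers are the nearest and farthest offsets of a flanking window, applied both upstream and downstream of the case region. That interpretation, the helper `control_windows` and the toy coordinates are assumptions and are not confirmed by the description above.

```r
# Hypothetical helper: build upstream and downstream control windows
# for one case region, given offsets c(near, far).
control_windows <- function(case_start, case_end, ctrl.rel.pos) {
  near <- ctrl.rel.pos[1]
  far <- ctrl.rel.pos[2]
  data.frame(
    side = c("upstream", "downstream"),
    start = c(case_start - far, case_end + near),
    end = c(case_start - near, case_end + far)
  )
}

# Toy case region at positions 1000-1001 with assumed offsets of 80 and 500
windows <- control_windows(1000, 1001, ctrl.rel.pos = c(80, 500))
print(windows)
```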

case.pattern

Regular expression pattern to identify the central sequence in case regions.
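
As a small illustration of pattern-based filtering, the base R snippet below keeps only the case sequences that match a regular expression. The pattern and the toy sequences are hypothetical, and how SCORE applies `case.pattern` internally is not shown in this page.

```r
# Toy central sequences extracted at case coordinates (assumed data)
case_seqs <- c("TT", "AG", "CT", "GA")

# Hypothetical pattern: exactly two pyrimidines (C or T)
case.pattern <- "^[CT]{2}$"

# Keep only the case sequences that match the pattern
keep <- grepl(case.pattern, case_seqs)
matched <- case_seqs[keep]
print(matched)
```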

output.path

Directory path where the output files will be saved.

case

Data.table containing the genomic coordinates of case regions, as an alternative to supplying a file through case.coor.path.

genome

Genome data.table containing the genomic sequence information.

control

Data.table containing the genomic coordinates of control regions.

control.path

Path to the file containing genomic coordinates of control regions (optional).

genome.path

Path to the genome FASTA file.

rm.case.kmer.overlaps

Logical indicating whether overlapping k-mers within case regions should be removed.

single.case.len

Length of a single case region.

merge.replicates

Logical indicating whether replicates should be merged.

rm.dup

Logical indicating whether duplicate k-mers should be removed.

case.coor.1st.idx

First index in the case coordinate file.

ctrl.coor.1st.idx

First index in the control coordinate file.
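
Coordinate files differ in whether the first base of a chromosome is numbered 0 or 1. Assuming the two first-index arguments declare that convention, the sketch below shifts toy coordinates to 1-based numbering. The helper `to_one_based`, the column names and the treatment of both start and end as closed positions are assumptions.

```r
# Hypothetical helper: shift start and end so that the first base is 1
to_one_based <- function(coor, first.idx) {
  shift <- 1L - as.integer(first.idx)
  coor$start <- coor$start + shift
  coor$end <- coor$end + shift
  coor
}

# Toy coordinates whose first base is numbered 0 (assumed data)
coor0 <- data.frame(chromosome = "chr1",
                    start = c(0L, 9L),
                    end = c(1L, 10L))

coor1 <- to_one_based(coor0, first.idx = 0)
print(coor1)
```

Coordinates that are already 1-based pass through unchanged with `first.idx = 1`.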

coor.load.limit

Maximum number of coordinates to load.

genome.load.limit

Maximum number of genome sequences to load.

genome.fasta.style

FASTA style.

genome.ncbi.db

NCBI database.

use.UCSC.chr.name

Logical indicating whether to use UCSC chromosome names.

verbose

Logical indicating whether to display progress messages.