
misha (version 5.4.0)

gseq.kmer.dist: Compute k-mer distribution in genomic intervals

Description

Counts the occurrences of every k-mer of length k within the specified genomic intervals, optionally excluding masked regions.
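The base-R sketch below illustrates the counting rules described on this page for a single in-memory sequence. It slides a window of length k across the string, skips windows that contain N, and reports only k-mers that occur. `count_kmers` is a hypothetical helper, not part of misha. It assumes the input uses only A, C, G, T and N, and it folds lower case to upper case, which is an assumption about how soft-masked bases should be treated. It also does not model how gseq.kmer.dist handles interval boundaries or overlapping intervals.

```r
# Illustrative sketch only: `count_kmers` is a hypothetical helper, not a misha
# function. It assumes `dna` is a single string over A, C, G, T and N.
count_kmers <- function(dna, k = 6L) {
  empty <- data.frame(kmer = character(0), count = integer(0))
  dna <- toupper(dna)  # assumption: treat lower-case bases as upper case
  n <- nchar(dna)
  if (n < k) return(empty)
  starts <- seq_len(n - k + 1L)
  # Every overlapping window of length k
  kmers <- substring(dna, starts, starts + k - 1L)
  # Skip windows that contain an N base
  kmers <- kmers[!grepl("N", kmers, fixed = TRUE)]
  if (length(kmers) == 0L) return(empty)
  # table() only lists values that occur, so every count is > 0
  tab <- table(kmers)
  data.frame(kmer = names(tab), count = as.integer(tab),
             stringsAsFactors = FALSE)
}

count_kmers("ACGTNACGT", k = 2L)
```

For this input the windows TN and NA are dropped, so the sketch returns AC, CG and GT, each with a count of 2.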

Usage

gseq.kmer.dist(intervals, k = 6L, mask = NULL)

Value

A data frame with columns:

kmer

Character string representing the k-mer sequence

count

Number of occurrences of this k-mer

Only k-mers with count > 0 are included. K-mers containing N bases are not counted.
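Because absent k-mers and N-containing k-mers are both omitted, the number of rows returned is bounded by the number of distinct k-mers over the four bases, which is 4^k. This assumes k-mers are reported in upper case over A, C, G and T only. The short calculation below is illustrative and not part of misha. It tabulates that upper bound for the allowed sizes 1 to 10.

```r
# Upper bound on the number of distinct DNA k-mers (alphabet A, C, G, T)
# for each k-mer size accepted by gseq.kmer.dist (1 to 10)
k <- 1:10
possible <- data.frame(k = k, possible_kmers = 4^k)
possible
```

For the default k = 6 the bound is 4,096 k-mers. At k = 10 it is 1,048,576.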

Arguments

intervals

Genomic intervals to analyze, for example a data frame with chrom, start and end columns (as in the examples below).

k

Integer k-mer size (1-10). Default is 6.

mask

Optional intervals to exclude from counting. Positions within the mask will not contribute to k-mer counts.
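The base-R sketch below shows one plausible reading of the mask rule: a k-mer window is discarded if any of its positions falls inside a masked range. Whether gseq.kmer.dist drops the whole overlapping window or handles partial overlap differently is an assumption here, not something this page states. `count_kmers_masked` and its `mask_start`/`mask_end` arguments are hypothetical. They take 0-based, end-exclusive coordinates relative to the start of the string.

```r
# Illustrative sketch only: `count_kmers_masked` is a hypothetical helper, not
# a misha function. Mask ranges are assumed to be 0-based, end-exclusive and
# relative to the start of `dna`.
count_kmers_masked <- function(dna, k = 6L,
                               mask_start = integer(0), mask_end = integer(0)) {
  empty <- data.frame(kmer = character(0), count = integer(0))
  dna <- toupper(dna)
  n <- nchar(dna)
  if (n < k) return(empty)
  # masked[i] marks the 0-based position i - 1
  masked <- rep(FALSE, n)
  for (j in seq_along(mask_start)) {
    lo <- max(mask_start[j], 0)
    hi <- min(mask_end[j], n)
    if (hi > lo) masked[(lo + 1):hi] <- TRUE
  }
  # 0-based start of each window; the window covers [p, p + k)
  p <- seq_len(n - k + 1L) - 1L
  # Keep a window only if none of its k positions is masked (assumption)
  keep <- vapply(p, function(s) !any(masked[(s + 1L):(s + k)]), logical(1))
  kmers <- substring(dna, p + 1L, p + k)[keep]
  # Windows containing N are skipped, as for unmasked counting
  kmers <- kmers[!grepl("N", kmers, fixed = TRUE)]
  if (length(kmers) == 0L) return(empty)
  tab <- table(kmers)
  data.frame(kmer = names(tab), count = as.integer(tab),
             stringsAsFactors = FALSE)
}

count_kmers_masked("AAAACCCC", k = 2L, mask_start = 3, mask_end = 5)
```

Without a mask, "AAAACCCC" yields AA 3, AC 1 and CC 3. Masking positions 3 and 4 removes every window that touches them, leaving AA 2 and CC 2.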

See Also

gseq.extract, gseq.kmer

Examples

gdb.init_examples()

# Count all 6-mers in the first 10 kb of chr1
intervals <- data.frame(chrom = "chr1", start = 0, end = 10000)
kmer_dist <- gseq.kmer.dist(intervals, k = 6)
head(kmer_dist)

# Count dinucleotides (2-mers)
dinucs <- gseq.kmer.dist(intervals, k = 2)
dinucs

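# Count 6-mers again, excluding chr1:5000-6000 via a mask
mask <- data.frame(chrom = "chr1", start = 5000, end = 6000)
kmer_dist_masked <- gseq.kmer.dist(intervals, k = 6, mask = mask)
head(kmer_dist_masked)

# Number of k-mer occurrences removed by the mask
sum(kmer_dist$count) - sum(kmer_dist_masked$count)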
