
GencoDymo2 (version 1.0.2)

find_cryptic_splice_sites: Identify Potential Cryptic Splice Sites.

Description

This function identifies potential cryptic splice sites by comparing the donor and acceptor motifs of each intron against canonical splice site motifs. A site is flagged as cryptic when its sequence matches none of the supplied canonical motifs (by default GT for donors and AG for acceptors). This makes the function useful for studying aberrant splicing events.

Usage

find_cryptic_splice_sites(
  input,
  genome,
  canonical_donor = c("GT"),
  canonical_acceptor = c("AG"),
  verbose = TRUE
)

Value

The input data frame with two additional logical columns:

  • cryptic_donor: TRUE if the donor site matches none of the canonical donor motifs.

  • cryptic_acceptor: TRUE if the acceptor site matches none of the canonical acceptor motifs.
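As a rough illustration of how the two logical columns might be used downstream, the sketch below filters a toy data frame shaped like the returned value. The data frame res and all of its values are invented for illustration; only the column names cryptic_donor and cryptic_acceptor come from this page.

```r
# Toy stand-in for the returned data frame (all values are invented).
res <- data.frame(
  gene_name        = c("GENE_A", "GENE_A", "GENE_B"),
  intron_number    = c(1L, 2L, 1L),
  cryptic_donor    = c(FALSE, TRUE, FALSE),
  cryptic_acceptor = c(FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Keep introns where at least one splice site is flagged as non-canonical.
non_canonical <- res[res$cryptic_donor | res$cryptic_acceptor, ]

# Count flagged introns per gene.
per_gene <- table(non_canonical$gene_name)
print(per_gene)
```

Because both columns are plain logical vectors, they combine with | and & and can be passed directly to sum() or mean() for counts and proportions.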

Arguments

input

A data frame containing intron coordinates, ideally generated by extract_introns() and assign_splice_sites(). Must contain columns: seqnames, intron_start, intron_end, strand, transcript_id, intron_number, gene_name, gene_id, donor_ss and acceptor_ss.

genome

A BSgenome object representing the genome sequence. This is used to extract the sequence for each intron to identify splice sites.

canonical_donor

A character vector of canonical donor splice site motifs. Default is c("GT").

canonical_acceptor

A character vector of canonical acceptor splice site motifs. Default is c("AG").

verbose

Logical; if TRUE, progress messages are printed. Default is TRUE.

Details

This function performs the following steps:

  • It assigns donor and acceptor splice sites to each intron using the assign_splice_sites function.

  • It compares each donor and acceptor splice site sequence against the provided canonical motifs (GT for donors and AG for acceptors by default). A site whose sequence matches none of the canonical motifs is flagged as cryptic.

  • It returns a data frame with the same intron information plus the columns cryptic_donor and cryptic_acceptor, which indicate whether each splice site is cryptic.

  • If verbose is TRUE, it prints progress messages together with the total number and percentage of cryptic donor and acceptor sites.
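The comparison and summary steps above can be sketched in base R. This is a minimal illustration, not the package's implementation: flag_cryptic is a hypothetical helper, the toy intron table is invented, and the sketch assumes the donor_ss and acceptor_ss columns already hold the splice site motifs as character strings.

```r
# Toy intron table; donor_ss / acceptor_ss mimic the columns described
# under Arguments (all values are invented).
introns <- data.frame(
  transcript_id = c("tx1", "tx1", "tx2"),
  intron_number = c(1L, 2L, 1L),
  donor_ss      = c("GT", "GC", "AT"),
  acceptor_ss   = c("AG", "AG", "AC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: flag sites that match none of the canonical motifs.
flag_cryptic <- function(df,
                         canonical_donor = c("GT"),
                         canonical_acceptor = c("AG")) {
  df$cryptic_donor    <- !(df$donor_ss %in% canonical_donor)
  df$cryptic_acceptor <- !(df$acceptor_ss %in% canonical_acceptor)
  df
}

flagged <- flag_cryptic(introns)

# Summary similar in spirit to the verbose output: counts and percentages.
n_donor    <- sum(flagged$cryptic_donor)
n_acceptor <- sum(flagged$cryptic_acceptor)
message(sprintf("Cryptic donors: %d (%.1f%%)",
                n_donor, 100 * n_donor / nrow(flagged)))
message(sprintf("Cryptic acceptors: %d (%.1f%%)",
                n_acceptor, 100 * n_acceptor / nrow(flagged)))

# Widening the canonical set, e.g. to also treat GC donors as canonical.
flagged_gc <- flag_cryptic(introns, canonical_donor = c("GT", "GC"))
```

Since both motif arguments are character vectors, passing more than one motif simply widens the set of sequences treated as canonical. In the toy data, the GC donor is flagged under the defaults but not when "GC" is added to canonical_donor.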

See Also

assign_splice_sites, extract_ss_motif

Examples

if (FALSE) {
  if (requireNamespace("BSgenome.Hsapiens.UCSC.hg38", quietly = TRUE)) {
    file_v1 <- system.file("extdata", "gencode.v1.example.gtf.gz", package = "GencoDymo2")
    gtf_v1 <- load_file(file_v1)
    introns_df <- extract_introns(gtf_v1)
    introns_ss <- assign_splice_sites(introns_df, genome = BSgenome.Hsapiens.UCSC.hg38)
    cryptic_sites <- find_cryptic_splice_sites(introns_ss, BSgenome.Hsapiens.UCSC.hg38)
    head(cryptic_sites)
  }
}

