gclink (version 1.1)

gc_cluster: Identify Breakpoints of Gene Clusters within a Contig

Description

Internal helper used by gc_cal. Given the ordered ORF positions of reference genes on a contig, this function returns the positions that mark the end of each candidate cluster. A boundary is declared whenever the gap between two successive reference genes exceeds AllGeneNum - MinConSeq, the maximum spacing allowed by the cluster definition.

Usage

gc_cluster(Data = orf_position.tmp, AllGeneNum = 30, MinConSeq = 15)

Value

A numeric vector containing every position that marks the end of a putative gene cluster. These values are subsequently used as breakpoints to slice the contig into candidate regions in the downstream functions gc_position() and gc_range().
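
To illustrate what slicing by breakpoints means, the sketch below groups a vector of positions into candidate regions using base R only. It is not the code of gc_position() or gc_range(); the input values and the names orf_position, breakpoints and candidate_regions are made up for the example.

```r
# Hypothetical inputs: reference-gene ORF positions and the end-of-cluster
# breakpoints that a call such as gc_cluster() might return for them.
orf_position <- c(3, 5, 9, 40, 42, 47, 90)
breakpoints  <- c(9, 47, 90)

# With left.open = TRUE, findInterval() counts how many breakpoints lie
# strictly below each position, so a position equal to a breakpoint stays
# in the region that the breakpoint closes.
group <- findInterval(orf_position, breakpoints, left.open = TRUE) + 1

# One list element per candidate region.
candidate_regions <- split(orf_position, group)
print(candidate_regions)
```

Each list element holds the reference-gene positions of one candidate region, with the breakpoint as its last member.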

Arguments

Data

A numeric vector (ascending order) of ORF positions that carry one of the reference genes of interest. Usually the vector orf_position.tmp created inside gc_cal.

AllGeneNum

Integer. Maximum genomic span (in ORF count) that the algorithm is allowed to cover when defining a single cluster. Passed unchanged from gc_cal.

MinConSeq

Integer. Minimum number of consecutive reference genes required to form a cluster. Passed unchanged from gc_cal.

Details

  • If the gap between two consecutive reference genes is larger than AllGeneNum - MinConSeq, the left-hand gene is recorded as the last member of the preceding cluster.

  • The final reference gene is always appended to the vector as the last boundary, ensuring the rightmost cluster is not lost.
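
The two rules above can be sketched in a few lines of base R. This is a minimal illustration written from the description on this page, not the gclink source; the function name gc_cluster_sketch and the example positions are hypothetical.

```r
# Hypothetical re-implementation for illustration only.
gc_cluster_sketch <- function(Data, AllGeneNum = 30, MinConSeq = 15) {
  # No reference genes means no boundaries.
  if (length(Data) == 0) return(numeric(0))

  # Largest gap allowed between neighbouring reference genes.
  max_gap <- AllGeneNum - MinConSeq

  # gaps[i] is the distance between Data[i] and Data[i + 1].
  gaps <- diff(Data)

  # Rule 1: the left-hand gene of every oversized gap ends a cluster.
  breaks <- Data[which(gaps > max_gap)]

  # Rule 2: the final reference gene is always the last boundary.
  c(breaks, Data[length(Data)])
}

# Made-up ORF positions in ascending order.
orf_position <- c(3, 5, 9, 40, 42, 47, 90)
print(gc_cluster_sketch(orf_position))
```

With the default AllGeneNum = 30 and MinConSeq = 15, the allowed gap is 15, so the gaps 9 to 40 and 47 to 90 each close a cluster and the sketch returns 9, 47 and 90.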