find.points: Simple method to detect compositional changes in genomic sequences

Description

find.points is used to detect changes in the composition of genomic sequences. The method is based on fitting nonparametric models using local linear kernel smoothers.
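A local linear kernel smoother estimates the curve at each point by fitting a weighted least-squares line to nearby observations, with the weights given by a kernel. The fitted value at that point is the intercept of that line. The sketch below uses a Gaussian kernel. It is a minimal illustration of the general technique under that assumption, not seq2R's internal code, and the helper name `local_linear` is hypothetical.

```r
# Minimal local linear kernel smoother with a Gaussian kernel.
# Hypothetical helper for illustration only; not part of seq2R.
# x, y: observed data; x0: points where the curve is estimated; h: bandwidth.
local_linear <- function(x, y, x0, h) {
  vapply(x0, function(z) {
    w <- dnorm((x - z) / h)   # Gaussian kernel weights centred at z
    d <- x - z                # covariate centred at the evaluation point
    s0 <- sum(w); s1 <- sum(w * d); s2 <- sum(w * d^2)
    t0 <- sum(w * y); t1 <- sum(w * d * y)
    # Intercept of the weighted least-squares line in d = fitted value at z
    (s2 * t0 - s1 * t1) / (s0 * s2 - s1^2)
  }, numeric(1))
}

# Usage: smooth a noisy curve on [0, 1]
set.seed(1)
x <- seq(0, 1, length.out = 200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.2)
fit <- local_linear(x, y, x0 = x, h = 0.05)
```

Because the local fit is a straight line, this smoother reproduces linear trends exactly for any bandwidth. That is one reason local linear fits behave better near the ends of a sequence than a plain kernel average.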

Usage

find.points(x, kbin = 300, p = 3, bandwidth = -1, weights = 1, nboot = 100,
            kernel = "gaussian", n.bandwidths = 20, seed = NULL, ...)

Value

The function computes and returns a list with summary information about the fitted change.points object:

Number of A-T base pairs: The total number of adenine and thymine nucleotides in the analyzed sequence.
Number of C-G base pairs: The total number of cytosine and guanine nucleotides in the analyzed sequence.
Number of binning nodes: The number of binning nodes over which the function is estimated.
Number of bootstrap repeats: The number of bootstrap replicates used.
Bandwidth: The kernel bandwidth (smoothing parameter) used to fit the A vs. T and C vs. G profiles.
Exists any critical point: Indicates whether any critical point was detected.

Arguments

x: The sequence to be analyzed, previously converted to binary form with the change.binary function.
kbin: The number of binning nodes over which the function is to be estimated.
p: Degree of the polynomial. By default, p = 3.
bandwidth: The kernel bandwidth or smoothing parameter. Larger values give smoother estimates; smaller values give rougher estimates. The default, bandwidth = -1, selects the bandwidth by cross-validation.
weights: Weights. By default, weights = 1.
nboot: Number of bootstrap repeats.
kernel: A character string giving the kernel function (a symmetric density). By default, kernel = "gaussian", that is, the Gaussian density function. Epanechnikov and triangular kernels are also available through kernel = "Epanech" and kernel = "triang", respectively.
n.bandwidths: The number of candidate bandwidths in the grid between 0 and 1 from which the optimal bandwidth is selected by cross-validation. An optimal bandwidth close to 0 yields rough estimates; one close to 1 yields smooth estimates.
seed: Seed to be used in the bootstrap procedure.
...: Other options.
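The kernel, bandwidth and n.bandwidths arguments can be illustrated with a generic kernel smoother. The sketch below defines the three kernels by their standard formulas and selects a bandwidth by leave-one-out cross-validation over an equally spaced grid in (0, 1]. It assumes the covariate has been rescaled to [0, 1]; seq2R's exact kernel constants, grid and cross-validation criterion may differ, and the helper names `kernels`, `ll_fit` and `select_bandwidth` are hypothetical.

```r
# Hypothetical helpers illustrating the kernel and n.bandwidths arguments.
kernels <- list(
  gaussian = function(u) dnorm(u),
  Epanech  = function(u) 0.75 * (1 - u^2) * (abs(u) <= 1),
  triang   = function(u) (1 - abs(u)) * (abs(u) <= 1)
)

# Local linear estimate at z with kernel K and bandwidth h
ll_fit <- function(x, y, z, h, K) {
  w <- K((x - z) / h); d <- x - z
  s0 <- sum(w); s1 <- sum(w * d); s2 <- sum(w * d^2)
  den <- s0 * s2 - s1^2
  # Too few points inside the kernel window: the local line is not identified
  if (!is.finite(den) || den <= 1e-10 * s0 * s2) return(NA_real_)
  (s2 * sum(w * y) - s1 * sum(w * d * y)) / den
}

# Leave-one-out CV over an equally spaced grid of bandwidths in (0, 1]
select_bandwidth <- function(x, y, kernel = "gaussian", n.bandwidths = 20) {
  K <- kernels[[kernel]]
  grid <- seq(1 / n.bandwidths, 1, length.out = n.bandwidths)
  cv <- vapply(grid, function(h) {
    pred <- vapply(seq_along(x), function(i)
      ll_fit(x[-i], y[-i], x[i], h, K), numeric(1))
    if (anyNA(pred)) Inf else mean((y - pred)^2)  # reject unusable bandwidths
  }, numeric(1))
  grid[which.min(cv)]
}
```

For example, `select_bandwidth(x, y, kernel = "Epanech")` returns the grid value with the smallest leave-one-out error. Compact kernels such as Epanechnikov and triangular give zero weight outside the window, so very small bandwidths can leave too few points for a local line; the sketch simply rules those bandwidths out.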

Author

Nora M. Villanueva and Marta Sestelo.

Details

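For each genomic sequence, the AT and CG skew profiles are calculated as \(\text{A vs. T} = (A-T)/(A+T)\) and \(\text{C vs. G} = (C-G)/(C+G)\), where A, T, C and G denote the numbers of adenine, thymine, cytosine and guanine nucleotides.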
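The skew formulas can be computed directly from base counts. The sketch below counts bases in consecutive non-overlapping windows of a sequence given as a character vector of single bases and returns both skews per window. The window size and the helper name `skew_profiles` are illustrative assumptions; this is not necessarily how seq2R builds its profiles.

```r
# Hypothetical helper (not part of seq2R): AT and CG skews computed in
# consecutive non-overlapping windows of a sequence given as single bases.
skew_profiles <- function(bases, window = 1000) {
  bases <- toupper(bases)
  bin <- ceiling(seq_along(bases) / window)        # window index of each base
  count <- function(b) as.vector(tapply(bases == b, bin, sum))
  nA <- count("A"); nT <- count("T"); nC <- count("C"); nG <- count("G")
  data.frame(
    window  = seq_along(nA),
    AT_skew = (nA - nT) / (nA + nT),   # A vs. T; NaN if no A or T in window
    CG_skew = (nC - nG) / (nC + nG)    # C vs. G; NaN if no C or G in window
  )
}

# Usage on a toy sequence split into windows of four bases
bases <- strsplit("AATTGCGCAAAA", "")[[1]]
skew_profiles(bases, window = 4)
# AT_skew: 0, NaN, 1; CG_skew: NaN, 0, NaN
```

Each skew lies in [-1, 1]: positive values mean an excess of A over T (or C over G) in that window, and negative values the reverse.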

References

N. M. Villanueva, M. Sestelo, M. M. Fonseca and J. Roca-Pardinas (2023). seq2R: An R package to detect change points in DNA sequences. Mathematics, 11 (10), 2299.

Examples


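library(seq2R)

# Alternatively, download the sequence from GenBank:
# mtDNAhum <- read.genbank("NC_012920")
data(mtDNAhum)              # human mitochondrial genome (NC_012920)
DNA <- transform(mtDNAhum)  # prepare the sequence for find.points
seq1 <- find.points(DNA)    # fit the model and search for change points
seq1                        # print the summary of the fitted object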
