CE: Break-point Detection via the CE Method

Description

This function estimates both the number of break-points and their locations using the Cross-Entropy (CE) method. It is intended for sequences of continuous measurements, particularly genomic sequences (array CGH data).

Usage

CE(data, N_max = 10, eps = 0.01, rho = 0.05, M = 200, h = 5, a = 0.8,
   parallel = FALSE)

Arguments

data

Data to be analysed. A single-column array or a data frame.

N_max

Maximum number of break-points. Default value is 10.

eps

The cut-off value for the Median Absolute Deviation value. Default value is 0.01.
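
As a rough illustration of how such a cut-off can act as a stopping rule, the sketch below compares the median absolute deviation of a set of sampled locations with eps. The helper name has_converged and the toy vectors are hypothetical and not part of the package; the package's internal check may differ.

```r
# Hypothetical helper: TRUE when the spread of the sampled locations is below eps.
has_converged <- function(samples, eps = 0.01) {
  # constant = 1 gives the raw median absolute deviation (no normal scaling)
  stats::mad(samples, constant = 1) < eps
}

tight  <- c(100.000, 100.001, 100.002, 99.999, 100.000)
spread <- c(80, 95, 100, 110, 130)

has_converged(tight)   # spread is well below eps
has_converged(spread)  # spread is far above eps
```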

rho

The fraction of sample solutions retained as the best-performing set (elite sample). Default value is 0.05.
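
To make the idea of an elite sample concrete, here is a minimal sketch that keeps the top rho fraction of solutions by score. The helper elite_indices and the random scores are assumptions made for illustration only; they are not taken from the package source.

```r
# Hypothetical helper: indices of the best-scoring sample solutions.
elite_indices <- function(scores, rho = 0.05) {
  n_elite <- max(1, ceiling(rho * length(scores)))
  # higher score = better, matching a criterion that is maximised
  order(scores, decreasing = TRUE)[seq_len(n_elite)]
}

set.seed(1)
scores <- rnorm(200)             # toy scores for M = 200 sample solutions
elite  <- elite_indices(scores)  # roughly the top 5% of the samples
scores[elite]
```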

M

Sample size to be used in simulating the locations of break-points from the four-parameter beta distribution. Default value is 200.
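
A four-parameter beta distribution is an ordinary beta distribution rescaled to an interval [lower, upper]. The sketch below draws M candidate locations this way. The function r4beta and the chosen shape and range values are hypothetical; the package may generate its samples differently.

```r
# Hypothetical helper: draw n values from a beta distribution rescaled to [lower, upper].
r4beta <- function(n, shape1, shape2, lower, upper) {
  lower + (upper - lower) * stats::rbeta(n, shape1, shape2)
}

set.seed(42)
M <- 200
# With shape1 = shape2 = 1 the draws are uniform over the allowed range.
locs <- r4beta(M, shape1 = 1, shape2 = 1, lower = 1, upper = 1000)
range(locs)
```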

h

Minimum aberration width. Default is 5.
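
One way to read a minimum width constraint is that no segment between consecutive break-points may contain fewer than h observations. The sketch below checks this for a candidate solution. The helper valid_widths is hypothetical, and it assumes that a break-point marks the first index of a new segment.

```r
# Hypothetical helper: TRUE when every segment has at least h observations.
valid_widths <- function(bp, n, h = 5) {
  # segment boundaries: start of data, each break-point, one past the end
  all(diff(c(1, sort(bp), n + 1)) >= h)
}

valid_widths(c(20, 60), n = 100)  # segments of length 19, 40 and 41
valid_widths(c(20, 22), n = 100)  # middle segment has only 2 observations
```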

a

Smoothing parameter value. Default is 0.8.
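
In the general CE literature, smoothing blends the parameter estimate from the current elite sample with the value from the previous iteration. The sketch below shows that standard update; whether the package applies exactly this form is an assumption, and smooth_update is a hypothetical name.

```r
# Hypothetical helper: blend the new elite-based estimate with the previous value.
smooth_update <- function(old, new, a = 0.8) {
  a * new + (1 - a) * old
}

smooth_update(old = 1, new = 3)         # moves most of the way towards 3
smooth_update(old = 1, new = 3, a = 1)  # a = 1 disables smoothing
```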

parallel

A logical argument specifying whether parallel computation should be carried out (TRUE) or not (FALSE). The default is FALSE. On Windows, "snow" functionalities are used, whereas on Unix/Linux/Mac OS X, "multicore" functionalities are used.

Value

A list is returned containing the following items:

No.BPs

The number of break-points in the data, as estimated by the CE method.

BP.Loc

A vector of break-point locations.

Details

The CE algorithm is a model-based stochastic optimization method. In the breakpoint package it is used as an exact search method. A performance function score (modified BIC, Zhang & Siegmund (2007)) is calculated for each of the solutions generated by the four-parameter beta distribution, for every number of break-points from zero up to the user-provided maximum. The solution that maximizes the modified BIC with respect to the number of break-points is taken as the optimal solution. Finally, a vector of break-point locations is given along with the mean profile plot.

A list that contains the number of break-points and their corresponding locations is printed in the console. The mean profile plot is also produced as an output. The function also stores the computation time and the mean profile plot in a "CE" folder, which is created in the current working directory.
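
For readers unfamiliar with the term, a mean profile replaces each observation by the mean of the segment it falls in. The sketch below computes one from a vector of break-point locations. The helper mean_profile is hypothetical and is not the package's plotting code; it assumes that a break-point marks the first index of a new segment.

```r
# Hypothetical helper: piecewise-constant mean profile for given break-point locations.
mean_profile <- function(x, bp) {
  # label each observation with its segment; a break-point starts a new segment
  seg <- findInterval(seq_along(x), sort(bp)) + 1
  ave(x, seg)  # replace each value by its segment mean
}

x <- c(0.1, -0.1, 0.0, 2.1, 1.9, 2.0)
mean_profile(x, bp = 4)  # one level for positions 1-3, another for 4-6
```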

References

Priyadarshana, W.J.R.M. and Sofronov, G. (2014) Multiple Break-Points Detection in array CGH Data via the Cross-Entropy Method. (Submitted)

Priyadarshana, W.J.R.M. and Sofronov, G. (2012) A Modified Cross Entropy Method for Detecting Multiple Change Points in DNA Count Data. In Proc. IEEE World Congress on Computational Intelligence (CEC2012), 1020-1027.

Rubinstein, R. and Kroese, D. (2004) The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning. Springer-Verlag, New York.

Zhang, N.R. and Siegmund, D.O. (2007) A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization data. Biometrics, 63, 22-32.

Examples

library(breakpoint)
data(data)  # load the example data set
CE(data)    # estimate break-points using the default settings
