Usage
graphscan_1d(data, format = "fasta", events_series = "all",
id = NULL, n_simulation = 199, cluster_analysis = "both",
normalisation_factor = NULL, alpha = 0.05,
cluster_user_choice = "positive")
Arguments
data
main argument; its type determines how the input is read. 'data' can be a character vector of file names of aligned DNA sequences. In this case, the file format can be specified with the argument 'format'. 'data' can also be a list of class 'DNAbin' produced by the 'read.dna' function of the 'ape' package. In both cases, the DNA sequences must be aligned.
Finally, 'data' can be a 'list' of numeric vectors containing the positions of events. Such a list is called a series of events. The events need not lie on the same segment, nor on a [0,1] segment. The argument 'normalisation_factor' sets the lower and upper bounds of each events series.
format
a character string giving the format of the DNA sequences contained in the files named by 'data'. It is used by the 'read.dna' function of the 'ape' package. Possible values are "interleaved", "sequential", "clustal" or "fasta" (default).
events_series
used if 'data' is a set of file names of aligned DNA sequences or a list of class 'DNAbin'. 'events_series' can be a list of the form 'list(A,B)', where 'A' and 'B' are two vectors of sequence identifiers. The cross product AxB is formed to obtain a list of events series, one for the comparison of each sequence from 'A' with each sequence from 'B'. 'events_series' can also be the character string "all", in which case all possible comparisons between sequences are made.
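The cross product formed from 'list(A,B)' can be pictured with base R. This is only an illustrative sketch: the identifiers "seq1" to "seq4" are hypothetical, and 'expand.grid' is used here to enumerate the pairs, not because 'graphscan_1d' is known to use it internally.

```r
# Hypothetical sequence identifiers (not taken from any real alignment)
A <- c("seq1", "seq2")
B <- c("seq3", "seq4")

# Enumerate every pair (a, b) with a in A and b in B
pairs <- expand.grid(A = A, B = B, stringsAsFactors = FALSE)
print(pairs)

# One events series per pair: length(A) * length(B) comparisons
n_comparisons <- nrow(pairs)

# The corresponding argument would be written as:
events_series <- list(A, B)
```

Each row of 'pairs' corresponds to one comparison, and therefore to one series of events.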
id
a character string corresponding to the prefix used to create the names of the events series.
n_simulation
number of simulations (default value 199) used to compute the p-values of clusters in a Monte-Carlo process. The value of 'n_simulation' is stored in the slot 'param' and can be modified by the function 'cluster'.
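To make the role of 'n_simulation' concrete, the sketch below computes a Monte Carlo p-value in the usual rank-based way. The formula (1 + number of simulated statistics at least as large as the observed one) / (n_simulation + 1) is the standard Monte Carlo estimator and is an assumption here, since the exact formula used by 'graphscan_1d' is not stated. The statistic values are made up for illustration.

```r
set.seed(1)
n_simulation <- 199

# Made-up values: an observed statistic and statistics simulated under the null
observed  <- 3.2
simulated <- rnorm(n_simulation)

# Rank-based Monte Carlo p-value (assumed formula, see the text above)
p_value <- (1 + sum(simulated >= observed)) / (n_simulation + 1)

# Smallest p-value attainable with this number of simulations: 1 / 200
min_p <- 1 / (n_simulation + 1)
```

Under this assumption, 199 simulations give p-values on a grid of step 0.005, which is why the number of simulations bounds the resolution available around the threshold 'alpha'.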
cluster_analysis
a character string equal to "positive", "negative" or "both" (default value), to detect respectively only the positive clusters, only the negative clusters, or both positive and negative clusters. The value of 'cluster_analysis' is stored in the slot 'param' and can be modified by the function 'cluster'.
normalisation_factor
a list of vectors whose length equals the number of events series. Each vector contains 2 integers: the minimum and the maximum of the event positions for the corresponding series.
The maximum is the length of the DNA sequences if the 'data' argument is a character vector or an object of class "DNAbin". In these cases, the 'normalisation_factor' is automatically computed by the function 'graphscan_1d'.
If 'data' is a 'list' of numeric vectors containing the positions of events, the 'normalisation_factor' must be specified as a 'list' containing the lower and upper bounds of each events series. The values of 'normalisation_factor' are stored in the slot 'param'.
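When 'data' is a list of event positions, 'data' and 'normalisation_factor' have to match element by element. The sketch below builds such a pair with made-up positions and bounds, and shows a rescaling to [0,1] that the bounds make possible. The linear rescaling is an assumption about what normalisation means here, not a description of the package internals. The call to 'graphscan_1d' is left as a comment and uses only the arguments listed in the usage above.

```r
# Made-up event positions for two series lying on different segments
events <- list(c(12, 45, 47, 51, 300), c(1005, 1010, 1400))

# One c(min, max) vector per series, in the same order as 'events'
normalisation_factor <- list(c(0, 400), c(1000, 1500))
stopifnot(length(events) == length(normalisation_factor))

# Assumed linear rescaling of each series onto [0, 1]
rescaled <- Map(function(x, b) (x - b[1]) / (b[2] - b[1]),
                events, normalisation_factor)

# With the package loaded, the call would take the form:
# graphscan_1d(data = events, normalisation_factor = normalisation_factor)
```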
alpha
the threshold of significance (p-value) for keeping the candidate clusters. The value of 'alpha' is stored in the slot 'param'.
cluster_user_choice
used if 'cluster_analysis="both"'. 'cluster_user_choice' is a character string equal to "positive" (default value), "negative" or "random". If two candidate clusters, one positive and one negative, have the same p-value, this argument indicates how to choose between them. The value of 'cluster_user_choice' is stored in the slot 'param'.
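The tie-breaking rule can be summarised in a few lines. The function 'choose_cluster' below is hypothetical and written only to illustrate the three options; it is not part of the package. Keeping the cluster with the smaller p-value when there is no tie is likewise an assumption.

```r
# Hypothetical helper: pick between a positive and a negative candidate cluster
choose_cluster <- function(p_positive, p_negative,
                           cluster_user_choice = "positive") {
  cluster_user_choice <- match.arg(cluster_user_choice,
                                   c("positive", "negative", "random"))
  # No tie (assumed behaviour): keep the cluster with the smaller p-value
  if (p_positive < p_negative) return("positive")
  if (p_negative < p_positive) return("negative")
  # Tie: apply the user's choice
  if (cluster_user_choice == "random") {
    return(sample(c("positive", "negative"), 1))
  }
  cluster_user_choice
}

tie_default  <- choose_cluster(0.01, 0.01)               # tie, default choice
tie_negative <- choose_cluster(0.01, 0.01, "negative")   # tie, negative preferred
no_tie       <- choose_cluster(0.005, 0.01, "negative")  # no tie, argument unused
```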