automatic_analysis: Automatic analysis

Description

Calculates local score and p-value for sequence(s) with integer scores.
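
For orientation, the local score of a score sequence is the largest sum over any contiguous segment, with a floor of zero. The base-R sketch below illustrates that definition only; `local_score_sketch` is a hypothetical helper written for this page, not a function of the package.

```r
# Hypothetical helper (not part of the package): local score via the
# Lindley recursion W_k = max(0, W_{k-1} + x_k); the local score is max_k W_k.
local_score_sketch <- function(x) {
  w <- 0      # current value of the Lindley process
  best <- 0   # largest value seen so far
  for (s in x) {
    w <- max(0, w + s)
    best <- max(best, w)
  }
  best
}

# Made-up integer scores, for illustration only
local_score_sketch(c(1, -2, 3, 1, -1, 2))
```

A sequence made up only of negative scores has a local score of 0 under this definition.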

Usage

automatic_analysis(
  sequences,
  model,
  scores,
  transition_matrix,
  distribution,
  method_limit = 2000,
  score_extremes,
  modelFunc,
  simulated_sequence_length = 1000,
  ...
)

Arguments

sequences

sequences to be analysed (named list)

model

the underlying model of the sequence: either "iid" (independent and identically distributed variables) or "markov" (Markov chain)

scores

vector of two elements giving the minimum and maximum of the score range

transition_matrix

if the sequences are Markov chains, this is their transition matrix
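
Before passing a transition matrix, it can help to check that it is row-stochastic, as the matrix in the Examples section is. The sketch below uses base R only; the 3-state matrix and the name `is_stochastic` are made up for illustration.

```r
# Illustrative 3-state transition matrix (made-up values, e.g. for scores -1, 0, 1)
P <- matrix(c(0.5, 0.3, 0.2,
              0.4, 0.4, 0.2,
              0.6, 0.3, 0.1),
            ncol = 3, byrow = TRUE)

# Each row is a conditional distribution: entries are non-negative and sum to 1
is_stochastic <- all(P >= 0) &&
  isTRUE(all.equal(rowSums(P), rep(1, nrow(P))))
is_stochastic
```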

distribution

vector of probabilities in ascending score order (iid sequences). Note that the names of the vector must be the associated scores.
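
A named probability vector of this shape can be built in base R as sketched below. The probabilities are the ones used in the Examples section; the variable names and the simulated sequence `x` are illustrative assumptions.

```r
scores <- -3:1
probs  <- c(0.3, 0.3, 0.1, 0.1, 0.2)

# Names of the vector are the associated scores, in ascending order
distribution <- setNames(probs, scores)
distribution

# Empirical alternative: relative frequencies observed in a sequence
set.seed(1)
x <- sample(scores, size = 200, replace = TRUE, prob = probs)
empirical <- table(factor(x, levels = scores)) / length(x)
empirical
```

Using `factor(x, levels = scores)` keeps scores that never occur in `x` in the table with frequency 0, so the vector still covers the whole score range.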

method_limit

sequence length from which computation-intensive exact p-value methods are replaced by approximate methods

score_extremes

a vector with two elements: minimal score value, maximal score value

modelFunc

function that creates similar sequences. If it is provided, Monte Carlo simulation is used to calculate the p-value
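
As a rough illustration of what such a generator might look like, the sketch below draws an i.i.d. score sequence from a fixed distribution. The function name `my_model` and its arguments `n`, `values` and `probs` are hypothetical; the exact calling convention expected by the package should be checked in its documentation.

```r
# Hypothetical generator: returns one simulated sequence of length n,
# drawn independently from the given score distribution.
my_model <- function(n, values, probs) {
  sample(values, size = n, replace = TRUE, prob = probs)
}

set.seed(42)
sim <- my_model(n = 100, values = -2:1, probs = c(0.4, 0.3, 0.2, 0.1))
length(sim)
range(sim)
```

The extra arguments of such a function (here `n`, `values`, `probs`) correspond to the parameters forwarded through `...`.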

simulated_sequence_length

if a modelFunc is provided and the sequence is longer than method_limit, the method karlinMonteCarlo is used. This method requires the length of the sequences that modelFunc will create in order to estimate the Gumbel parameters.

...

parameters for modelFunc

Value

A list object containing

Local score

local score...

p-value

p-value ...

Method

the method used to calculate the p-value

Details

This function picks a suitable p-value method for your input.

If no sequences are passed to this function, it lets you pick a FASTA file. In that case, if you have not provided a score system (which you can do by passing a named list with the appropriate score for each character), a second file dialog opens for choosing a file containing the scores. If that file has an extra column of probabilities, they are used as well; see the section File Formats in the vignette for details.

The function then calculates the p-value according to the length of each input sequence, using your distribution if you provided one and the empirical distribution of your input otherwise.

You can influence the choice of method by providing the modelFunc argument. In this case, the function uses simulation methods exclusively (monte_carlo, monte_carlo_karlin). By setting method_limit, you can further control the extent to which computation-intensive methods (daudin, exact_mcc) are used to calculate the p-value.

Note that the warnings of the localScoreC() function are suppressed when it is called by automatic_analysis().
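
The selection logic described above can be sketched roughly as follows. This is a reading of the prose, not the package's code: `pick_method_family` is a hypothetical helper, the labels "exact" and "approximation" are placeholders rather than package method names, and the strict `>` comparison at the boundary is an assumption.

```r
# Hypothetical helper mirroring the description above; not the package's code.
pick_method_family <- function(seq_length, method_limit = 2000,
                               has_model_func = FALSE) {
  if (has_model_func) {
    # Simulation only: the Karlin variant is used for long sequences
    if (seq_length > method_limit) "monte_carlo_karlin" else "monte_carlo"
  } else if (seq_length > method_limit) {
    "approximation"   # placeholder label for the approximate methods
  } else {
    "exact"           # placeholder label for the computation-intensive methods
  }
}

pick_method_family(3000)                        # long sequence, default limit
pick_method_family(3000, method_limit = 3000)   # limit raised to the sequence length
pick_method_family(150, has_model_func = TRUE)  # generator supplied
```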

Examples

# NOT RUN {
# Minimal example
l = list()
seq1 = sample(-2:1, size = 3000, replace = TRUE)
seq2 = sample(-3:1, size = 150, replace = TRUE)
l[["hello"]] = seq1
l[["world"]] = seq2
automatic_analysis(l, "iid")
# Example with a given distribution 
automatic_analysis(l,"iid",scores=c(-3,1),distribution=c(0.3,0.3,0.1,0.1,0.2))
# forcing the exact method for the longest sequence
aa1=automatic_analysis(l,"iid")
aa1$hello$`method applied`
aa1$hello$`p-value`
aa2=automatic_analysis(l,"iid",method_limit=3000)
aa2$hello$`method applied`
aa2$hello$`p-value`
# Markovian example
MyTransMat <- matrix(c(0.3, 0.1, 0.1, 0.1, 0.4,
                       0.3, 0.2, 0.2, 0.2, 0.1,
                       0.3, 0.4, 0.1, 0.1, 0.1,
                       0.3, 0.3, 0.3, 0.0, 0.1,
                       0.1, 0.3, 0.2, 0.3, 0.1), ncol = 5, byrow = TRUE)
MySeq.CM = transmatrix2sequence(matrix = MyTransMat, length = 150, score = -2:2)
MySeq.CM2 = transmatrix2sequence(matrix = MyTransMat, length = 110, score = -2:2)
automatic_analysis(sequences = list("x1" = MySeq.CM, "x2" = MySeq.CM2), model = "markov")
# }
