Calculates the exact p-value in the identically and independently distributed of a given local score, a sequence length that 'must not be too large' and for a given score distribution
daudin(
local_score,
sequence_length,
score_probabilities,
sequence_min = NULL,
sequence_max = NULL,
score_values = NULL
)
A double representing the probability of a local score as high as the one given as argument.
the observed local score
length of the sequence
the probabilities for each score from lowest to greatest (Optionnaly with scores as names)
minimum score (optional if score_values
OR names(score_probabilities) is defined)
maximum score (optional if score_values
OR names(score_probabilities) is defined)
vector of integer score values, associated to score_probabilities (optional if
sequence_min
and sequence_max
OR names(score_probabilities) are defined)
Either sequence_min
and sequence_max
are specified as input, OR all possible score values in
score_values
vector ; one of this choice is required. <cr>
Small in this context depends heavily on your machine.
On a 3,7GHZ machine this means for daudin(1000, 5000, c(0.2, 0.2, 0.2, 0.1, 0.2, 0.1), -2, 3)
an execution time of ~2 seconds. This is due to the calculation method using matrix exponentiation
which takes times. The size of the matrix of the exponentiation is equal to a+1 with a the
local score value. The matrix must be put at the power n, with n the sequence length.
Moreover, it is known that the local score value is expected to be in mean of order log(n).
p1 <- daudin(local_score = 4, sequence_length = 50,
score_probabilities = c(0.2, 0.3, 0.1, 0.2, 0.1, 0.1),
sequence_min = -3, sequence_max = 2)
p2 <- daudin(local_score = 4, sequence_length = 50,
score_probabilities = c(0.2, 0.3, 0.1, 0.2, 0.1, 0.1),
score_values = as.integer(-3:2))
p1 == p2 # TRUE
prob <- c(0.08, 0.32, 0.08, 0.00, 0.08, 0.00, 0.00, 0.08, 0.02, 0.32, 0.02)
score_values <- which(prob != 0) - 6 # keep only non null probability scores
prob0 <- prob[prob != 0] # and associated probability
p <- daudin(150, 10000, prob, sequence_min = -5, sequence_max = 5)
p0 <- daudin(150, 10000, prob0, score_values = score_values)
names(prob0) <- score_values
p1 <- daudin(150, 10000, prob0)
p == p0 # TRUE
p == p1 # TRUE
Run the code above in your browser using DataLab