rEMM (version 0.1-2)

score: Score a New Sequence Given an EMM

Description

Calculates a score of how likely it is that a new sequence was generated by the same process as the sequences used to build the EMM.

Usage

## S4 method for signature 'EMM,matrix'
score(x, newdata, method = c("prod", "sum", "log_odds"), match_state = "nn", plus_one = TRUE, initial_transition = FALSE)

Arguments

x
an EMM object.
newdata
the new sequence to score, given as a matrix with one observation per row.
method
method used to calculate the score (see Details).
match_state
do the new observations have to fall within the threshold of a state ("exact"), or is the nearest state (nearest neighbor) used ("nn")?
plus_one
add one to each transition count. This is equivalent to starting every transition count at one, i.e., initially all transitions are equally likely. It prevents the product of probabilities from being zero if a transition was never observed.
initial_transition
include the initial transition in the computation?
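The difference between the two match_state options can be illustrated with a small base-R sketch. This is not rEMM's implementation: it assumes Euclidean distance, a matrix of state centroids with one row per state, and, as a labeled assumption, that an observation which lies outside every state's threshold in "exact" mode gets no state (NA). The helper name assign_states is hypothetical.

```r
# Hypothetical helper (not part of rEMM): assign each row of `newdata` to a state.
# Assumes Euclidean distance and one state centroid per row of `centers`.
assign_states <- function(newdata, centers, threshold, match_state = c("nn", "exact")) {
  match_state <- match.arg(match_state)
  apply(newdata, 1, function(obs) {
    # Euclidean distance from this observation to every centroid
    d <- sqrt(colSums((t(centers) - obs)^2))
    nearest <- which.min(d)
    # "nn": always use the nearest state.
    # "exact": assumed here to give NA when no state lies within the threshold.
    if (match_state == "exact" && d[nearest] > threshold) NA_integer_ else nearest
  })
}

centers <- rbind(c(0, 0), c(1, 1))              # two hypothetical state centroids
newdata <- rbind(c(0.1, 0), c(0.9, 1.2), c(5, 5))
assign_states(newdata, centers, 0.5, "nn")      # 1 2 2
assign_states(newdata, centers, 0.5, "exact")   # 1 2 NA
```

The third observation is far from both centroids: nearest-neighbor matching still assigns it to state 2, while exact matching leaves it unassigned under the stated assumption.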

Value

  • A scalar score value.

Details

A score of how likely it is that a sequence was generated by a given EMM can be calculated as the length-normalized product or sum of the transition probabilities along the path that the new sequence takes through the model. The scores for a new sequence of length $l$ are defined as:

$$P_\mathrm{prod} = \sqrt[l-1]{\prod_{i=1}^{l-1}{a_{s(i),s(i+1)}}}$$

$$P_\mathrm{sum} = \frac{1}{l-1} \sum_{i=1}^{l-1}{a_{s(i),s(i+1)}}$$

where $a_{ij}$ is the transition probability from state $i$ to state $j$ and $s(i)$ is the state the $i$th data point in the new sequence is assigned to.
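The two formulas can be checked with the following base-R sketch. It assumes the transition counts are available as a square matrix (rows are the from-states, columns the to-states) and that the new sequence has already been mapped to a vector of state indices $s(1), \dots, s(l)$. The helper names transition_probs and score_path are hypothetical, not rEMM functions; the plus_one step mirrors the argument of the same name by adding one to every count before each row is normalized.

```r
# Hypothetical helpers (not rEMM's code): a sketch of the P_prod and P_sum scores.

# Row-normalize transition counts into probabilities a_ij.
# plus_one = TRUE adds one to every count first, so no a_ij is zero.
transition_probs <- function(counts, plus_one = TRUE) {
  if (plus_one) counts <- counts + 1
  counts / rowSums(counts)
}

# Length-normalized product (geometric mean) or sum (arithmetic mean)
# of the l - 1 transition probabilities along the state path.
score_path <- function(P, states, method = c("prod", "sum")) {
  method <- match.arg(method)
  l <- length(states)
  # a_{s(i), s(i+1)} for i = 1, ..., l - 1, via two-column matrix indexing
  a <- P[cbind(states[-l], states[-1])]
  switch(method,
    prod = prod(a)^(1 / (l - 1)),
    sum  = sum(a) / (l - 1)
  )
}

counts <- matrix(c(0, 3, 1,
                   2, 0, 2,
                   1, 1, 0), nrow = 3, byrow = TRUE)
P <- transition_probs(counts, plus_one = TRUE)
states <- c(1, 2, 3, 1)
score_path(P, states, "prod")   # (24/245)^(1/3)
score_path(P, states, "sum")    # 7/15
```

The path 1, 2, 3, 1 uses the probabilities 4/7, 3/7 and 2/5, so the product score is their cube root and the sum score is their mean.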

See Also

transition for accessing transition probabilities, and find_states for assigning observations to states (clusters).

Examples

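library("rEMM")
data("EMMsim")

## build an EMM from the training sequence
emm <- EMM(threshold = 0.5)
emm <- build(emm, EMMsim_train)

## score the test sequence against the model
score(emm, EMMsim_test)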
