score: Score a New Sequence Given an EMM

Description

Calculates a score of how likely it is that a new sequence was generated by the same process as the sequences used to build the EMM.

Usage

## S4 method for signature 'EMM,matrix'
score(x, newdata, method = c("product", "log_sum", "sum",
"weighted_product", "weighted_log_sum", "weighted_sum", "log_odds", "missing_transitions"), 
match_cluster = "nn", plus_one=FALSE, initial_transition = FALSE)

Arguments

x

an EMM object.

newdata

sequence to score

method

method to calculate the score (see details)

match_cluster

do the new observations have to fall within the threshold of the cluster ("exact") or is nearest neighbor used ("nn")?

plus_one

add one to each transition count. This is equivalent to starting with a count of one for each transition, i.e., initially all transitions are equally likely. It prevents the product of probabilities from being zero if a transition was never observed.

initial_transition

include the initial transition in the computation?

Value

A scalar score value.

Details

A score of how likely it is that a sequence was generated by a given EMM model can be calculated by the length-normalized product or sum of probabilities on the path along the new sequence. The scores for a new sequence $x$ of length $l$ can be computed by the following methods:

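product: $S_\mathrm{product} = \sqrt[l-1]{\prod_{i=1}^{l-1} a(s(i), s(i+1))}$

sum: $S_\mathrm{sum} = \frac{1}{l-1} \sum_{i=1}^{l-1} a(s(i), s(i+1))$

log_sum: $S_\mathrm{log\_sum} = \frac{1}{l-1} \sum_{i=1}^{l-1} \log a(s(i), s(i+1))$

weighted_product: $S_\mathrm{weighted\_product} = \sqrt[l-1]{\prod_{i=1}^{l-1} \mathrm{simil}(x_i, s(i)) \, a(s(i), s(i+1))}$

weighted_sum: $S_\mathrm{weighted\_sum} = \frac{1}{l-1} \sum_{i=1}^{l-1} \mathrm{simil}(x_i, s(i)) \, a(s(i), s(i+1))$

weighted_log_sum: $S_\mathrm{weighted\_log\_sum} = \frac{1}{l-1} \sum_{i=1}^{l-1} \log\left(\mathrm{simil}(x_i, s(i)) \, a(s(i), s(i+1))\right)$

missing_transitions: $S_\mathrm{missing\_transitions} = \frac{1}{l-1} \sum_{i=1}^{l-1} \mathrm{I}(a(s(i), s(i+1)))$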

where $x_i$ represents the $i$-th data point in the new sequence, $a(i,j)$ is the transition probability from state $i$ to state $j$ in the model, $s(i)$ is the state the $i$-th data point ($x_i$) in the new sequence is assigned to, and $\mathrm{simil}(\cdot)$ is a similarity function (uses the same similarity/distance measure that was used to create the model; distances are converted into similarities using $\mathrm{simil} = 1/(1+\mathrm{d})$).

For missing transitions, $\mathrm{I}(v)$ is an indicator function which is 1 for $v=0$ and 0 otherwise.
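
As a rough illustration of the length-normalized scores described above, the following base R sketch computes them by hand. It does not use the package. The vectors `a` and `w` are made-up values standing in for the transition probabilities along the path and the similarities of each point to its assigned state. The sketch assumes that the product score is the geometric mean, that the sum scores are arithmetic means, and that the weighted variants multiply each transition probability by the corresponding similarity.

```r
# Hypothetical values, not taken from a real model: transition probabilities
# a(s(i), s(i+1)) along a new sequence of length l = 5 (l - 1 = 4 transitions).
a <- c(0.5, 0.25, 0.5, 1)
# Hypothetical similarities simil(x_i, s(i)) of each point to its assigned state.
w <- c(1, 0.8, 0.5, 1)
n <- length(a)  # number of transitions, l - 1

scores <- c(
  product          = prod(a)^(1 / n),      # geometric mean of the probabilities
  sum              = sum(a) / n,           # arithmetic mean
  log_sum          = sum(log(a)) / n,      # mean log-probability
  weighted_product = prod(w * a)^(1 / n),  # similarity-weighted geometric mean
  weighted_sum     = sum(w * a) / n,       # similarity-weighted arithmetic mean
  weighted_log_sum = sum(log(w * a)) / n   # similarity-weighted mean log
)
print(scores)

# Under these assumptions, log_sum is the logarithm of product.
all.equal(exp(scores[["log_sum"]]), scores[["product"]])
```

Under these assumptions the product and log_sum scores rank sequences identically. The log form avoids numerical underflow on long sequences, but it is `-Inf` as soon as a single transition has probability zero.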

Examples

library("rEMM")
data("EMMsim")

emm <- EMM(threshold=.5)
emm <- build(emm, EMMsim_train)

### compute various scores
score(emm, EMMsim_test, method="missing_transitions")
score(emm, EMMsim_test) # default is "product"
score(emm, EMMsim_test, method="sum")
score(emm, EMMsim_test, method="log_sum")
score(emm, EMMsim_test, method="weighted_product")
score(emm, EMMsim_test, method="weighted_sum")
score(emm, EMMsim_test, method="weighted_log_sum")

### shuffle the data and score again
EMMsim_test <- EMMsim_test[sample(nrow(EMMsim_test)), ]
score(emm, EMMsim_test, method="missing_transitions")
score(emm, EMMsim_test) # default is "product"
score(emm, EMMsim_test, method="sum")
score(emm, EMMsim_test, method="log_sum")
score(emm, EMMsim_test, method="weighted_product")
score(emm, EMMsim_test, method="weighted_sum")
score(emm, EMMsim_test, method="weighted_log_sum")

### deal with missing transitions
score(emm, EMMsim_test, method="product", plus_one=TRUE)
score(emm, EMMsim_test, method="log_sum", plus_one=TRUE)
score(emm, EMMsim_test, method="weighted_product", plus_one=TRUE)
score(emm, EMMsim_test, method="weighted_log_sum", plus_one=TRUE)
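
The effect of `plus_one` on a product-type score can be sketched without the package. The count matrix and path below are invented for illustration, and the helper `to_prob` is hypothetical. The sketch assumes that transition probabilities are row-normalized counts and that add-one smoothing is applied before normalizing; how the package does this internally is not shown here.

```r
# Hypothetical transition counts between three states (rows: from, columns: to).
counts <- matrix(c(4, 0, 1,
                   2, 2, 0,
                   0, 3, 3),
                 nrow = 3, byrow = TRUE)

# Hypothetical helper: row-normalize counts into transition probabilities.
to_prob <- function(m) m / rowSums(m)

p_raw  <- to_prob(counts)      # unobserved transitions get probability 0
p_plus <- to_prob(counts + 1)  # every transition starts with a count of one

# The path 1 -> 2 -> 3 uses two transitions that were never observed.
path <- c(1, 2, 3)
idx  <- cbind(head(path, -1), tail(path, -1))  # (from, to) index pairs

score_raw  <- prod(p_raw[idx])^(1 / nrow(idx))   # collapses to zero
score_plus <- prod(p_plus[idx])^(1 / nrow(idx))  # small but positive
print(c(raw = score_raw, plus_one = score_plus))
```

Without smoothing, one unobserved transition is enough to drive the product to zero. With add-one counts the score stays positive, so sequences with different numbers of unseen transitions can still be compared.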