rEMM (version 1.0-5)

score: Score a New Sequence Given an EMM

Description

Calculates a score of how likely it is that a new sequence was generated by the same process as the sequences used to build the EMM.

Usage

## S4 method for signature 'EMM,matrix'
score(x, newdata, method = c("product", "log_sum", "sum",
"weighted_product", "weighted_log_sum", "weighted_sum", "log_odds", "missing_transitions"), 
match_cluster = "nn", plus_one=FALSE, initial_transition = FALSE)

Arguments

x
an EMM object.
newdata
sequence to score
method
method to calculate the score (see details)
match_cluster
do the new observations have to fall within the threshold of the cluster ("exact") or is nearest neighbor used ("nn")?
plus_one
add one to each transition count. This is equivalent to starting with a count of one for each transition, i.e., initially all transitions are equally likely. It prevents the product of probabilities from being zero if a transition was never o
initial_transition
include the initial transition in the computation?
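
The effect of plus_one can be illustrated outside of rEMM with a small hand-made count matrix. This is a minimal base-R sketch and not the package's internal code; the matrix `counts` and the helper `to_prob()` are hypothetical names introduced only for illustration.

```r
# Hypothetical transition counts between three states (rows = from, cols = to).
states <- c("1", "2", "3")
counts <- matrix(c(4, 1, 0,
                   0, 3, 2,
                   1, 0, 5),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(states, states))

# Hypothetical helper: turn counts into row-wise transition probabilities.
to_prob <- function(m) m / rowSums(m)

p_raw      <- to_prob(counts)      # unseen transitions get probability 0
p_plus_one <- to_prob(counts + 1)  # every transition starts with a count of 1

p_raw["1", "3"]       # 0: any path using 1 -> 3 has a product of 0
p_plus_one["1", "3"]  # 1/8: small but non-zero
```

With the added counts every row still sums to one, so the result remains a valid transition matrix.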

Value

  • A scalar score value.

Details

A score of how likely it is that a sequence was generated by a given EMM model can be calculated by the length-normalized product or sum of probabilities on the path along the new sequence. The scores for a new sequence $x$ of length $l$ can be computed by the following methods:

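  • product: $S_\mathrm{product} = \sqrt[l-1]{\prod_{i=1}^{l-1} a(s(i), s(i+1))}$
  • sum: $S_\mathrm{sum} = \frac{1}{l-1} \sum_{i=1}^{l-1} a(s(i), s(i+1))$
  • log_sum: $S_\mathrm{log\_sum} = \frac{1}{l-1} \sum_{i=1}^{l-1} \log a(s(i), s(i+1))$
  • weighted_product: $S_\mathrm{weighted\_product} = \sqrt[l-1]{\prod_{i=1}^{l-1} \mathrm{simil}(x_i, s(i)) \, a(s(i), s(i+1))}$
  • weighted_sum: $S_\mathrm{weighted\_sum} = \frac{1}{l-1} \sum_{i=1}^{l-1} \mathrm{simil}(x_i, s(i)) \, a(s(i), s(i+1))$
  • weighted_log_sum: $S_\mathrm{weighted\_log\_sum} = \frac{1}{l-1} \sum_{i=1}^{l-1} \log\left(\mathrm{simil}(x_i, s(i)) \, a(s(i), s(i+1))\right)$
  • missing_transitions: $S_\mathrm{missing\_transitions} = \frac{1}{l-1} \sum_{i=1}^{l-1} \mathrm{I}(a(s(i), s(i+1)))$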

where $x_i$ represents the $i$-th data point in the new sequence, $a(i,j)$ is the transition probability from state $i$ to state $j$ in the model, $s(i)$ is the state the $i$-th data point ($x_i$) in the new sequence is assigned to, and $\mathrm{simil}(\cdot)$ is a similarity function (uses the same similarity/distance measure that was used to create the model; distances are converted into similarities using $\mathrm{simil} = 1/(1+\mathrm{d})$).
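
The distance-to-similarity conversion and its use as a weight can be sketched in base R. The example assumes Euclidean distance, and it assumes that a weighted score multiplies each transition probability by the similarity between the data point and the state it is assigned to. The centers, observations, and all variable names below are hypothetical, and the code is an illustration rather than the package's implementation.

```r
# Hypothetical state centers (one row per state) and new observations.
centers <- rbind(c(0, 0), c(1, 1))
x <- rbind(c(0.0, 0.0), c(0.3, 0.4), c(1.0, 1.0))

# Hypothetical state assignments s(i) and transition probabilities a(i, j).
s <- c(1, 1, 2)
P <- matrix(c(0.5, 0.5,
              0.2, 0.8), nrow = 2, byrow = TRUE)

# Euclidean distance of each point to its assigned center,
# converted to a similarity with simil = 1 / (1 + d).
d <- sqrt(rowSums((x - centers[s, , drop = FALSE])^2))
simil <- 1 / (1 + d)

# Weight each transition by the similarity of its starting point.
idx <- seq_len(length(s) - 1)
w <- simil[idx] * P[cbind(s[idx], s[idx + 1])]

weighted_sum     <- mean(w)                   # arithmetic mean of weighted terms
weighted_product <- prod(w)^(1 / length(w))   # geometric mean of weighted terms
```

A point that sits exactly on its state center has distance 0 and similarity 1, so its transition is not down-weighted; points farther from the center contribute less.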

For missing transitions, $\mathrm{I}(v)$ is an indicator function which is 1 for $v=0$ and 0 otherwise.
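
To make the length normalization concrete, the unweighted scores can be computed by hand from a transition probability matrix and the sequence of assigned states. The sketch below uses base R only and assumes normalization by the number of transitions, $l-1$. The matrix `P`, the path `s`, and all variable names are hypothetical, and this is not the package's implementation.

```r
# Hypothetical transition probability matrix a(i, j) for three states.
P <- matrix(c(0.8, 0.2, 0.0,
              0.0, 0.6, 0.4,
              0.5, 0.0, 0.5),
            nrow = 3, byrow = TRUE)

# Hypothetical state assignments s(1), ..., s(l) for a new sequence.
s <- c(1, 1, 2, 3, 3)
n_trans <- length(s) - 1  # a sequence of length l has l - 1 transitions

# Probabilities of the transitions actually taken along the path.
a <- P[cbind(s[-length(s)], s[-1])]

score_product <- prod(a)^(1 / n_trans)  # geometric mean of the probabilities
score_sum     <- mean(a)                # arithmetic mean of the probabilities
score_log_sum <- mean(log(a))           # mean log-probability
score_missing <- mean(a == 0)           # share of transitions with probability 0

# exp() of the mean log-probability is the geometric mean, so log_sum and
# product rank sequences in the same order.
all.equal(exp(score_log_sum), score_product)
```

Working on the log scale avoids numerical underflow for long sequences, at the cost of returning `-Inf` as soon as one transition has probability zero.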

See Also

transition to access transition probabilities and find_clusters for assigning observations to states/clusters.

Examples

library("rEMM")
data("EMMsim")

emm <- EMM(threshold=.5)
emm <- build(emm, EMMsim_train)

### compute various scores
score(emm, EMMsim_test, method="missing_transitions")
score(emm, EMMsim_test) # default is "product"
score(emm, EMMsim_test, method="sum")
score(emm, EMMsim_test, method="log_sum")
score(emm, EMMsim_test, method="weighted_product")
score(emm, EMMsim_test, method="weighted_sum")
score(emm, EMMsim_test, method="weighted_log_sum")

### shuffle the data and score again
EMMsim_test <- EMMsim_test[sample(1:nrow(EMMsim_test)),]
score(emm, EMMsim_test, method="missing_transitions")
score(emm, EMMsim_test) # default is "product"
score(emm, EMMsim_test, method="sum")
score(emm, EMMsim_test, method="log_sum")
score(emm, EMMsim_test, method="weighted_product")
score(emm, EMMsim_test, method="weighted_sum")
score(emm, EMMsim_test, method="weighted_log_sum")

### deal with missing transitions
score(emm, EMMsim_test, method="product", plus_one=TRUE)
score(emm, EMMsim_test, method="log_sum", plus_one=TRUE)
score(emm, EMMsim_test, method="weighted_product", plus_one=TRUE)
score(emm, EMMsim_test, method="weighted_log_sum", plus_one=TRUE)
