score: Score a New Sequence Given an EMM

Description

Calculates a score of how likely it is that a new sequence was generated by the same process as the sequences used to build the EMM.

Usage

## S4 method for signature 'EMM,matrix'
score(x, newdata, method = c("product", "log_sum", "sum",
    "weighted_product", "weighted_log_sum", "weighted_sum", "log_odds",
    "supported_transitions", "supported_states", "weighted_supported_states",
    "sum_transitions"),
    match_cluster = "nn", plus_one = FALSE, initial_transition = FALSE)
## S4 method for signature 'EMM,EMM'
score(x, newdata, method = c("product", "log_sum", "sum",
    "supported_transitions"), match_cluster = "nn", plus_one = FALSE,
    initial_transition = FALSE)

Arguments

x

an EMM object.

newdata

sequence or another EMM object to score.

method

method to calculate the score (see details)

match_cluster

do the new observations have to fall within the threshold of the cluster ("exact") or is nearest neighbor used ("nn")? If match_cluster is a number n then observations need to fall within n times the cl

plus_one

add one to each transition count. This is equivalent to starting with a count of one for each transition, i.e., initially all transitions are equally likely. It prevents the product of probabilities from being zero if a transition was never o

initial_transition

include the initial transition in the computation?
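
The effect of plus_one can be illustrated with a small base R sketch. It assumes that transition probabilities are obtained by row-normalizing a matrix of transition counts; the matrix counts, the helper to_prob, and the state sequence s are made up for illustration and are not part of rEMM, whose internal computation may differ.

```r
# Hypothetical transition counts between two states (rows: from, columns: to).
counts <- matrix(c(3, 0,
                   1, 2), nrow = 2, byrow = TRUE)

# Assumed normalization: divide each row by its total count.
to_prob <- function(n) n / rowSums(n)

a_raw  <- to_prob(counts)      # transition 1 -> 2 was never observed, so it gets 0
a_plus <- to_prob(counts + 1)  # add-one counts: every transition gets a positive value

# A made-up state sequence with the transitions 1 -> 1, 1 -> 2, 2 -> 1.
s   <- c(1, 1, 2, 1)
idx <- cbind(head(s, -1), tail(s, -1))  # (from, to) index pairs

p_raw  <- prod(a_raw[idx])   # collapses to 0 because of the unseen transition
p_plus <- prod(a_plus[idx])  # stays strictly positive
```

In this toy case a single unseen transition drives the raw product to zero, while the add-one counts keep it positive.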

Value

A scalar score value.

Details

The score for a new sequence $x$ of length $l$ can be computed by any of the methods listed under Usage.

In these methods, $x_i$ represents the $i$-th data point in the new sequence, $a(i,j)$ is the transition probability from state $i$ to state $j$ in the model, $s(i)$ is the state to which the $i$-th data point ($x_i$) in the new sequence is assigned, and $\mathrm{simil}(\cdot)$ is a modified and normalized similarity function. It is normalized by the threshold such that $\mathrm{simil} = 1$ for $d \le \mathrm{threshold}$ and $\mathrm{simil} = 0.5^{(d/\mathrm{threshold} - 1)}$ otherwise, where $d$ is the same distance measure that was used to create the model.
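
As a hedged illustration in this notation, and assuming that each score is normalized by the number of transitions $l-1$ (an assumption, not something this page confirms), the three unweighted scores could be written as:

```latex
\begin{align*}
S_{\mathrm{product}} &= \Bigl(\prod_{i=1}^{l-1} a(s(i), s(i+1))\Bigr)^{1/(l-1)} \\
S_{\mathrm{sum}}     &= \frac{1}{l-1} \sum_{i=1}^{l-1} a(s(i), s(i+1)) \\
S_{\mathrm{log\_sum}} &= \frac{1}{l-1} \sum_{i=1}^{l-1} \log a(s(i), s(i+1))
\end{align*}
```

Under this reading, the product score is a geometric mean and the sum score an arithmetic mean of the transition probabilities along the sequence, so a single zero probability makes the product score 0 and the log-sum score $-\infty$.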

$\mathrm{I}(v)$ is an indicator function used to detect missing transitions: it is 0 for $v = 0$ and 1 otherwise.
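
The normalized similarity and the indicator function described above can be sketched in base R. The names simil_sketch and indicator are hypothetical helpers for illustration, not functions exported by rEMM, and d is assumed to be an already computed distance between a data point and the cluster it is assigned to.

```r
# Hypothetical helper: similarity is 1 inside the threshold and then
# halves for every additional threshold-width of distance.
simil_sketch <- function(d, threshold) {
  ifelse(d <= threshold, 1, 0.5^(d / threshold - 1))
}

# Hypothetical helper: 0 for v == 0 and 1 otherwise.
indicator <- function(v) as.numeric(v != 0)

sims <- simil_sketch(c(0.25, 0.5, 1, 1.5), threshold = 0.5)
ind  <- indicator(c(0, 0.2, 3))
```

With a threshold of 0.5, the distances 0.25 and 0.5 give a similarity of 1, a distance of 1 gives 0.5, and a distance of 1.5 gives 0.25.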

Examples

library("rEMM")
data("EMMsim")

emm <- EMM(threshold=.5)
emm <- build(emm, EMMsim_train)

### compute various scores
score(emm, EMMsim_test, method="supported_transitions")
score(emm, EMMsim_test) # default is "product"
score(emm, EMMsim_test, method="sum")
score(emm, EMMsim_test, method="log_sum")
score(emm, EMMsim_test, method="weighted_product")
score(emm, EMMsim_test, method="weighted_sum")
score(emm, EMMsim_test, method="weighted_log_sum")
score(emm, EMMsim_test, method="supported_states", match_cluster="exact")
score(emm, EMMsim_test, method="weighted_supported_states")

### shuffle the data and score again
EMMsim_test <- EMMsim_test[sample(nrow(EMMsim_test)),]
score(emm, EMMsim_test, method="supported_transitions")
score(emm, EMMsim_test) # default is "product"
score(emm, EMMsim_test, method="sum")
score(emm, EMMsim_test, method="log_sum")
score(emm, EMMsim_test, method="weighted_product")
score(emm, EMMsim_test, method="weighted_sum")
score(emm, EMMsim_test, method="weighted_log_sum")
score(emm, EMMsim_test, method="supported_states", match_cluster="exact")
score(emm, EMMsim_test, method="weighted_supported_states")

### deal with missing transitions
score(emm, EMMsim_test, method="product", plus_one=TRUE)
score(emm, EMMsim_test, method="log_sum", plus_one=TRUE)
score(emm, EMMsim_test, method="weighted_product", plus_one=TRUE)
score(emm, EMMsim_test, method="weighted_log_sum", plus_one=TRUE)