score

Score a New Sequence Given an EMM

Calculates a score of how likely it is that a new sequence was generated by the same process as the sequences used to build the EMM.

Keywords
models
Usage
## S4 method for signature 'EMM,matrix'
score(x, newdata, method = c("product", "log_sum", "sum",
        "log_odds", "supported_transitions", "supported_states", 
        "sum_transitions",  "log_loss", "likelihood", "log_likelihood", "AIC"), 
        match_cluster = "exact", prior=TRUE, normalize=TRUE, 
        initial_transition = FALSE, threshold = NA)
## S4 method for signature 'EMM,EMM'
score(x, newdata, method = c("product", "log_sum", "sum", 
        "supported_transitions"), match_cluster = "exact", prior=TRUE, 
        initial_transition = FALSE)
Arguments
x
an EMM object.
newdata
sequence or another EMM object to score.
method
method to calculate the score (see details)
match_cluster
how new observations are assigned to states: they either have to fall within the threshold of a cluster ("exact"), or nearest neighbor ("nn") or weighted nearest neighbor ("weighted") matching is used.
prior
add one to each transition count. This is equivalent to starting with a count of one for each transition, i.e., initially all transitions are equally likely. It prevents the product of probabilities from being zero when a transition was never observed.
normalize
normalize the score by the length of the sequence.
initial_transition
include the initial transition in the computation?
threshold
minimum count threshold used by the methods "supported_transitions" and "supported_states".
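The effect of prior can be illustrated without the package. The following is a minimal base R sketch, not the rEMM implementation: the count matrix, the state path, and the helper trans_prob are made up for illustration, and it assumes that transition probabilities are obtained by normalizing each row of the count matrix.

```r
# Toy transition counts between three states (rows: from, columns: to).
# The numbers are invented for this illustration.
counts <- matrix(c(5, 2, 0,
                   0, 4, 3,
                   1, 0, 6),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("1", "2", "3"), c("1", "2", "3")))

# Hypothetical helper: turn counts into row-wise transition probabilities,
# optionally adding one to every count first (the "prior").
trans_prob <- function(counts, prior = TRUE) {
  if (prior) counts <- counts + 1
  counts / rowSums(counts)
}

# A state path that uses the transition 1 -> 3, which has a count of zero.
path <- c(1, 3, 3, 1)
from <- path[-length(path)]
to   <- path[-1]

# Look up the probability of each transition along the path.
p_no_prior <- trans_prob(counts, prior = FALSE)[cbind(from, to)]
p_prior    <- trans_prob(counts, prior = TRUE)[cbind(from, to)]

prod(p_no_prior)  # 0: a single unobserved transition zeroes the product
prod(p_prior)     # 0.1 * 0.7 * 0.2 = 0.014, strictly positive
```

With the prior, every transition keeps a small positive probability, so one unseen transition lowers a product-based score instead of collapsing it to zero.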
Details

The scores for a new sequence $x$ of length $l$ can be computed by the following methods.

For match_cluster="exact" or "nn": [the per-method formulas are missing from this copy]

where $x_i$ represents the $i$-th data point in the new sequence, $a(i,j)$ is the transition probability from state $i$ to state $j$ in the model, and $s(i)$ is the state the $i$-th data point ($x_i$) in the new sequence is assigned to. $\mathrm{I}(v)$ is an indicator function which is 0 for $v=0$ and 1 otherwise.

For match_cluster="weighted": [the per-method formulas are missing from this copy]

where $\mathrm{simil}(\cdot)$ is a modified and normalized similarity function given by $\mathrm{simil}(x,s) = 1- \frac{1}{1+e^{-\frac{\mathrm{d}(x, s)/t -1.5}{.2}}}$ where $d$ is the distance measure and $t$ is the threshold that was used to create the model.
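To get a feeling for the shape of this similarity, the formula can be evaluated directly. This is a sketch in base R that implements only the expression given above; the function name simil, its argument names, and the chosen distances are assumptions for illustration, and any further normalization done inside the package is not covered.

```r
# Similarity as given by the formula above: d is the distance between an
# observation and a state, t is the threshold used to create the model.
simil <- function(d, t) {
  1 - 1 / (1 + exp(-(d / t - 1.5) / 0.2))
}

t <- 0.2                            # same threshold as in the Examples
d <- c(0, 0.5, 1, 1.5, 2, 3) * t    # distances as multiples of the threshold

# Similarity decreases smoothly with distance and equals 0.5 at d = 1.5 * t.
round(simil(d, t), 4)
```

The similarity stays close to 1 for observations well inside the threshold and drops towards 0 once the distance exceeds roughly twice the threshold, so states just outside the threshold still contribute a little to weighted scores.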

Value

  • A scalar score value.

See Also

transition to access transition probabilities and find_clusters for assigning observations to states/clusters.

Aliases
  • score
  • score,EMM,numeric-method
  • score,EMM,data.frame-method
  • score,EMM,matrix-method
  • score,EMM,EMM-method
Examples
data("EMMsim")
  
emm <- EMM(threshold=.2)
emm <- build(emm, EMMsim_train)
  
score(emm, EMMsim_test) # default is method "product"
  
  
### create shuffled data (destroy temporal relationship)
### and create noisy data
test_shuffled <- EMMsim_test[sample(1:nrow(EMMsim_test)),]
test_noise <- jitter(EMMsim_test, amount=.3)
  
### helper for plotting
mybars <- function(...) {
  oldpar <- par(mar=c(5,10,4,2))
  ss <- rbind(...) 
  barplot(ss[,ncol(ss):1], xlim=c(-1,4), beside=TRUE, 
          horiz=TRUE, las=2, 
          legend = rownames(ss))
  par(oldpar)
}
  

### compare various scores
methods <- c("product", 
             "sum", 
             "log_sum", 
             "supported_states", 
             "supported_transitions",
             "sum_transitions",
             "log_loss",
             "likelihood")

### default is exact matching
clean <- sapply(methods, FUN=function(m) score(emm, EMMsim_test, method=m))
shuffled <- sapply(methods, FUN=function(m) score(emm, test_shuffled, method=m))
noise <- sapply(methods, FUN=function(m) score(emm, test_noise, method=m))
mybars(shuffled, noise, clean)
  
### weighted matching is better for noisy data
clean <- sapply(methods, FUN=function(m) score(emm, EMMsim_test, method=m, 
                                               match_cluster="weighted"))
shuffled <- sapply(methods, FUN=function(m) score(emm, test_shuffled, method=m, 
                                                  match_cluster="weighted"))
noise <- sapply(methods, FUN=function(m) score(emm, test_noise, method=m, 
                                               match_cluster="weighted"))
mybars(shuffled, noise, clean)
Documentation reproduced from package rEMM, version 1.0-11, License: GPL-2
