seqlogp: Logarithm of the probabilities of state sequences

Description

Compute the logarithm of the probability of each state sequence obtained from a state transition model. The probability of a sequence is equal to the product of the probabilities of its successive states. Several methods are available to compute these state probabilities.

Usage

seqlogp(seqdata, prob="trate", time.varying=TRUE,
        begin="freq", weighted=TRUE)

Arguments

seqdata

A state sequence object (as created by seqdef) containing the sequences whose probabilities are to be computed.

prob

Either the name ("trate" or "freq") of the probability model used to compute the state probabilities, or an array specifying the transition probabilities at each position $t$ (see Details).

time.varying

Logical. If TRUE, the probabilities (transitions or frequencies) are computed separately for each time point $t$.

begin

Model used to compute the probability of the first state. Either "freq", to use the observed state frequencies at the first position, or a vector specifying the probability of each state of the alphabet.

weighted

Logical. If TRUE, uses the weights specified in seqdata when computing the observed transition rates.

Value

A vector containing, for each sequence, the negative logarithm $-\log P(s)$ of its probability (see Details).

Details

The sequence likelihood $P(s)$ is defined as the product of the probabilities with which each of its observed successive states is supposed to occur at its position. Let $s=s_{1}s_{2} \cdots s_{\ell}$ be a sequence of length $\ell$. Then $$ P(s)=P(s_{1},1) \cdot P(s_{2},2) \cdots P(s_{\ell},\ell) $$ with $P(s_{t},t)$ the probability of observing state $s_t$ at position $t$.
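As a minimal illustration of the product formula, the snippet below takes hypothetical per-position probabilities $P(s_t,t)$ for a single sequence of length 4 and multiplies them; the values in p_t are made up and do not come from any model. The same quantity can be obtained on the log scale as a sum of logarithms.

```r
# Hypothetical per-position probabilities P(s_t, t) for one sequence of
# length 4; in seqlogp they would come from the model chosen with 'prob'
p_t <- c(0.5, 0.8, 0.9, 0.7)

p_s <- prod(p_t)          # P(s) = P(s_1,1) * P(s_2,2) * P(s_3,3) * P(s_4,4)
log_p_s <- sum(log(p_t))  # log P(s) computed as a sum of logs

p_s                       # 0.252
```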

The question is how to determine the state probabilities $P(s_t,t)$. Several methods are available and can be selected with the prob argument.

One commonly used method for computing them is to postulate a Markov model, which can be of various orders. We can consider probabilities derived from the first-order Markov model, that is, each $P(s_t,t)$, $t>1$, is set to the transition rate $p(s_t|s_{t-1})$. This is available in seqlogp by setting prob="trate". The transition rates may be considered constant over time/positions (time.varying=FALSE), that is, estimated across sequences from the observations at positions $t$ and $t-1$ for all $t$ together. Time-varying transition rates may also be considered (time.varying=TRUE), in which case they are computed separately for each position, that is, estimated across sequences from the observations at positions $t$ and $t-1$ for each $t$, yielding an array of transition matrices. Users may also supply their own array or matrix of transition rates through the prob argument.
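The base-R sketch below illustrates the difference between constant and time-varying first-order transition rates. It is not the TraMineR implementation: the helpers count_trans, trate_markov and neglogp_markov are hypothetical, sequences are stored as a plain character matrix (one row per sequence), the first state is handled like begin="freq" (observed frequencies at position 1), and sequences are assumed to have length 2 or more.

```r
# Toy data (hypothetical): 4 sequences of length 3 over the alphabet {A, B}
seqs <- matrix(c("A", "A", "B",
                 "A", "B", "B",
                 "B", "B", "B",
                 "A", "A", "A"),
               nrow = 4, byrow = TRUE)
alphabet <- c("A", "B")

# Weighted counts of the transitions observed between positions t-1 and t
count_trans <- function(seqs, t, alphabet, w) {
  counts <- matrix(0, length(alphabet), length(alphabet),
                   dimnames = list(alphabet, alphabet))
  for (i in seq_len(nrow(seqs))) {
    from <- seqs[i, t - 1]
    to <- seqs[i, t]
    counts[from, to] <- counts[from, to] + w[i]
  }
  counts
}

# Transition rates p(s_t | s_{t-1}); slice [, , t] is used at position t.
# Rows of states never observed at t-1 are 0/0 = NaN and are never looked up.
trate_markov <- function(seqs, alphabet, time.varying = TRUE,
                         w = rep(1, nrow(seqs))) {
  L <- ncol(seqs)
  k <- length(alphabet)
  rates <- array(NA_real_, dim = c(k, k, L),
                 dimnames = list(alphabet, alphabet, NULL))
  if (!time.varying) {
    # Pool the transitions of all positions into a single count matrix
    pooled <- Reduce(`+`, lapply(2:L, function(t) count_trans(seqs, t, alphabet, w)))
  }
  for (t in 2:L) {
    counts <- if (time.varying) count_trans(seqs, t, alphabet, w) else pooled
    rates[, , t] <- counts / rowSums(counts)  # divide each row by its total
  }
  rates
}

# -log P(s), with P(s_1, 1) taken from the (weighted) position-1 frequencies
neglogp_markov <- function(seqs, alphabet, rates, w = rep(1, nrow(seqs))) {
  f1 <- vapply(alphabet, function(a) sum(w[seqs[, 1] == a]), numeric(1))
  f1 <- f1 / sum(f1)
  apply(seqs, 1, function(s) {
    lp <- log(f1[[s[1]]])
    for (t in 2:length(s)) lp <- lp + log(rates[s[t - 1], s[t], t])
    -lp
  })
}

nlp_tv    <- neglogp_markov(seqs, alphabet, trate_markov(seqs, alphabet, TRUE))
nlp_const <- neglogp_markov(seqs, alphabet, trate_markov(seqs, alphabet, FALSE))
```

On this toy data, the time-varying rates give every sequence $P(s)=0.25$, whereas the pooled (constant) rates give $0.18$, $0.30$, $0.25$ and $0.27$, so the choice of time.varying can change how sequences are ranked.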

Another method is to set $P(s_{t},t)$ to the frequency of state $s_t$ at position $t$ (prob="freq"). In this case, the probability of a sequence does not depend on the transitions between its states. Here again, the frequencies can be computed all together (time.varying=FALSE) or separately for each position $t$ (time.varying=TRUE). For $t=1$, we set $P(s_1,1)$ to the observed frequency of the state $s_1$ at position 1. Alternatively, the begin argument can be used to specify the probability of the first state.
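The following base-R sketch illustrates the frequency model on a hypothetical toy data set; it is not the TraMineR code. The helpers state_freq and neglogp_freq are hypothetical, the first position always uses the position-1 frequencies (as with begin="freq"), and, as an assumption, the pooled frequencies for time.varying=FALSE are computed over all positions including the first. Note that only the state observed at each position is looked up, never the previous state.

```r
# Toy data (hypothetical): 4 sequences of length 3 over the alphabet {A, B}
seqs <- matrix(c("A", "A", "B",
                 "A", "B", "B",
                 "B", "B", "B",
                 "A", "A", "A"),
               nrow = 4, byrow = TRUE)
alphabet <- c("A", "B")

# Weighted relative frequency of each state among the given cells
state_freq <- function(cells, w, alphabet) {
  f <- vapply(alphabet, function(a) sum(w[cells == a]), numeric(1))
  f / sum(f)
}

neglogp_freq <- function(seqs, alphabet, time.varying = TRUE,
                         w = rep(1, nrow(seqs))) {
  L <- ncol(seqs)
  # freqs[, t] holds P(state, t); column 1 uses the position-1 frequencies
  freqs <- matrix(NA_real_, length(alphabet), L)
  freqs[, 1] <- state_freq(seqs[, 1], w, alphabet)
  # Frequencies pooled over all positions (the matrix is read column by column)
  pooled <- state_freq(seqs, rep(w, times = L), alphabet)
  for (t in seq_len(L)[-1]) {
    freqs[, t] <- if (time.varying) state_freq(seqs[, t], w, alphabet) else pooled
  }
  apply(seqs, 1, function(s) {
    # Pick P(s_t, t) at every position; the previous state plays no role
    -sum(log(freqs[cbind(match(s, alphabet), seq_len(L))]))
  })
}

nlp_tv    <- neglogp_freq(seqs, alphabet, time.varying = TRUE)
nlp_const <- neglogp_freq(seqs, alphabet, time.varying = FALSE)
```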

Since the likelihood $P(s)$ is generally very small, seqlogp returns $-\log P(s)$. This quantity is non-negative and reaches its minimum, $0$, when $P(s)$ is equal to $1$.
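The snippet below shows why working on the log scale matters, using hypothetical values: a sequence of 400 positions, each with probability 0.1. The direct product underflows to zero in double precision, while the sum of logarithms stays finite.

```r
# 400 positions, each with probability 0.1 (hypothetical values)
p_t <- rep(0.1, 400)

p_direct  <- prod(p_t)        # 1e-400 is below the smallest double: underflows to 0
neg_log_p <- -sum(log(p_t))   # stays finite: 400 * log(10)

p_direct                      # 0
neg_log_p                     # about 921.03
```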

Examples

library(TraMineR)
## Creating the sequence object using weights
data(biofam)
biofam.seq <- seqdef(biofam, 10:25, weights = biofam$wp00tbgs)

## Computing sequence probabilities
biofam.prob <- seqlogp(biofam.seq)
## Comparing the sequence probabilities of the two birth cohorts (born after 1940 or not)
cohort <- biofam$birthyr>1940
boxplot(biofam.prob~cohort)
