seqlogp: Computing the logarithm of sequence probabilities

Description

Computes the logarithm of the probability of each sequence using a state transition model. The probability of a sequence is the product of the probabilities of its successive states. Several methods are available for computing the state probabilities.

Usage

seqlogp(seqdata, prob="trate", time.varying=TRUE, begin="freq", weighted=TRUE)

Arguments

seqdata

The sequences for which the probabilities are computed, typically a state sequence object created with seqdef.

prob

The name of the probability model used. The probabilities can be based either on transition rates ("trate") or on state frequencies ("freq"). This can also be an array specifying the transition probabilities at each position (see Details).

time.varying

Logical. If TRUE, the probabilities (either transition rates or state frequencies) are computed separately for each position $t$.

begin

Model used to compute the probability of the first state. Either "freq", to use the observed frequencies at the first position, or a vector specifying the probability of each state appearing in seqdata.

weighted

Logical. If TRUE, uses the weights specified in seqdata when computing the observed transition rates.

Value

A vector containing, for each sequence, the negative logarithm $-\log P(s)$ of its probability (see Details).

Details

The sequence likelihood $P(s)$ is defined as the product of the probabilities with which each of its successive observed states is supposed to occur at its position. Let $s=s_{1}s_{2} \cdots s_{\ell}$ be a sequence of length $\ell$. Then $$P(s)=P(s_{1},1) \cdot P(s_{2},2) \cdots P(s_{\ell},\ell)$$ with $P(s_{t},t)$ the probability of observing state $s_t$ at position $t$.

The question is how to determine the state probabilities $P(s_t,t)$. Several methods are available and can be selected with the prob argument.

One commonly used method is to postulate a Markov model, which can be of various orders. With prob="trate", the probabilities are derived from a first-order Markov model: each $P(s_t,t)$, $t>1$, is set to the transition rate $p(s_t|s_{t-1})$. The transition rates may be considered constant over time/positions (time.varying=FALSE), in which case they are estimated across sequences from the observations at positions $t$ and $t-1$ for all $t$ together. Time-varying transition rates may also be considered (time.varying=TRUE), in which case they are estimated across sequences from the observations at positions $t$ and $t-1$ separately for each $t$, yielding an array of transition matrices. Users may also supply their own transition rate array or matrix.

Another method is to use the frequency of a state at each position to set $P(s_{t},t)$ (prob="freq"). In this case, the probability of a sequence is independent of the probabilities of its transitions. Here again, the frequencies can be computed for all positions together (time.varying=FALSE) or separately for each position $t$ (time.varying=TRUE).

For $t=1$, $P(s_1,1)$ is set to the observed frequency of state $s_1$ at position 1. Alternatively, the begin argument can be used to specify the probability of the first state.
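The two models can be illustrated by hand on a toy example. The following base R sketch is not the seqlogp implementation: the data, the object names (seqs, trate, freq_t, neglogp_trate, neglogp_freq) and the unweighted estimation are assumptions made for illustration only. It computes $-\log P(s)$ with time-constant transition rates (the analogue of prob="trate", time.varying=FALSE) and with position-specific state frequencies (the analogue of prob="freq", time.varying=TRUE).

```r
# Toy data (made up for illustration): 4 sequences of length 3 over states A and B.
seqs <- rbind(
  c("A", "A", "B"),
  c("A", "B", "B"),
  c("B", "B", "B"),
  c("A", "A", "A")
)
states <- c("A", "B")
len <- ncol(seqs)

# P(s_1, 1): observed frequency of each state at position 1.
first_freq <- sapply(states, function(a) mean(seqs[, 1] == a))

# Time-constant transition rates p(s_t | s_{t-1}): the pairs
# (position t-1, position t) are pooled over all t.
from <- factor(as.vector(seqs[, -len]), levels = states)
to <- factor(as.vector(seqs[, -1]), levels = states)
counts <- table(from, to)
trate <- counts / rowSums(counts)  # each row sums to 1

# -log P(s) under the first-order Markov model.
neglogp_trate <- apply(seqs, 1, function(s) {
  steps <- mapply(function(a, b) trate[a, b], s[-len], s[-1])
  -sum(log(c(first_freq[[s[1]]], steps)))
})

# Position-specific state frequencies: rows are states, columns are positions.
freq_t <- sapply(seq_len(len), function(t) {
  sapply(states, function(a) mean(seqs[, t] == a))
})

# -log P(s) when each P(s_t, t) is the frequency of s_t at position t.
neglogp_freq <- apply(seqs, 1, function(s) {
  -sum(log(freq_t[cbind(match(s, states), seq_len(len))]))
})

print(round(neglogp_trate, 3))
print(round(neglogp_freq, 3))
```

In this sketch the two models rank the sequences differently, since the frequency-based model ignores which state precedes which. Results from seqlogp on real data may differ in details such as the handling of weights.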

The likelihood $P(s)$ being generally very small, seqlogp returns $-\log P(s)$. The latter quantity is minimal (equal to $0$) when $P(s)$ is equal to $1$.
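A short base R sketch shows why working on the log scale is convenient for long sequences. The probabilities below are hypothetical values chosen only to trigger numerical underflow; they do not come from any data set.

```r
# 200 hypothetical state probabilities of 0.01 each (illustration only).
p <- rep(0.01, 200)

# The direct product is 1e-400, which is below the smallest positive
# double and therefore underflows to zero.
direct <- prod(p)

# The sum of logarithms stays finite and well within double range.
neglog <- -sum(log(p))

print(direct)
print(neglog)
```

Since $-\log P(s) = -\sum_{t} \log P(s_t,t)$, the negative log-likelihood can be accumulated term by term without ever forming the product.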

Examples

## Creating the sequence object using weights
library(TraMineR)
data(biofam)
biofam.seq <- seqdef(biofam, 10:25, weights=biofam$wp00tbgs)

## Computing sequence probabilities (returned as -log P(s))
biofam.prob <- seqlogp(biofam.seq)
## Comparing the values between cohorts (born after 1940 or not)
cohort <- biofam$birthyr>1940
boxplot(biofam.prob~cohort)
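
The other settings described in the Arguments section can be tried in the same way. The following sketch assumes the TraMineR package is installed and uses only the arguments listed in Usage; the object names biofam.freq and biofam.trate are illustrative.

```r
library(TraMineR)
data(biofam)
biofam.seq <- seqdef(biofam, 10:25, weights = biofam$wp00tbgs)

## State-frequency model, frequencies computed for all positions together
biofam.freq <- seqlogp(biofam.seq, prob = "freq", time.varying = FALSE)

## First-order Markov model with time-constant transition rates,
## estimated without the weights
biofam.trate <- seqlogp(biofam.seq, prob = "trate",
                        time.varying = FALSE, weighted = FALSE)

summary(biofam.freq)
summary(biofam.trate)
```

Comparing the two summaries gives a rough idea of how much the choice of probability model affects the resulting values.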