next up previous
Next: Maximisation step Up: PARAMETER ESTIMATION Previous: PARAMETER ESTIMATION

Expectation step

The $ \alpha$ and $ \beta$ values computed by the forward-backward algorithm are useful for many purposes. Firstly, the probability of the observation sequence given the LOHMM is simply $ p_{seq}=\sum_{s
\in S_T}\alpha_T(s)=\beta_0(start)$. Secondly, the probability of being in a state $ s$ at time $ t$ given the observation sequence is $ \alpha_t(s)\beta_t(s)/p_{seq}$. Thirdly, computing the expected counts required for the maximization step, becomes simple.

The forward-backward algorithm is based on dynamic programming. To ease the computations, we find sets $ S_t$ in the beginning of learning. The set $ S_t$ is defined as the set of states at time $ t$ that are reachable from start and are not conflicting with the observation sequence $ o_1,o_2,\dots,o_T$. The reachable states can be found in two phases:

Collect
Set $ S_0=\{$start$ \}$. For each $ t=0,1,\dots,T$, using the states $ S_{t}$ as sources, find all states $ S_{t+1}$ where a transition can be made without conflict with the current observation $ o_t$.
Prune
For each $ t=T,T-1,\dots,0$, remove state from $ S_t$ if there are only conflicting transitions to any of the states still in $ S_{t+1}$.
The $ start$ state is pruned if and only if the LOHMM could not have produced the sequence.

Probabilities $ \alpha$ and $ \beta$ are computed for each state in each $ S_t$ recursively. The $ \alpha_t(s)$ is defined as the probability of the partial observation sequence $ o_1,\dots,o_{t-1}$ and state $ s$ at time $ t$ given the LOHMM. The $ \beta_t(s)$ is the probability of the partial observation sequence $ o_t,\dots,o_T$ given state $ s$ at time $ t$ and the LOHMM. Set $ \alpha_0(start)=1.0$ and $ \beta_{T+1}(s)=1.0$ for every $ s\in S_{T+1}$. Recursion formulae are

$\displaystyle \alpha_t(h)=\sum_{cl\in \mathcal{H}}\sum_{b\in S_{t-1}} \alpha_{t-1}(b)p_{cl}\mu\delta(cl,b,h,o_t)$     (6)
$\displaystyle \beta_t(b)=\sum_{cl\in \mathcal{H}}\sum_{h\in S_{t+1}} \beta_{t+1}(h)p_{cl}\mu\delta(cl,b,h,o_t),$     (7)

where $ p_{cl}$ is the transition probability and $ \mu$ is the selection probability of the instantiation of the new variables in the head and observation. The indicator function $ \delta(cl,b,h,o_t)=1$ whenever transition $ cl$ can take from state $ b$ to $ h$ observing $ o_t$ and the transition $ cl$ has the most specific body for $ b$.

The expected counts for the maximisation step are computed using $ \alpha$ and $ \beta$ values. Let $ \nu(cl,b,h,t)$ be the probability of going from state $ b$ at time $ t$ to state $ h$ at time $ t+1$ using the transition $ cl$ given the LOHMM and the observation sequence:

$\displaystyle \nu(cl,b,h,t)=\frac{\alpha_t(b)p_{cl}\mu\beta_{t+1}(h)\delta(cl,b,h,o_t)}{p_{seq}}.$     (8)

The expected count $ s(cl)$ of using transition $ cl$ with the data set $ \boldsymbol{X}$ is
$\displaystyle s(cl)=\sum_{seq\in \boldsymbol{X}}\sum_{t=0}^{T_{seq}}\sum_{b\in S_t}\sum_{h\in S_{t+1}}\nu(cl,b,h,t),$     (9)

where $ T_{seq}$ is the length of the sequence $ seq$. The expected counts of the selection distribution $ \mu$ are also obtained by summing terms $ \nu$ analogously.


next up previous
Next: Maximisation step Up: PARAMETER ESTIMATION Previous: PARAMETER ESTIMATION
Tapani Raiko 2003-07-09