
Comparison of Estimators

Let us consider the following simple HMM written as a LOHMM:

$\displaystyle c(1) \stackrel{0.5}{\longleftarrow} start. \qquad c(1) \stackrel{0.9:h}{\longleftarrow} c(1). \qquad c(2) \stackrel{0.1:h}{\longleftarrow} c(2).$
$\displaystyle c(2) \stackrel{0.5}{\longleftarrow} start. \qquad c(1) \stackrel{0.1:t}{\longleftarrow} c(1). \qquad c(2) \stackrel{0.9:t}{\longleftarrow} c(2).$

This could be interpreted as someone picking either coin 1 or coin 2 with equal probability and then using that coin to generate a sequence of heads and tails. Coin 1 produces more heads and coin 2 more tails.
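
To make this generative reading concrete, here is a minimal sketch in plain Python (not the LOHMM machinery of the paper; the names P_HEADS and sample_sequence are only illustrative): a coin is chosen uniformly at random at the start, and that same coin is then tossed for the whole sequence.

import random

# Two-coin model from the transitions above: pick a coin uniformly at
# random, then keep tossing that same coin.  P_HEADS[c] is the
# probability that coin c shows heads (0.9 for coin 1, 0.1 for coin 2).
P_HEADS = {1: 0.9, 2: 0.1}

def sample_sequence(length, rng=random):
    """Draw one observation sequence of the given length."""
    coin = rng.choice([1, 2])   # start -> c(1) or c(2), probability 0.5 each
    return "".join("h" if rng.random() < P_HEADS[coin] else "t" for _ in range(length))

random.seed(0)
print([sample_sequence(3) for _ in range(4)])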

Given the sequences $ h,h,h$ and $ t,t,t$ as data, the structure shown above, and a uniform prior over parameters, we now ask what the different estimators would give as parameter values. In fact, all of them give 0.5 for the selection between the coins, so the probabilities $ p_1$ and $ p_2$ with which coins $ 1$ and $ 2$ produce heads are of more interest.
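
The quantity on which the estimators below differ is essentially the joint likelihood of the two training sequences as a function of $ p_1$ and $ p_2$ (with the uniform prior, the posterior is proportional to it). A small helper, written here only for illustration with made-up names, makes the numbers in the following paragraphs easy to verify:

def seq_likelihood(seq, p1, p2):
    """P(seq) under the model: average over the two equally likely coins."""
    def coin_prob(p):   # probability of seq given one coin with heads-probability p
        out = 1.0
        for s in seq:
            out *= p if s == "h" else (1.0 - p)
        return out
    return 0.5 * coin_prob(p1) + 0.5 * coin_prob(p2)

def data_likelihood(p1, p2):
    """Joint likelihood of the two training sequences h,h,h and t,t,t."""
    return seq_likelihood("hhh", p1, p2) * seq_likelihood("ttt", p1, p2)

print(data_likelihood(0.5, 0.5))   # symmetric point: 0.015625
print(data_likelihood(1.0, 0.0))   # one of the corners: 0.25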

There are three fixed points for the maximum a posteriori estimator (Eq. 3), which with the uniform prior coincides with maximum likelihood. The first is a saddle point at $ p_1=0.5$, $ p_2=0.5$, and the other two are the global maxima $ p_1=1.0$, $ p_2=0.0$ and $ p_1=0.0$, $ p_2=1.0$. With random initialisation, the Baum-Welch algorithm ends up in either of the latter two with equal probability, and in the saddle point with probability 0. This estimator thus concludes from the data that one coin produces heads every time and the other only tails. If the estimated model is tested on the sequence $ h,t,h$, it gives a likelihood of exactly 0. From this failure one can conclude that the maximum likelihood estimator does not generalise well to new data when only a limited amount of data is available for learning.
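
For this model the Baum-Welch iteration reduces to ordinary mixture EM, because the hidden coin never changes within a sequence. The following sketch (the helper names are only illustrative) typically converges to one of the two corners from a random start and then assigns likelihood 0 to $ h,t,h$:

import random

SEQS = ["hhh", "ttt"]

def coin_prob(seq, p):
    """Probability of seq given a single coin with heads-probability p."""
    out = 1.0
    for s in seq:
        out *= p if s == "h" else (1.0 - p)
    return out

def em_ml(p1, p2, iters=100):
    """EM for the maximum-likelihood estimate (Baum-Welch specialised to
    this model, where the hidden coin is constant within a sequence)."""
    for _ in range(iters):
        heads = [0.0, 0.0]
        total = [0.0, 0.0]
        for seq in SEQS:
            a = 0.5 * coin_prob(seq, p1)
            b = 0.5 * coin_prob(seq, p2)
            r1 = a / (a + b)                       # E-step: responsibility of coin 1
            n_h = seq.count("h")
            heads[0] += r1 * n_h
            total[0] += r1 * len(seq)
            heads[1] += (1.0 - r1) * n_h
            total[1] += (1.0 - r1) * len(seq)
        p1 = heads[0] / total[0]                   # M-step: plain relative frequencies
        p2 = heads[1] / total[1]
    return p1, p2

random.seed(1)
p1, p2 = em_ml(random.random(), random.random())
print(p1, p2)                                      # one of the corners, e.g. (0.0, 1.0)
print(0.5 * coin_prob("hth", p1) + 0.5 * coin_prob("hth", p2))   # 0.0 for h,t,h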

The Bayes estimator (Eq. 4) is hard to evaluate in general, but in this case one can use symmetry to conclude that it is $ p_1=0.5$, $ p_2=0.5$. Since the estimator is always unique, it cannot decide which coin produces more heads (and which more tails).
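
The symmetry argument can also be checked numerically. The sketch below (an illustration only; the grid resolution is arbitrary) approximates the posterior mean of $ p_1$ on a grid, using the uniform prior so that the posterior is proportional to the likelihood of the two training sequences:

# Grid approximation of E[p1 | hhh, ttt] under the uniform prior.
N = 400
num = 0.0
den = 0.0
for i in range(N):
    p1 = (i + 0.5) / N
    for j in range(N):
        p2 = (j + 0.5) / N
        like = (0.5 * p1**3 + 0.5 * p2**3) * (0.5 * (1 - p1)**3 + 0.5 * (1 - p2)**3)
        num += p1 * like
        den += like
print(num / den)   # approximately 0.5, as the symmetry argument predicts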

The componentwise Bayes estimator (Eq. 5) also has three fixed points. The first is the saddle point at $ p_1=0.5$, $ p_2=0.5$, in analogy with the MAP estimator. The stable points are now at $ p_1=0.789$, $ p_2=0.211$ and $ p_1=0.211$, $ p_2=0.789$. Again, random initialisation decides which one is chosen. The cB estimator thus seems to combine the good properties of the other two: like the Bayes estimator it can operate with a limited amount of data, but it avoids the symmetric solution.
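
Assuming that the componentwise Bayes update amounts to replacing the maximum-likelihood M-step by the posterior mean of each parameter, that is, expected counts plus the pseudocounts of the uniform Beta(1,1) prior, the following sketch (em_cb is a made-up name) reproduces the stable points quoted above:

def coin_prob(seq, p):
    """Probability of seq given a single coin with heads-probability p."""
    out = 1.0
    for s in seq:
        out *= p if s == "h" else (1.0 - p)
    return out

def em_cb(p1, p2, iters=100):
    """Fixed-point iteration with a posterior-mean (componentwise Bayes
    style) M-step: expected counts plus the uniform prior's pseudocounts."""
    for _ in range(iters):
        heads = [0.0, 0.0]
        total = [0.0, 0.0]
        for seq in ("hhh", "ttt"):
            a = 0.5 * coin_prob(seq, p1)
            b = 0.5 * coin_prob(seq, p2)
            r1 = a / (a + b)
            n_h = seq.count("h")
            heads[0] += r1 * n_h
            total[0] += r1 * len(seq)
            heads[1] += (1.0 - r1) * n_h
            total[1] += (1.0 - r1) * len(seq)
        p1 = (heads[0] + 1.0) / (total[0] + 2.0)   # posterior mean with Beta(1,1) prior
        p2 = (heads[1] + 1.0) / (total[1] + 2.0)
    return p1, p2

print(em_cb(0.6, 0.4))   # converges to approximately (0.789, 0.211)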

