
Selection of patterns

An algorithm based on inductive logic programming is used to select patterns. The process starts with just the empty pattern (for which $l=0$). Repeatedly, after a certain number of play-outs, new patterns are added. Each existing pattern generates candidates by adding each possible move $m_0$ to the pattern, one at a time. A candidate is accepted if it has appeared often enough and if $m_0$ is a relevant addition to the pattern. If the original pattern makes move $m_0$ more valuable than it is without the pattern, the moves seem to be related.

The criterion used was based on the mutual information between the made move and winning, given the pattern. If the move makes winning less likely, the criterion was set to zero. Then the maximum of the mutual informations of the move and winning, given any of the subpatterns, is subtracted from the score.

When a new pattern is added, its statistics are copied from its parent with a small weight. Also, the statistics of the first play-outs are slowly forgotten by exponential decay.
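The selection criterion can be illustrated with a small Python sketch. It is only one possible reading of the description above, not the original implementation. The 2x2 count table, the function names, the thresholds `min_count` and `min_score`, the default weights, and the choice to leave the subpattern informations ungated are all assumptions made for illustration.

```python
import math
from typing import Iterable, Tuple

# Hypothetical statistics kept per (pattern, move m_0) pair:
# (move made & win, move made & loss, move not made & win, move not made & loss).
# Floats, because decay and weighted copying make the counts fractional.
Table = Tuple[float, float, float, float]


def mutual_information(t: Table) -> float:
    """Mutual information (in nats) between 'move made' and 'win'."""
    total = sum(t)
    if total == 0:
        return 0.0
    mw, ml, ow, ol = t
    p_move = (mw + ml) / total
    p_win = (mw + ow) / total
    cells = [
        (mw, p_move, p_win),
        (ml, p_move, 1 - p_win),
        (ow, 1 - p_move, p_win),
        (ol, 1 - p_move, 1 - p_win),
    ]
    mi = 0.0
    for count, p_a, p_b in cells:
        if count > 0:  # empty cells contribute nothing
            p = count / total
            mi += p * math.log(p / (p_a * p_b))
    return mi


def move_helps(t: Table) -> bool:
    """True if winning is more likely with the move than without it."""
    mw, ml, ow, ol = t
    if mw + ml == 0 or ow + ol == 0:
        return False
    return mw / (mw + ml) > ow / (ow + ol)


def candidate_score(t: Table, subpattern_tables: Iterable[Table]) -> float:
    """Gated mutual information minus the best subpattern information."""
    own = mutual_information(t) if move_helps(t) else 0.0
    # Assumption: subpattern informations are used as-is (not gated).
    baseline = max((mutual_information(s) for s in subpattern_tables), default=0.0)
    return own - baseline


def accept(t: Table, subpattern_tables: Iterable[Table],
           min_count: float = 20.0, min_score: float = 0.0) -> bool:
    """Accept a candidate seen often enough with a high enough score.

    Both thresholds are hypothetical; the text does not give values.
    """
    appearances = t[0] + t[1]  # times the pattern occurred together with m_0
    return appearances >= min_count and candidate_score(t, subpattern_tables) > min_score


def inherit(parent: Table, weight: float = 0.1) -> Table:
    """Initial statistics of a new pattern: parent's counts, down-weighted."""
    return tuple(weight * c for c in parent)


def decay(t: Table, factor: float = 0.99) -> Table:
    """Exponential forgetting: shrink all counts before adding new play-outs."""
    return tuple(factor * c for c in t)
```

For example, `accept((30, 10, 40, 40), [(20, 20, 20, 20)])` accepts the candidate: the move wins 75% of the time against 50% without it, and the single subpattern carries no information about winning. A table such as `(0, 50, 50, 0)` has high mutual information but scores zero, because the move makes winning less likely.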



Tapani Raiko 2006-09-01