
Exploration and exploitation

One has to balance exploring many genuinely different play-outs against exploiting the known good moves by studying them more closely. Here this is done with simulated annealing: at the beginning the amount of noise is large, and it is decreased linearly to zero by the time the actual move is selected.
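The linear schedule can be sketched as below. This is a minimal illustration, not the original implementation. The text only says that the noise shrinks linearly to zero. Three details are assumptions: the noise is Gaussian, it is added to per-move scores before picking the best one, and the helper names `noise_scale` and `select_with_noise` are hypothetical.

```python
import random


def noise_scale(playout_index: int, total_playouts: int, initial_scale: float) -> float:
    """Linearly decrease the noise amplitude from initial_scale to 0.

    Assumption: the annealing runs over a fixed budget of play-outs.
    """
    if total_playouts <= 0:
        raise ValueError("total_playouts must be positive")
    fraction_remaining = 1.0 - playout_index / total_playouts
    return initial_scale * max(0.0, fraction_remaining)


def select_with_noise(scores: dict[str, float], scale: float, rng: random.Random) -> str:
    """Pick the move whose score plus (assumed Gaussian) noise is largest.

    With scale == 0 this reduces to greedy selection of the best-scoring move.
    """
    return max(scores, key=lambda move: scores[move] + rng.gauss(0.0, scale))


if __name__ == "__main__":
    rng = random.Random(0)
    scores = {"a": 0.4, "b": 0.6}  # hypothetical per-move statistics
    total = 10
    for i in range(total + 1):
        scale = noise_scale(i, total, initial_scale=1.0)
        print(i, round(scale, 2), select_with_noise(scores, scale, rng))
```

Early play-outs often pick the weaker move because the noise dominates. As the scale approaches zero, selection becomes greedy, which matches the point where the actual move is chosen.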

Another possibility would be to add a constant (say 1) to both the numerator and the denominator of Equation 3 to emphasise the exploration of unseen moves. This corresponds to an optimistic prior, or pseudocount. When the time comes to select the actual move, unseen moves should instead be de-emphasised, which is done by adding the constant only to the denominator.
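The two variants can be sketched as follows. This assumes, hypothetically, that Equation 3 is a per-move ratio of the form wins / visits. The helper names and the `(wins, visits)` layout are illustrative. With c = 1, an unseen move scores 1 during exploration (optimistic) but 0 at final selection (pessimistic).

```python
def exploration_score(wins: float, visits: int, c: float = 1.0) -> float:
    """Pseudocount added to numerator and denominator: unseen moves look good."""
    return (wins + c) / (visits + c)


def final_score(wins: float, visits: int, c: float = 1.0) -> float:
    """Constant added only to the denominator: unseen moves look bad."""
    return wins / (visits + c)


def pick_move(stats: dict[str, tuple[float, int]], score, c: float = 1.0) -> str:
    """Return the move with the highest score under the given scoring rule.

    Assumption: stats maps each move to (wins, visits).
    """
    return max(stats, key=lambda move: score(*stats[move], c))


if __name__ == "__main__":
    stats = {"seen": (3.0, 4), "unseen": (0.0, 0)}  # hypothetical counts
    print(pick_move(stats, exploration_score))  # unseen: 1.0 beats 0.8
    print(pick_move(stats, final_score))        # seen: 0.6 beats 0.0
```

The same counts therefore favour the unseen move while exploring and the well-studied move when committing to the actual choice.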



Tapani Raiko 2006-09-01