next up previous
Next: Combining the evaluations by Up: Higher order statistics in Previous: Related Work

Higher order statistics in play-out analysis

Black and white player are playing a game with alternate moves. At some state $ s$ of the game, a play-out analysis means that $ k$ hypothetical continuations $ c$ of the game are played out to the end and statistics from these play outs are used to determine the next move. The play outs might be sampled by drawing moves from a uniform distribution over all legal moves, or by other means. To sample a semi-random move, one can add noise from a random number generator to a heuristic evaluation of each move and select the best one.

The expected-outcome heuristic $ h_e$ evaluates the move $ m_0$ as

$\displaystyle h_e(m_0)=\frac{\sum_{i=1}^k \chi(c_i \text{ is a win})\chi(m_0 \text{ first move of } c_i)} {\sum_{i=1}^k \chi(m_0 \text{ first move of } c_i)},$ (1)

where $ \chi$ is the indicator function ( $ \chi($true$ )=1$ and $ \chi($false$ )=0$). The best move is thus the one that lead to most wins in the play outs $ c$.

Note that in Hex and Y the order of moves does not matter. One can play the game from current state all the way to filling the board. The winner of the game can be determined from the filled board without knowing the order of moves. One can think that the single play out represents all the possible play outs that lead to the same position. This leads to the all-moves-as-first heuristic $ h_a$:

$\displaystyle h_a(m_0)=\frac{\sum_{i=1}^k \chi(c_i \text{ is a win})\chi(m_0 \text{ is made in } c_i)} {\sum_{i=1}^k \chi(m_0 \text{ is made in } c_i)}.$ (2)

The best move is the one that lead to wins when used at any point in the continuation. An example is shown in Figure 3. Often, play out is done using a semi-random move selection with the heuristic itself.

Figure 3: The expected-outcome, or equivalently in this case, the all-moves-as-first heuristic for the first move on a straight (top) and a bent (bottom) Y board. The colour shows the probability of a black win assuming random play after the first move. Note that the straight board gives a large emphasis on the center, which is the reason why the bent board is often used instead.
\includegraphics[width=0.45\textwidth]{randomanalysis.eps}

Higher order heuristic $ h_h$ evaluates the move $ m_0$ after moves $ m_1,m_2,\dots,m_l$ are made in any order.

$\displaystyle h_h(m_0\mid \{m_j\}_{j=1}^l) =$     (3)
$\displaystyle \frac{\sum_{i=1}^k \chi(c_i \text{ is a win})\chi(m_0 \text{ afte...
...} c_i)}
{\sum_{i=1}^k \chi(m_0 \text{ after }\{m_j\}_{j=1}^l \text{ in } c_i)}.$      

The sets of moves $ \{m_j\}_{j=1}^l$ are called patterns. Many different patterns apply in a future state when sampling a play out, and one must decide how to combine evaluations corresponding to different patterns. Also, one has to be selective which patterns are taken into consideration. Before going into these details, an example is shown in Figure 4.

Figure 4: The current game state in tic-tac-toe is shown at the top left corner. Six patterns are numbered and shown in a graph. When playing out, a state at the bottom right corner is encountered. Patterns 1, 2, and 4 apply to it.
\includegraphics[width=0.4\textwidth]{tictactoe.eps}



Subsections
next up previous
Next: Combining the evaluations by Up: Higher order statistics in Previous: Related Work
Tapani Raiko 2006-09-01