Garo Garabedyan's Divergent Thinking Blog

Chess BFS search with heuristic static evaluation functions. Parallel algorithm on OpenMP

with one comment

Garry Kasparov (World Chess Champion 1985-2000) vs. Deep Blue (IBM’s chess engine) in 1997

IBM’s DeepBlue analyzes 200,000,000 possible moves per second (brute force approach) with an array of 256 CPUs.

I would like to present you my long-term project-A Chess game engine using Breadth-First Search with static heuristic evaluation functions. Chess is a two-player zero-sum game. The worst the possible moves of the opponent are, the better is your move. I have used an open-source chess player Beowulf v. 2.4a and have move it to use OpenMP to become parallel. The Beowulf is the chess engine of ChessBrain project, a project winning a world record for computation in 2004, networking together 2,070 computers located in 56 countries to play a single game of chess. OpenMP is a portable, scalable model that gives shared-memory parallel programmers a simple and flexible interface for developing parallel applications for platforms ranging from the desktop to the supercomputer.

Due to their special structure, tree search algorithms can be naturally parallelized (data parallization: the same function running on a parallel environment processing different input, compared to functional parallization where the processing units take semantically different input), which makes them a very attractive area research in parallel computing. The main elements of tree search algorithms include the following [Yan Xu, Scalable Algorithms For Parallel Tree Search, 2007].

  •  Processing method: A method for computing path cost and testing whether the current node contains a goal state.
  •  Successor function: A method for creating a new set of nodes from a given node by expanding the state defined by that node.
  •  Search strategy: A method for determining which node should be processed next.
  •  Pruning rule: A method for determining when it is possible to discard nodes whose successors cannot produce solutions better than those found already or who cannot produce a solution at all.

Theoretically, the fundamentals of chess program have three major segments: move generator, evaluation function and search algorithm. Decision tree that is the base of search algorithm grows exponentially with factors depending of position, hash tables, number of pieces on the board… If we suppose that on the first level of the decision tree one has node with normal 40 exists, on the second level it will be 402 = 1600 nodes, on the third 403 it will be 64000 etc. It is obvious that number of nodes and processing time depends exponentially of the level – depth of the search tree. In theory, that effect is called combinational explosion. On the other hand, the quality of computer play strongly depends on depth of the decision tree so the effect of the exponential explosion limits the computer chess strength. There are generally two approaches to overcome this problem: Shannon-type-A [By the name of its inventor in 1949 Claude Elwood Shannon (1916 – 2001) (See] (full-width approach) and Shannon-type-B (selective search). The first one tries to solve the problem by using the simple pruning mechanisms based on Alpha-Beta technique with idea to decrease maximally the time needed to compute one single node. This approach benefits maximally of the rapidly increasing speed of the modern CPU-s and also incorporates standard (cluster) parallelization. The second approach (Type-B) is concentrate on heuristic pruning based on relatively extensive portion of knowledge directly programmed into the evaluation or move generator function. This approach is very depended on the programmer skill and the quality of the implemented pruning, so the results could be very relative. On today’s level of knowledge in this area, the best combination is to use near full-width approach in main searcher, and selective searcher in q-search (quiesce) procedure. The algorithms could be combined: Alpha-Beta, Null Move and PVS (NegaScout). [Vladan V. Vuckovic, The Method of The Chess Search Algorithms Parallelization Using Two-Processor Distributed System, Ser. Math. Inform. Vol. 22, No. 2 (2007), pp. 175-188]

Why Quiescence Search? The problem with abruptly stopping a search at a fixed depth is something called the ‘horizon effect’. It might be that you have just captured an opponent’s pawn at depth 0, then you return that score being justifiably proud. However, if you had searched another ply deeper you would have seen that the opponent could recapture your queen! [Colin Frayn, Computer Chess Programming Theory, 2005.]

Chess Breadth-first search can degenerate as not only positions with the same distance to the start are evaluated together. Variations of chess players’ moves involve subcomputations whose amount of work is not known in advance, and hence the work can only be distributed at runtime, the parallel algorithm is irregular. [Michael Sub and Claudia Leopold, Implementing Irregular Parallel Algorithms with OpenMP]

Deferred thread cancellation was implemented by hand as OpenMP does not provide an ability a thread to cancel another thread from the outside. [Michael Sub and Claudia LeopoldImplementing Irregular Parallel Algorithms with OpenMP] Deferred cancellation is possible in POSIX threads as well. [POSIX standard, pthread_setcancelstate] Below is shown how deferred thread cancellation was implemented:

// parallel cancel threads workaround
if (IsCancelThreads)
// parallel critical region: workaround a return clause
#pragma omp critical (ReturnResultFromThread)
    if (!IsResultFound) {
        ResultFromThread = CMSCORE - ply - 1;
        IsResultFound = TRUE;
    IsCancelThreads = TRUE;
} //includes implicit flush
if (IsCancelThreads)
IsCancelThreads = TRUE;
#pragma omp flush (IsCancelThreads)
if (IsCancelThreads)

Download the parallel open-source version of Beowulf (for Unix systems) from ParallelBeowulf.rar. (Change the file extension from ODT to RAR and unrar the archive. restricts file upload extensions.)

A set of 1285 Extended Position Description (EPD) Chess tests were ran toward the parallel and sequential Beowulf versions running on a same computer environment (CPU: AMD Opteron 64 Dual Core Processor 1.8 GHz RAM: 2GB 800 MHz HDD: 2x160GB Hitachi SATA in RAID 0 MB: Asus M2N-LR OS: Scientific Linux 5.3 64 bit Middleware: MPICH2 1.1.1p2). The compared results can be found here: sequential-output – parallel-output. After revealing the results it seems that the parallel version performs faster by examining many more moves (higher computational speed) but fails short in evaluating the best move. It took the parallel algorithm too much time to find the best move compared with the sequential Beowulf and in some of the test cases the time limit of half of a minute was not enough for the parallel Beowulf to find the correct move while the sequential algorithm succeeds.

EPD specifies the piece placement, the active color, the castling availability, and the en passant target square of a position. The castling availability field indicates potential future castling that may or may not be possible at the moment due to blocking pieces or enemy attacks. An EPD test contains a list (single or many items) of best moves which are compared with the algorithm’s best move in order to determine is the test passed or failed.

IDE Note: While Visual Studio 2008 and 2010 support OpenMP 2.0 for Visual C++, it is a shame that Visual Studio 2010 does not support OpenMP 3.0.


Written by garabedyan

April 29, 2011 at 20:51

One Response

Subscribe to comments with RSS.

  1. It’s pretty buggy, it’s pretty hard to parallel the search with OpenMP, the same goes to MPI


    July 16, 2013 at 07:09

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s