Development of a Machine-Learning-Based AI for Go
Computer Systems Research 2005-2006
Justin Park
Abstract
Since Deep Blue’s victory in 1997 over Garry Kasparov, the World Chess Champion of the time, the new frontier of artificial intelligence has been the ancient game of Go, developed in China 2500 to 4000 years ago.
The challenges in Go lie in its large board size (19x19) and the difficulty of developing a heuristic function. In Go, the influence each stone has on other stones is very abstract, and the outcome of a move often becomes apparent only many moves later. As a result, Go programs are often very complex, with hard-coded patterns and responses to certain standard moves. An artificial intelligence that uses machine learning to develop its skills would simplify the programming of strategies. A program playing on a larger board could then draw on a database of situations from smaller boards to solve pattern-recognition problems that are generally very difficult for an AI. This project attempts to recreate the “Roving Eye” technique for Go while learning the game at smaller board sizes.
Introduction
The purpose of this project is to develop a Go algorithm for a 9 by 9 board using machine learning. The algorithm would eventually use the Go Modem Protocol (GMP) to communicate with a standard graphical user interface for Go, enabling matches against other AIs. In theory, each game would improve the algorithm's performance by adding to a large database of situations and rating each situation by the outcome of its game.
In order to develop any AI, a set of rules for the game is required. The rules of Go, though simple in concept, are harder to program than those of traditional board games such as chess or checkers. This is because a capture removes an entire body of connected stones, depending on whether or not the body is completely surrounded, rather than a single piece. An even harder programming task comes at the end of the game, when the area each side has secured must be counted.
After the board rules are developed, the main algorithms for the artificial intelligence portion of the program need to be coded. These would include a guiding heuristic function that keeps the AI from making arbitrary moves from the start; it would evaluate board positions based on a sphere of influence radiating from each stone. The second part of the AI is the machine-learning portion itself. This would include a database that stores each completed game, and a function that searches this database for similar board states, assigns them an evaluation score, and then decides whether to use a move from the database or the strongest move found by the guiding heuristic function.
The study will emphasize the gains made by the AI through machine learning when it is placed against its parent heuristic function. The analysis will include the effects of direct human intervention in computer vs. computer games, of human vs. computer games, and of indirect intervention through the addition of human vs. human games to the archives.
Background Information
Go is an ancient board game played on a 19 by 19 grid with black and white stones. Developed in China between 2000 B.C. and 200 B.C., it is the oldest surviving board game known to man.
Rules of Play:
The object of the game is to secure the largest amount of territory. Territory is an empty region of the board surrounded by one player's stones. To be safe from capture, a surrounding group needs two or more separate enclosed points, called eyes.
Players alternate turns, with Black moving first. A stone may be played on any intersection of the grid, with two exceptions:
- A move cannot be suicidal unless it captures stones in the process.
- In the event of Ko, a player may not immediately recapture to restore the previous board state; a move must first be played elsewhere.
These rules will be described in depth later.
A liberty is an empty intersection directly adjacent (horizontally or vertically) to a stone or to a connected body of stones. When a player occupies every liberty of a body of the opponent's stones, the surrounded stones become “prisoners” and are removed from the board. At the end of the game, each captured stone deducts 1 point from its owner's total. A suicidal move is one placed in an area completely surrounded by the other player, so that the new stone has no liberties. Suicidal moves are illegal.
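To make the capture rule concrete, here is a minimal sketch of finding a body of stones and its liberties with a flood fill, then removing bodies that have none. It assumes a hypothetical board stored as a list of lists holding "B", "W", or "." for empty; the function names are illustrative and are not taken from the program itself.

```python
# Assumed board format: a square list of lists containing "B", "W", or "." (empty).

def neighbors(board, r, c):
    """Yield the orthogonally adjacent points that lie on the board."""
    size = len(board)
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < size and 0 <= nc < size:
            yield nr, nc


def group_and_liberties(board, r, c):
    """Flood-fill the body of stones containing (r, c); return (stones, liberties)."""
    color = board[r][c]
    stones, liberties = set(), set()
    stack = [(r, c)]
    while stack:
        p = stack.pop()
        if p in stones:
            continue
        stones.add(p)
        for nr, nc in neighbors(board, *p):
            if board[nr][nc] == ".":
                liberties.add((nr, nc))
            elif board[nr][nc] == color and (nr, nc) not in stones:
                stack.append((nr, nc))
    return stones, liberties


def remove_captured(board, color):
    """Remove every body of `color` that has no liberties; return the number of prisoners."""
    captured = 0
    size = len(board)
    for r in range(size):
        for c in range(size):
            if board[r][c] == color:
                stones, libs = group_and_liberties(board, r, c)
                if not libs:
                    for sr, sc in stones:
                        board[sr][sc] = "."
                    captured += len(stones)
    return captured
```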
Ko:
If capturing a stone would return the board to the same state it was in two moves earlier, the capture is illegal (otherwise the same pair of captures could repeat indefinitely). Instead, the player must first play at a different location before recapturing the stone. Fighting over such a stone is called a Ko battle.
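Checking the Ko rule reduces to comparing board snapshots. The sketch below assumes a hypothetical history list of immutable snapshots, oldest first, whose last entry is the current position (after the opponent's latest move). It only detects the repetition; the caller is assumed to have already placed the stone and removed any captures on the candidate board.

```python
def board_key(board):
    """Convert a list-of-lists board into an immutable, comparable snapshot."""
    return tuple("".join(row) for row in board)


def violates_ko(candidate_board, history):
    """Return True if the position after the candidate move equals the one two moves ago.

    history[-1] is the current position, so history[-2] is the position that
    existed right after this player's previous move.
    """
    return len(history) >= 2 and board_key(candidate_board) == history[-2]
```

For example, if Black has just captured a single White stone in a ko shape, White retaking immediately would recreate history[-2], so the recapture is rejected until White has played elsewhere.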
If both players pass consecutively, the game ends. The players count the area each has gained, and the winner is the one with the greater amount. In typical 19x19 games, White receives 5.5 or 6.5 points of compensation for moving second, called Komi; the half point ensures that every game has a winner.
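End-of-game counting can be sketched with simple area scoring: each player counts their stones on the board plus any empty region bordered only by their color, and White adds Komi. This counting method is an assumption chosen for illustration. It does not model the prisoner deduction described above, and it treats every stone on the board as alive, so dead stones would have to be removed before scoring.

```python
def score_area(board, komi=6.5):
    """Return (black_score, white_score) using simple area counting plus Komi for White."""
    size = len(board)
    score = {"B": 0, "W": 0}
    seen = set()
    for r in range(size):
        for c in range(size):
            point = board[r][c]
            if point in score:
                score[point] += 1  # every stone on the board counts as area
            elif (r, c) not in seen:
                # Flood-fill this empty region and record which colors border it.
                region, borders, stack = [], set(), [(r, c)]
                seen.add((r, c))
                while stack:
                    pr, pc = stack.pop()
                    region.append((pr, pc))
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nr, nc = pr + dr, pc + dc
                        if 0 <= nr < size and 0 <= nc < size:
                            if board[nr][nc] == "." and (nr, nc) not in seen:
                                seen.add((nr, nc))
                                stack.append((nr, nc))
                            elif board[nr][nc] in score:
                                borders.add(board[nr][nc])
                # A region touching only one color belongs to that color.
                if len(borders) == 1:
                    score[borders.pop()] += len(region)
    return score["B"], score["W"] + komi
```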
Machine Learning:
Machine Learning is a broad subfield of Artificial Intelligence dealing with a computer’s ability to improve its techniques by analyzing data and building upon previous knowledge. In my project, given a particular board state, the machine-learning component examines a database of games and returns the best move based on board similarity and the strength of the resulting board states.
Perhaps the most interesting aspect of Machine Learning in my project is the effect of human intervention. Placing human games in the database adds lines of play that the computer might never find using only the parent heuristic function. Human games add not only creativity but also known formations that have been part of human knowledge for hundreds of years.
Lastly, a goal of this project is to show that building a database from games on smaller boards affects initial play on larger boards. In particular, 7x7 games will be examined both before and after the AI gains 5x5 game experience.
Python:
Python was the language of choice for this project because of its high-level, object-oriented structure, which made it easy to create classes (essential for building the database). Because Python is dynamically typed, it was also easy to write the program without declaring functions and variables in headers. Python also integrates well with C and C++: with wrapper generators such as SWIG, C and C++ code can be called from Python. This was a key deciding factor, as I had originally planned to integrate preexisting C code for the GMP protocol and for reading the .sgf format commonly used to save Go games.
Research Theory and Design Criteria:
In this section I will describe the various algorithms and the program structure in depth.
The program is divided into three classes. The first class, Board, contains the main body of the code: the main loop, the methods that enforce the board rules, and the main code for the AI. The second and third classes handle the data-mining/machine-learning aspect of the AI.
The main loop of the program is in AskInput(). It reads text-based commands and passes them to a parser. The commands I included were move, pass, resign, show (which prints all the boards), editor (to set up the board in any state), and fast, which lets two AI players play each other without pausing.
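A text command loop of this kind is often built around a small parser and a dispatch table. The sketch below is hypothetical: only the command names come from the list above, while parse_command, ask_input_loop, and the handler functions are placeholders rather than the program's own code.

```python
# Command names from the text; the handlers passed in are hypothetical placeholders.
COMMANDS = {"move", "pass", "resign", "show", "editor", "fast"}


def parse_command(line):
    """Split a text command such as 'move 3 4' into ('move', [3, 4])."""
    parts = line.strip().lower().split()
    if not parts or parts[0] not in COMMANDS:
        raise ValueError(f"unknown command: {line!r}")
    name, args = parts[0], parts[1:]
    if name == "move":
        if len(args) != 2:
            raise ValueError("usage: move <row> <col>")
        return name, [int(a) for a in args]
    return name, args


def ask_input_loop(lines, handlers):
    """Send each command line to its handler; stop after 'resign'."""
    for line in lines:
        name, args = parse_command(line)
        handlers[name](*args)
        if name == "resign":
            break
```

In an interactive program the lines could come from iter(input, "quit"), which keeps calling input() until the user types quit.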
The next important task for the program was to enforce all the rules of Go. Most of this work involved determining which moves were illegal, and by far the hardest illegal move to detect was the suicidal move. I used nsurround(), a recursive function, to check the stones surrounding a contiguous body for a gap (an empty liberty). Besides detecting illegal moves, nsurround() also helps remove captured bodies of stones: applying the same check in reverse, I test whether a newly placed move completes the capture of neighboring opposite-colored stones.
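The suicide rule depends on order: opposing bodies that lose their last liberty are removed first, and only then is the new stone's own body checked. The following sketch shows that order using an iterative flood fill in place of the recursive nsurround(). The board format and the function names are assumptions, and Ko is not checked here.

```python
def _liberties(board, r, c):
    """Return (stones, liberties) for the body of stones containing (r, c)."""
    size, color = len(board), board[r][c]
    stones, libs, stack = set(), set(), [(r, c)]
    while stack:
        pr, pc = stack.pop()
        if (pr, pc) in stones:
            continue
        stones.add((pr, pc))
        for nr, nc in ((pr + 1, pc), (pr - 1, pc), (pr, pc + 1), (pr, pc - 1)):
            if 0 <= nr < size and 0 <= nc < size:
                if board[nr][nc] == ".":
                    libs.add((nr, nc))
                elif board[nr][nc] == color:
                    stack.append((nr, nc))
    return stones, libs


def try_move(board, r, c, color):
    """Return the board after `color` plays at (r, c), or None if the move is illegal."""
    if board[r][c] != ".":
        return None  # the point is occupied
    opponent = "W" if color == "B" else "B"
    new = [row[:] for row in board]  # scratch copy; the original board is untouched
    new[r][c] = color
    size = len(new)
    # First remove adjacent opponent bodies left without liberties.
    for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
        if 0 <= nr < size and 0 <= nc < size and new[nr][nc] == opponent:
            stones, libs = _liberties(new, nr, nc)
            if not libs:
                for sr, sc in stones:
                    new[sr][sc] = "."
    # Only then test the new stone's own body: no liberties means suicide.
    _, libs = _liberties(new, r, c)
    return new if libs else None
```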
After this, I needed to create a basic guiding heuristic function. This was done with influenceHeuristic(), which evaluates positions based on the influence radiating from stones (the influence decreases with distance). A “plasticBoard,” a scratch copy of the current board, is needed in order to test the evaluation function at every possible position without affecting the original board. The evaluation is done by powerpoints(), which takes distance into account when assigning point values. Contiguous stones are awarded bonus points, and empty spaces are given points depending on their closeness to allied stones. The result is a stronger tendency for the AI to place its opening stones in the middle, where more area is influenced, and to connect stones. Connected stones are known to be much harder to capture.
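The influence idea can be sketched as follows. The decay of 1 / (1 + distance) over Manhattan distance, the radius of 3, and the connection bonus are hypothetical constants, not the values used in influenceHeuristic() or powerpoints(); the deep copy plays the role of the plasticBoard. Legality checks are omitted for brevity.

```python
import copy


def influence_at(board, r, c, color, radius=3):
    """Sum influence on (r, c): allied stones add, enemy stones subtract, decaying with distance."""
    size, total = len(board), 0.0
    for sr in range(size):
        for sc in range(size):
            stone = board[sr][sc]
            if stone == ".":
                continue
            d = abs(sr - r) + abs(sc - c)  # Manhattan distance
            if d <= radius:
                weight = 1.0 / (1 + d)  # hypothetical decay: closer stones matter more
                total += weight if stone == color else -weight
    return total


def evaluate(board, color, connect_bonus=0.5):
    """Hypothetical powerpoints-style score of a whole board from `color`'s point of view."""
    size, score = len(board), 0.0
    for r in range(size):
        for c in range(size):
            if board[r][c] == ".":
                score += influence_at(board, r, c, color)
            elif board[r][c] == color:
                # Bonus for each orthogonally connected allied neighbor.
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if 0 <= nr < size and 0 <= nc < size and board[nr][nc] == color:
                        score += connect_bonus
    return score


def best_heuristic_move(board, color):
    """Try every empty point on a scratch copy (the 'plasticBoard') and keep the best."""
    best, best_score = None, float("-inf")
    for r in range(len(board)):
        for c in range(len(board)):
            if board[r][c] != ".":
                continue
            plastic = copy.deepcopy(board)
            plastic[r][c] = color
            s = evaluate(plastic, color)
            if s > best_score:
                best, best_score = (r, c), s
    return best, best_score
```

On an empty 5x5 board this sketch picks the center point, consistent with the tendency described above.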
After the parent heuristic was completed, I needed to create the database structure. The two classes, Game and Games, were essential to accomplishing this. A Game was simply a record of one completed game: its board states, its moves, and the scores the heuristic function assigned to them. Games was a collection of Game objects sorted by date played, board size, and number of moves. Games also contains the function critical to data mining, searchForMove(). In searchForMove(), I retrieved boards with a similar number of moves and compared them using the compareBoards() function.
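One possible shape for the two database classes is sketched below with Python dataclasses. The field names, the sort key, and the move-number window used to select comparable positions are assumptions for illustration, not the program's actual design.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Game:
    """One completed game: the board before each move, the move, and its heuristic score."""
    size: int
    played_on: date
    boards: list = field(default_factory=list)
    moves: list = field(default_factory=list)   # (row, col), or None for a pass
    scores: list = field(default_factory=list)

    def record(self, board, move, score):
        """Append a copied snapshot (so later play cannot change it), the move, and its score."""
        self.boards.append([row[:] for row in board])
        self.moves.append(move)
        self.scores.append(score)


class Games:
    """Archive of Game objects kept sorted by date, board size, and number of moves."""

    def __init__(self):
        self.games = []

    def add(self, game):
        self.games.append(game)
        self.games.sort(key=lambda g: (g.played_on, g.size, len(g.moves)))

    def positions_near(self, size, move_number, window=2):
        """Yield (game, index) pairs whose move number is within `window` of `move_number`."""
        for game in self.games:
            if game.size != size:
                continue
            lo = max(0, move_number - window)
            hi = min(len(game.moves), move_number + window + 1)
            for i in range(lo, hi):
                yield game, i
```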
In compareBoards(), I checked the similarity of board states by counting the number of black and white stones in a square, and giving bonus points for similar patterns. compareSquares() is called on the four corners of a board and returns a value based on the similarity of the squares.
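A corner-square comparison might look like this. The scoring rule (a penalty for differing black and white stone counts plus a bonus for each exactly matching point) and its weights are hypothetical; the square size k is a parameter, so the 2x2 squares could be swapped for 3x3 squares.

```python
def corner_origins(size, k):
    """Top-left coordinates of the four k-by-k corner squares of a size-by-size board."""
    far = size - k
    return [(0, 0), (0, far), (far, 0), (far, far)]


def compare_squares(a, b, top, left, k=2, pattern_bonus=1):
    """Score how alike two boards are inside one k-by-k square (higher is more alike)."""
    cells_a = [a[r][c] for r in range(top, top + k) for c in range(left, left + k)]
    cells_b = [b[r][c] for r in range(top, top + k) for c in range(left, left + k)]
    # Penalize differences in how many black and white stones the square holds...
    score = -abs(cells_a.count("B") - cells_b.count("B"))
    score -= abs(cells_a.count("W") - cells_b.count("W"))
    # ...and reward each point where the exact pattern matches.
    score += pattern_bonus * sum(x == y for x, y in zip(cells_a, cells_b))
    return score


def compare_boards(a, b, k=2):
    """Sum the square similarities over the four corners of two same-size boards."""
    return sum(compare_squares(a, b, r, c, k) for r, c in corner_origins(len(a), k))
```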
Once these values are returned to searchForMove(), the candidate board states are sorted by both the score differential produced by each move and the similarity of the board position. The best move is returned to MainAI(), where it is compared with the move chosen by the parent heuristic function.
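The ranking step could combine the two criteria with a weighted sum, as in this sketch. The candidate tuple layout (move, similarity, score differential) and the weights are assumptions; the text above states only that both similarity and score differential are used.

```python
def rank_candidates(candidates, similarity_weight=1.0, score_weight=1.0):
    """Order (move, similarity, score_differential) tuples from best to worst."""
    return sorted(
        candidates,
        key=lambda cand: similarity_weight * cand[1] + score_weight * cand[2],
        reverse=True,
    )


def best_database_move(candidates, **weights):
    """Return the top-ranked move, or None if the database offered no candidates."""
    ranked = rank_candidates(candidates, **weights)
    return ranked[0][0] if ranked else None
```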
To encourage growth through Machine Learning, the algorithm that searches the database is given extra weight relative to the parent heuristic function.
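The extra weight on database moves can be pictured as a bonus added to the database candidate's score before it is compared with the heuristic's best move. The additive bonus and its default value are hypothetical, and the sketch assumes both scores are on the same scale.

```python
def choose_move(heuristic_choice, database_choice, database_bonus=2.0):
    """Pick between the parent heuristic's move and the database move.

    Each choice is a (move, score) pair; database_choice may be None when the
    search found nothing. The bonus lets database moves win close comparisons.
    """
    if database_choice is None:
        return heuristic_choice[0]
    h_move, h_score = heuristic_choice
    d_move, d_score = database_choice
    return d_move if d_score + database_bonus >= h_score else h_move
```

Raising database_bonus makes the program lean harder on its archive, which is the kind of constant adjusted in the experiments discussed later.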
Results:
I performed three comparisons in the study. The first compared the parent function with the Machine-Learning function. In the first trial, the parent function played black and the Machine-Learning function played white. The parent function won consistently, although the Machine-Learning function changed its moves slightly. In the next evolution, with the parent function as white and the Machine-Learning function as black, the Machine-Learning function consistently beat its parent function, and even found a different way to beat it. After several repeated trials, there was little further change in how the function evolved.
When I switched back to the original pairing, with the Machine-Learning function as white and the parent function as black, while retaining the database logs, the Machine-Learning function ‘learned’ how to avoid being killed completely by black, though it never gained enough ground to beat it.
Figure: pink shows the trained Machine-Learning function, while blue shows the initial function without experience.
The second study examined how technique carried over from 5x5 boards to 7x7 boards. In a straight 7x7 match with no experience, the Machine-Learning function (white) lost completely in the first two rounds, and then learned by itself to retain about half the board. However, with 5x5 experience (from its winning games as black), the Machine-Learning function won its first 7x7 game by a little more than half the board. After one game of 7x7 experience, the Machine-Learning function failed utterly, losing the entire board. The same result occurred in subsequent evolutions, though I noticed a strong structure in its early-game play before its late-game play failed.
The final study examined the effect of human moves in 5x5 games. I input several human-versus-computer games and found that when evolution had stalled, the addition of human games encouraged different lines of play, many of which led to victories. In all cases, human input made some difference.
Discussion:
One interesting finding was that the Machine-Learning algorithm inherited game-winning techniques. When placed in a better situation, it learned how to win, and it then made the best of worse situations, as in the trials where the Machine-Learning function started off as black. In most cases, however, I found that machine learning based on its parent function caused performance to degrade over time. While experimenting with various constants, I found that increasing the weight of database-backed moves caused the 5x5 Machine-Learning function, which previously won consistently with black, to learn how to lose with black. The introduction of human input was the only way to stop the consistent losses.
It was also interesting to see how play evolved from the 5x5 game to the 7x7 game. After the first winning game, I could see a definite structure in the Machine-Learning function's play while it was winning by a huge margin. However, it was not able to finish the job, and it slowly lost its position.
Conclusion and Recommendations:
Machine-Learning algorithms introduce much more variability than deterministic heuristic functions. The performance of the Machine-Learning function varied considerably, while that of the parent heuristic function remained nearly constant. The souring of evolution that occurred when the Machine-Learning function faced the same opponent again and again can be compared to fish in an isolated pond, where genetic homogeneity leads to crisis and an inability to adapt. When human games were added to the database, the AI was able to play a much wider variety of moves in different situations.
There were many flaws in my program, due to time constraints, that could be improved upon. The major flaw in the machine learning was that it did not look past the first move when scoring with the heuristic. Even a 2-3 ply search would likely yield much better results, especially in Go, where a single capture can alter the entire game. Another area that could be improved is the function that compares boards: comparing 3x3 squares instead of 2x2 squares would likely give more accurate comparisons.
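A 2-3 ply search could be built on depth-limited negamax, a standard formulation of minimax; this is a swapped-in technique, not part of the original program. The sketch is game-agnostic: legal_moves, play, and evaluate are hypothetical callbacks that a Go program would supply from its rules and heuristic, with evaluate scoring a position from the point of view of the side to move.

```python
def negamax(state, depth, legal_moves, play, evaluate):
    """Depth-limited negamax: return (score, best_move) for the side to move.

    legal_moves(state) lists the moves, play(state, move) returns the next state
    without modifying `state`, and evaluate(state) scores it for the side to move.
    """
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return evaluate(state), None
    best_score, best_move = float("-inf"), None
    for move in moves:
        score, _ = negamax(play(state, move), depth - 1, legal_moves, play, evaluate)
        score = -score  # the opponent's gain is our loss
        if score > best_score:
            best_score, best_move = score, move
    return best_score, best_move
```

In a toy tree where one first move looks best at one ply but allows a crushing reply, searching to depth 2 switches to the safer alternative, which is exactly the weakness of the one-move lookahead described above.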
Further research should test the evolution of the Machine-Learning heuristic function against a variety of different heuristic functions. It would be interesting to see how the Machine-Learning function adapts to varied circumstances rather than to the same opponent.
References/Sources:
“Evolving a Roving Eye for Go.” http://nn.cs.utexas.edu/downloads/papers/stanley.gecco04.pdf
“Computer Go: an AI Oriented Survey.” http://www.ai.univ-paris8.fr/~cazenave/CG-AISurvey.pdf
“Garry Kasparov.” http://en.wikipedia.org/wiki/Gary_Kasparov
“The Many Faces of Go.” http://www.smart-games.com/manyfaces.html
http://research.microsoft.com/displayArticle.aspx?id=1062
“Go (board game).” http://en.wikipedia.org/wiki/Go_%28board_game%29
http://www.aaai.org/AITopics/html/go.html
http://www.newscientist.com/article.ns?id=dn6914
http://www.cs.dartmouth.edu/~brd/Teaching/AI/Lectures/Summaries/learning.html#Definitions
http://www.scism.sbu.ac.uk/inmandw/review/ml/review/rev6542.html