Faculty of Mathematics and Computer Science
INTRODUCTION

This PhD thesis is the result of my research in the field of Artificial Intelligence, particularly in Machine Learning, begun in 1998. The purpose of this abstract is to present an overview of the most important aspects of the thesis, including the original results.

Artificial Intelligence, one of the most recently established disciplines of computer science, tries to understand intelligent entities. But, unlike philosophy and psychology, which also study intelligence, the aim of Artificial Intelligence is to build intelligent entities as well as to understand them. Although nobody can predict the future in detail, computers endowed with a degree of human intelligence (or more) would have a great impact on everyday life and on the future course of civilization. The field of machine learning thus represents one of the most recent challenges of Artificial Intelligence, opening new perspectives for research and, of course, a way towards achieving intelligent systems. In our opinion, the most challenging field of Machine Learning is Reinforcement Learning, which deals with the problem of improving the behavior of an artificial agent based on feedback about its performance. On the other hand, we consider that one of the most suitable fields for exploiting intelligence is Natural Language Processing. This is why our research is directed especially towards the fields of Reinforcement Learning and Natural Language Processing.

The thesis contains 133 references and is structured in six chapters, as follows.

Chapter I, Intelligent Systems, discusses general aspects of Artificial Intelligence. The field of Intelligent Agents is presented, a domain of Artificial Intelligence that offers a new problem solving method and a new way of interaction between user and computer. Some of the most important (in our opinion) abstract and concrete architectures for intelligent agents are presented, as well as the new programming paradigm, agent-oriented programming. The penultimate section of the first chapter discusses logic-based representation and reasoning in agent-based systems, together with aspects related to Software Engineering and the formal treatment of such systems. The last section of the first chapter addresses the problem of machine learning, an extremely important research direction in Artificial Intelligence. The general model of a learning agent, aspects related to learning a domain, and the most important learning strategies are presented.

Chapter II, Mathematical Models for Learning Intelligent Agents, presents three mathematical models for agents that learn: Markov Decision Processes, Partially Observable Markov Decision Processes and Hidden Markov Models. In our opinion, the mathematical models described in this chapter play a very important role in modeling and designing agent-based systems, especially systems that learn by reinforcement (by interaction with the environment).

Chapter III, Reinforcement Learning, addresses the extremely complex problem of reinforcement learning, a learning approach that combines two disciplines in order to solve problems that neither one could solve independently: Dynamic Programming and Supervised Learning. This chapter presents the particularities of Reinforcement Learning.
The chapter discusses: the problem of learning based on a utility function, passive and active learning in known or unknown environments, learning based on temporal differences (the most interesting variant of reinforcement learning), and the problem of Q-learning. The most important action selection mechanisms, as well as the SARSA algorithm, the most efficient variant of Q-learning, are discussed comparatively.

Chapter IV, New Approaches in Intelligent Systems, presents a few new learning algorithms (approaches to Reinforcement Learning and learning algorithms that may be used for Natural Language Processing) as well as a new logic architecture for intelligent agents (showing the importance of logical approaches for intelligent agents). The learning algorithms described in this chapter are: an algorithm for word sense disambiguation, a real-time learning algorithm, and a reinforcement learning algorithm with application to motion problems for robots in environments with obstacles. Each of the original approaches in this chapter is compared with existing approaches, and the advantages of the new approaches are illustrated.

Chapter V, Examples of Intelligent Systems, presents practical applications (concrete architectures for intelligent agents that improve their behavior by learning), i.e. implementations of the theoretical aspects presented in chapters II and IV. For each such application we developed experiments that illustrate the efficiency of the approaches. We also suggest directions for generalization and for future research.

Chapter VI, A Programming Interface for Reinforcement Learning Simulations, presents a standard programming interface for reinforcement learning simulations in a known environment; both the simulation of reinforcement learning based on a utility function and the simulation of Q-learning are possible. The use of the interface is illustrated on an example of the movement of an embodied robotic agent in an environment with obstacles.

Among the original elements presented throughout this thesis we mention the following:
* * *

I would like to express my gratitude to my supervisor, Prof. Dr. Militon FRENŢIU, for his competent guidance, suggestions and encouragement. I would like to express my thanks to Assoc. Prof. Dr. Doina TĂTAR from the Department of Computer Science of the Faculty of Mathematics and Computer Science in Cluj-Napoca for her support during the entire elaboration of the thesis and for our extremely good cooperation. I also want to thank Assoc. Prof. Dr. Horia POP from the same department for all his competent recommendations. I am grateful to all the members of the Computer Science Department for their helpful suggestions.

CHAPTER I INTELLIGENT SYSTEMS

Artificial Intelligence (AI) tries to understand intelligent entities. Therefore, a reason for studying this field is to learn about ourselves. One question that AI has to answer is the following: how is it possible for a small, slow, biological or electronic brain to perceive, understand, predict and manipulate a world much larger and more complicated than itself? All researchers should look at themselves in the mirror in order to see an example of an intelligent system.

There are two fundamental research directions in Artificial Intelligence [DUM99]. On one hand, the construction of machines capable of reproducing the cognitive functions of the human mind, the so-called cognitive modeling. Here, cognitive psychology is one of the cognitive sciences whose methods are taken over by Artificial Intelligence, in fields such as Natural Language Processing, learning and perception. Such intelligent machines have been called "artificial brains" in the literature, the basic idea being to build intelligent machines starting from models of neuronal activity. Such models are known as artificial neural networks. These research directions define the field of Computational Intelligence, whose main characteristic is the numerical representation of knowledge. On the other hand, the construction of programs that allow the symbolic processing of information (knowledge) using logical methods, the so-called logical-symbolic approach (rational thinking). The logical-symbolic tradition of Artificial Intelligence tries to achieve intelligent systems using deduction mechanisms to obtain the solution of a problem, based on the problem's description in a logical formalism. This field of traditional Artificial Intelligence is based on a symbolic representation of knowledge and on logical inferences over knowledge.

All AI problems can be gathered under the concept of intelligent agent. The goal of AI is to describe and build agents that receive percepts from the environment and execute actions. An agent is an entity that perceives its environment through sensors and acts on this environment through actions. An agent is described by:
The goal of AI is to achieve the program part of the agent: a function that implements the mapping from percepts to actions. The abstract architectures of intelligent agents described in this chapter are purely reactive agents and agents with state. The concrete architectures presented are logic-based architectures, reactive architectures, Belief-Desire-Intention architectures and layered architectures.

For many AI researchers, the idea of programming computer systems in terms of mental notions such as belief, desire and intention is the key component of agent-based computing. The concept was articulated by Yoav Shoham in his agent-oriented programming (AOP) proposal [SHO93]. Shoham proposes "a new programming paradigm, based on a new point of view on the computational process". AOP may be regarded as a kind of 'post-declarative' programming: as in declarative programming, we state our goals and let the built-in control mechanism figure out what to do in order to achieve them. It follows that, when specifying such systems, we have to use logical statements that may be formalized using a logic of knowledge. "Intelligent agents are ninety-nine percent computer science and one percent AI" [ETZ96]. The technology of intelligent agents and multi-agent systems seems set to radically alter the way complex, distributed, open systems are conceptualized and implemented. Logic-based representation and reasoning in agent-based systems is an important topic of the field, with formal methods playing an important role [SER02b]. There are two problems related to the formalization of agent-based systems:
The field of intelligent agents is related to another field of AI, machine learning. Machine learning represents the study of system models that, based on a set of training data, improve their performance through experience and by learning specific knowledge. The attempt to model human reasoning leads to the concept of intelligent reasoning [FLA94]. Reasoning is the process of deducing conclusions; intelligent reasoning is the kind of reasoning accomplished by humans. Most AI systems are deductive: they are able to make inferences (draw conclusions) given their initial or supplied knowledge, without being able to acquire or generate new knowledge.

CHAPTER II MATHEMATICAL MODELS FOR INTELLIGENT LEARNING AGENTS

Formally, an agent-based system is represented as a 4-tuple (S, A, h, U), where S is the space of the agent's states (the environment), A is the space of the agent's actions (the agent's behavior), h is the transition function between states, and U is an array representing the utility function of the agent (U(s) being the utility of state s). Most of the computational models of reinforcement learning are based on the assumption that the environment the agent interacts with can be modeled as a Markov Decision Process, defined as follows:
We say that the environment is a Markov Decision Process (satisfies the Markov property) if the environment's response (signal) at moment t+1 depends only on the state and action at moment t. Markov Decision Processes are very important in reinforcement learning theory (they underlie roughly 90% of modern reinforcement learning). In many situations the environment cannot be "labeled" beforehand with appropriate states, and the environment's states are not completely accessible to the agent; the environment is then continuous and partially observable. Partially Observable Markov Decision Processes were developed specifically to deal with this problem.

The use of mathematical and statistical methods is a leading topic in the field of Artificial Intelligence. In the last section of this chapter we present how Hidden Markov Models may be used as a mathematical tool for modeling the environment of intelligent agents [SER01a]. The Hidden Markov Model (HMM) is a generalization of Markov decision processes in which several transitions (arrows) out of a state are possible for a given action. Therefore, in an HMM we may have several paths covered by the same sequence of actions, which implies that P(a1 a2 ... an) (the probability of observing an input sequence made up of n actions a1, a2, ..., an) is computed as the sum of the probabilities over all possible paths. The probability of a given path is computed by multiplying the probabilities of each segment (transition) along that path.
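As a minimal illustration of this computation (an illustrative Java sketch, not code from the thesis; all class and method names are hypothetical), the probability of an action sequence can be obtained by branching over every arrow labeled with the current action and summing the products of the transition probabilities along each path:

    import java.util.ArrayList;
    import java.util.List;

    /** Illustrative sketch: the probability of an action sequence is the sum, over all
        compatible state paths, of the product of the transition probabilities along each path. */
    public class SequenceProbability {

        /** One transition: from a state, for an action, to a next state, with probability p. */
        static class Transition {
            final int from; final char action; final int to; final double p;
            Transition(int from, char action, int to, double p) {
                this.from = from; this.action = action; this.to = to; this.p = p;
            }
        }

        private final List<Transition> transitions = new ArrayList<>();

        public void add(int from, char action, int to, double p) {
            transitions.add(new Transition(from, action, to, p));
        }

        /** P(actions[i..] | state): branch over every arrow labeled actions[i] leaving state. */
        public double prob(int state, char[] actions, int i) {
            if (i == actions.length) return 1.0;              // whole sequence consumed
            double total = 0.0;
            for (Transition t : transitions)
                if (t.from == state && t.action == actions[i])
                    total += t.p * prob(t.to, actions, i + 1);
            return total;
        }
    }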
A transition is defined as a 4-tuple (si, sj, a, p), representing the passage from state si to state sj for input (action) a, the transition being evaluated as having probability p. As for a given input sequence there may be several possible paths, the sequence of states passed through is not deducible from the input, but hidden (which gives the model its name). In the last section of this chapter we present the way intelligent agents can be modeled using a Hidden Markov Model and the way they can be trained using the Viterbi algorithm [JUR00] and the Baum-Welch algorithm [JUR00]. In our opinion, Hidden Markov Models are very appropriate for fields such as Natural Language Processing, handwriting and speech recognition, and also for motion problems for robots in environments with perturbations [SER01a].

CHAPTER III REINFORCEMENT LEARNING

Reinforcement learning is learning a mapping from situations to actions in order to maximize a scalar reward or reinforcement signal. The learning agent does not know a priori which actions to perform, as in most learning settings, but has to discover by itself which actions lead to the maximum reward by trying them. In the most interesting and challenging cases, actions may affect not only the immediate reward but all subsequent situations, and consequently all subsequent rewards. These two characteristics - trial-and-error search and delayed rewards - are the most important particularities of reinforcement learning.

Reinforcement Learning can be considered, at the same time, both a new and a very old topic in Artificial Intelligence. The term was first used by Minsky (1961) [MIN61] and independently in control theory by Waltz and Fu (1965) [WAL65]. The first relevant learning research in this field is Samuel's checkers player (1959) [SAM59], which uses learning based on temporal differences to handle the rewards. Of course, learning and reinforcement have been studied by psychologists for more than a century, and that research has had a great impact on AI and engineering. In this chapter, we present the way agents can learn in much less generous environments, where the agent receives no examples and starts with no model of the environment and no utility function.
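As a brief illustration of this setting (a hedged sketch under generic assumptions, not the thesis's own algorithm), an agent can improve its utility estimates purely from interaction, with no model of the environment, using the temporal-difference update U(s) <- U(s) + alpha * (r + gamma * U(s') - U(s)). The Java names below are illustrative:

    import java.util.HashMap;
    import java.util.Map;

    /** Minimal TD(0) utility-update sketch: the agent improves its utility estimates
        from observed transitions (s, r, s') only, without a model of the environment.
        Names and parameters are illustrative, not taken from the thesis. */
    public class TdLearner {
        private final Map<String, Double> utility = new HashMap<>();
        private final double alpha;   // learning rate
        private final double gamma;   // discount factor

        public TdLearner(double alpha, double gamma) {
            this.alpha = alpha;
            this.gamma = gamma;
        }

        /** One observed transition: state s, reward r, successor state next. */
        public void update(String s, double r, String next) {
            double u = utility.getOrDefault(s, 0.0);
            double uNext = utility.getOrDefault(next, 0.0);
            // temporal-difference update: move U(s) towards r + gamma * U(next)
            utility.put(s, u + alpha * (r + gamma * uNext - u));
        }

        public double utilityOf(String s) {
            return utility.getOrDefault(s, 0.0);
        }
    }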
Reinforcement Learning [HAR00] is an approach to machine intelligence that combines two disciplines to successfully solve problems that neither discipline can address individually. Dynamic Programming is a field of mathematics that has traditionally been used to solve problems of optimization and control; however, traditional dynamic programming is limited in the size and complexity of the problems it can address. Supervised learning is a general method for training a parameterized function approximator, such as a neural network, to represent functions; however, supervised learning requires sample input-output pairs from the function to be learned - in other words, a set of questions with the right answers. Unfortunately, we often do not know the correct answers that supervised learning requires. For these reasons there has recently been much interest in a different approach known as reinforcement learning (RL). Reinforcement learning is not a type of neural network, nor is it an alternative to neural networks; rather, it is an orthogonal approach that addresses a different, more difficult question. Reinforcement learning combines the fields of dynamic programming and supervised learning to yield powerful machine-learning systems. Reinforcement learning appeals to many researchers because of its generality: the computer is simply given a goal to achieve, and it then learns how to achieve that goal by trial-and-error interactions with its environment. In a reinforcement learning problem, the agent receives feedback, known as reward or reinforcement; the reward may be received at the end, in a terminal state, or in any other state in which the agent has precise information about what it did well or badly. A reinforcement learning problem has three fundamental parts [HAR00]:
CHAPTER IV NEW APPROACHES IN INTELLIGENT SYSTEMS

In this chapter we present some original approaches in Intelligent Systems: new learning algorithms (for reinforcement learning, and learning algorithms usable in Natural Language Processing) and a new logic architecture for Intelligent Agents (showing the importance of logic-based approaches in this field).

4.1 A new real-time learning algorithm (RTL)

All Artificial Intelligence problems require some sort of search [WEI99], so search is an important issue in Artificial Intelligence. Search algorithms are useful for problem solving by intelligent (single or multiple) agents. In this section we propose an original algorithm (RTL) [SER02a], which extends the Learning Real-Time A* (LRTA*) algorithm [KOR90] used for solving path-finding problems. The RTL algorithm preserves the completeness and the real-time character of LRTA*, while providing a better exploration of the search space. LRTA* is a learning version of the A* algorithm [KOR92], a standard search algorithm in Artificial Intelligence whose major drawback is its large computational complexity. LRTA* has two major problems:
RTL proposes alternatives for solving these problems. The proposed solutions are:
The idea of our algorithm is the following:
Theorem 4.1.1. [SER02a] If the values of the heuristic function (the initial h' values) are admissible (i.e. they do not overestimate the real values), then the RTL algorithm is convergent.

At the end of this section we give some directions for future research concerning the RTL algorithm.

4.2 A new Logic Architecture for Intelligent Agents (ALSO)

The logic approach is a topic of Symbolic Artificial Intelligence and has its own importance for intelligent agents, even if the controversy between the traditional approach and computational intelligence is well known. Moreover, the only intelligence requirement we generally make of agents is that [WOO97] they can make an acceptable decision about what action to perform next in their environment, in time for this decision to be useful. Other requirements for intelligence will be determined by the domain in which the agent is applied: not all agents will need to be capable of learning, for example. In such situations, a logic architecture is very appropriate and offers, in our opinion, a simple and elegant representation of the agent's environment and desired behavior. The most important disadvantage of the traditional logic architecture is the large complexity of the computational process.

In this section we present a new logic architecture for intelligent agents (ALSO - a logic architecture based on stacks of goals) [SER02c]. This architecture combines the traditional logic architecture with a planning architecture [RIC91]. We consider the case of an agent whose goal is to solve a given problem (pass from an initial to a final state), based on a set of operators (rules) that may be applied at a given moment. In an ALSO architecture we use a declarative representation of knowledge. Each operator is described by a list of new predicates that are made true by the operator and a list of old predicates that are made false by the operator; the two lists are ADD and DELETE, respectively. Moreover, each operator has a third list, PRECONDITION, which contains all the predicates that must be true in order to apply the operator. The frame axioms are implicitly specified in ALSO: each predicate not included in the ADD or DELETE lists of a certain operator is not affected by that operator. The idea of the algorithm is to use a stack of goals (a single stack that contains both goals and the operators proposed for solving those goals). The problem solver is also based on a database that describes the current situation (state) and on the set of operators described by their PRECONDITION, ADD and DELETE lists. As a case study we illustrate the use of the ALSO algorithm on a maze-searching problem. The logic architecture based on a stack of goals improves the traditional logic architecture for intelligent agents in the following directions:
At the end of this section we give some possible extensions of the ALSO architecture.

4.3 A new reinforcement learning algorithm (URU)

In this section we propose an original algorithm (URU - Utility-Reward-Utility) [SER03a], namely a temporal-difference reinforcement learning algorithm. This algorithm may be used for path-finding problems in a rectangular environment. We also prove the convergence of this algorithm. An important problem in robotics is to generate a path between an initial and a final configuration such that the robot avoids some obstacles; this is called the robot path-finding problem. Many approaches to this problem have been proposed in Artificial Intelligence and Computational Geometry [BRA82], [TOR90]. All these approaches are based on planning, and their time complexity grows exponentially with the number of degrees of freedom of the robot [CAN88]. We consider that heuristic approaches (section 4.1) or learning approaches (section 4.3) are important, because they contribute to reducing the time complexity of the problem. We consider the environment represented as a space of states (each state is characterized by its position in the environment - two coordinates specifying the X-coordinate and the Y-coordinate of the current position). The goal of the agent is to learn to move from an initial to a final state on a shortest path (measured as the number of transitions between states). The notations used in what follows are:
To solve this problem, we propose a reinforcement learning algorithm based on learning the states' utilities (values), in which the agent receives rewards from its interactions with the environment. The algorithm, based on Temporal Differences, works on the following idea:
We introduce the following notations:
We make the following specifications:
Case study. If we consider the reward function defined as r(s) = -1 if s ≠ sf, and r(s) = 0 otherwise, it is obvious that the goal of the learning process is to minimize the number of transitions from the initial to the final state (the agent learns to move on a shortest path). We consider that the environment has the following characteristics:
In what follows, we state an original theorem that gives the convergence conditions of the URU algorithm. Theorem 4.3.1. [SER03a] Let us consider the learning problem described above, satisfying the following conditions:
where d(s1,s2) represents the Manhattan distance between the states s1 and s2;
In this case, the URU algorithm is convergent (the states' utilities converge after the training sequence). In what follows we consider the estimated utility of each state i at the n-th training episode. The proof of Theorem 4.3.1 is based on the following lemmas:

Lemma 1. [SER03a] At the n-th training episode of the agent the following inequality holds:

Lemma 2. [SER03a] At the n-th training episode of the agent the following inequality holds for each transition from i (i ≠ sf) to j made by the agent in the current training sequence:

Lemma 3. [SER03a] The states' utilities increase from one training episode to the next.

Theorem 4.3.2 gives the equilibrium equation of the states' utilities after the URU algorithm has been run.

Theorem 4.3.2. [SER03a] In our learning problem, the equilibrium equation of the states' utilities is the following, where the exact utility of state i is the value obtained after the URU algorithm has been run and card(M) denotes the number of elements of the set M:

Compared with the classical TD algorithm, we make the following remarks (which follow naturally from the theoretical results above):
Further work is planned in the following directions:
4.4 A new algorithm for word sense disambiguation

In this section we introduce an original algorithm (BA) [SER01c] which combines elements of Yarowsky's algorithm [YAR99, RES98, YAR01, YAR95] with the Naive Bayes Classifier (NBC). The algorithm preserves the advantages of Yarowsky's principles (one sense per discourse and one sense per collocation) together with the known high performance of NBC algorithms. The algorithm learns to make predictions based on local contexts, starting from a small set of contexts labeled with senses and from a larger set of unlabeled contexts. The two important characteristics of the BA algorithm are:
CHAPTER V IMPLEMENTATIONS OF INTELLIGENT SYSTEMS

5.1 A reinforcement learning Intelligent Agent based on Adaptive Dynamic Programming

In this section we present an Intelligent Agent [SER00b] that learns by reinforcement using an Adaptive Dynamic Programming method (section 3.2). We consider a simple Markov decision process with n^2 states. The state space can be visualized using an n x n grid, each square representing a state. The reinforcement function is constant -1 (i.e., the agent receives a reinforcement of -1 on each transition). There are 4 possible actions in each state: North, South, East and West. The goal states are the upper left corner and the lower right corner. The program that implements the agent with the functionality described above has been realized in two versions: in Borland Pascal and in Visual Prolog.
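To make the setting concrete, the utilities such an agent converges to can be computed by iterating the Bellman update U(s) = -1 + max over actions of U(s'), with the two corner goal states kept at utility 0. The following is a minimal Java sketch under these assumptions (the thesis implementations were written in Borland Pascal and Visual Prolog; all names below are illustrative, and the transition model is assumed known here for brevity):

    /** Minimal value-iteration sketch for the n x n grid described above:
        reward -1 per transition, goal states in the upper-left and lower-right corners.
        Moves that would leave the grid keep the agent in place (an assumption of this sketch). */
    public class GridUtilities {

        public static double[][] computeUtilities(int n, int iterations) {
            double[][] u = new double[n][n];
            int[][] moves = {{-1, 0}, {1, 0}, {0, -1}, {0, 1}}; // North, South, West, East
            for (int it = 0; it < iterations; it++) {
                for (int i = 0; i < n; i++) {
                    for (int j = 0; j < n; j++) {
                        if ((i == 0 && j == 0) || (i == n - 1 && j == n - 1))
                            continue;                           // goal states keep utility 0
                        double best = Double.NEGATIVE_INFINITY;
                        for (int[] m : moves) {
                            int ni = Math.min(n - 1, Math.max(0, i + m[0]));
                            int nj = Math.min(n - 1, Math.max(0, j + m[1]));
                            best = Math.max(best, u[ni][nj]);
                        }
                        u[i][j] = -1.0 + best;                  // Bellman update with reward -1
                    }
                }
            }
            return u;
        }

        public static void main(String[] args) {
            double[][] u = computeUtilities(5, 50);
            for (double[] row : u) {
                for (double v : row) System.out.printf("%6.1f", v);
                System.out.println();
            }
        }
    }

After convergence, each non-goal square holds the negative of its distance to the nearer goal corner, which is exactly the shortest-path behavior the agent is expected to learn.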
The agent implementation has been written in Microsoft Visual C++ 6.0. For the proposed experiment, some directions for future work are:
This section aimed to emphasize another way of approaching problems of training intelligent agents.

5.3 A game-playing Reinforcement Learning Intelligent Agent ("Agent 21")

In this section we design an Intelligent Agent [SER01b] that learns (by reinforcement) to play "Game 21", a simplified version of Black-Jack. The algorithm used for training the agent is the SARSA algorithm (section 3.6). We propose a game with two players: a dealer (who deals the cards) and an agent, in fact the computer (which uses the SARSA algorithm to learn the game's rules and plays against the dealer). The agent tries to win by obtaining a score (the sum of the cards' values) less than or equal to 21, but greater than the dealer's score. The probabilistic nature of this game makes it an interesting problem for testing learning algorithms, even if learning a good playing strategy is not obvious. Learning from a teacher (supervised learning) is not useful in this game, as the correct output for a given state is unknown. The agent has to explore different actions and to develop a strategy by selecting and maintaining the actions that maximize its performance. The purpose of this application is to study the behavior of an agent that initially knows neither the rules nor the goal of the game and gradually assimilates them. After learning, the agent (the computer) wins about 54% of the rounds (Figure 5.2); this is due to the fact that "Game 21" is a game of chance, and the agent tries to manage, as well as it can, the cards received from the dealer. As a remark, even if the game has a strong probabilistic character, after the training rounds the agent plays approximately as well as, or maybe even better than, a human player with a random playing strategy.
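The SARSA update used for such training can be sketched as follows (an illustrative Java fragment under generic assumptions; the state and action encodings used for "Game 21" in the thesis are not reproduced here). The rule is Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a)), where a' is the action actually chosen in s', for example epsilon-greedily:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Random;

    /** Minimal SARSA sketch: on-policy update of action values from the observed
        quintuple (s, a, r, s', a'). All names are illustrative. */
    public class SarsaLearner {
        private final Map<String, Double> q = new HashMap<>(); // key: state + "|" + action
        private final double alpha, gamma, epsilon;
        private final Random random = new Random();

        public SarsaLearner(double alpha, double gamma, double epsilon) {
            this.alpha = alpha; this.gamma = gamma; this.epsilon = epsilon;
        }

        private String key(String s, String a) { return s + "|" + a; }

        public double q(String s, String a) { return q.getOrDefault(key(s, a), 0.0); }

        /** Epsilon-greedy action selection over the available actions. */
        public String choose(String s, String[] actions) {
            if (random.nextDouble() < epsilon)
                return actions[random.nextInt(actions.length)];  // explore
            String best = actions[0];
            for (String a : actions)
                if (q(s, a) > q(s, best)) best = a;              // exploit
            return best;
        }

        /** SARSA update for the observed transition (s, a, r, s', a'). */
        public void update(String s, String a, double r, String s2, String a2) {
            double old = q(s, a);
            q.put(key(s, a), old + alpha * (r + gamma * q(s2, a2) - old));
        }
    }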
Because of the strong logical formalism of the "Know-How" concept, the application was written in Visual Prolog.

5.5 A real-time learning Intelligent Agent

In this section we describe an Intelligent Agent [SER02a] that learns to find the minimum number of transitions needed for moving (in a space of states) between an initial and a final configuration, using the RTL algorithm (section 4.1). The use of the A* algorithm for solving such problems is well known; however, A*, an informed search algorithm, has the major disadvantage of an exponential execution time. Thus, a learning version is very suitable, since it reduces the algorithm's complexity. We propose an experiment in order to illustrate the advantages of the RTL algorithm. The application is written in Borland C and implements the behavior of an Intelligent Agent whose aim is to find the way out of a maze on a shortest path. For comparison, we have also used the LRTA* algorithm, and we noticed that, because of its insufficient exploration of the environment, the agent fails.

5.6 An agent for word sense disambiguation

In this section we describe an Intelligent Agent [SER03e] that learns to disambiguate a polysemous word (to determine the meaning of the word in a set of contexts). The application is written in JDK 1.4 and uses the disambiguation algorithm BA described in section 4.4. We performed two experiments: one for English and one for Romanian. The second experiment is particularly significant, because no resource similar to WordNet exists for Romanian, and the BA algorithm only requires information that can be extracted from an untagged corpus. Our aim is to use the BA algorithm for Romanian to disambiguate the word "poarta" in a set of contexts obtained with an on-line corpus tool (we use a Romanian corpus). We make the following statements:
As an evaluation measure of the algorithm we use the accuracy (5.1):

    A = (number of contexts correctly solved) / (total number of contexts to be solved)    (5.1)

The accuracy of the BA algorithm in the proposed experiment is 60%. We carried out an experimental comparison between the Naive Bayes Classifier and the BA algorithm, with and without the relation introduced in section 4.4. The comparative experimental results are shown in Figure 5.3, which gives a graphical representation of the accuracy per context for each algorithm: more precisely, for a given algorithm, at the i-th context we plot the accuracy (5.1) of the algorithm over the first i contexts. From Figure 5.3 it is obvious that the most efficient is the BA algorithm with the relation (the algorithm's accuracy grows with the number of contexts).

Figure 5.3. Experimental comparison between the NBC and BA algorithms

We make the following remarks:
In conclusion, if the number of words used as contextual features for disambiguation and the number of contexts grow, the accuracy of the BA algorithm grows too.

5.7 A Web Agent to display a tree structure

In this section we present an Intelligent Agent [SER01e] designed to browse a structure of remote files. It is an agent whose environment is a software environment: its sensors are software functions and its actions are software actions.

5.8 A Logic-based Agent (ALSO)

In this section we describe the implementation of a logic agent based on a stack of goals [SER02c], whose architecture was presented in section 4.2. Because this architecture is based on logic and because the algorithm described in section 4.2 needs backtracking in order to find all the solutions, the implementation was written in Visual Prolog. Declarative programming languages (such as Prolog) have a built-in control mechanism which allows finding all the solutions of a problem. We note that the stack (stacks) of goals that have to be created for applying the algorithm are implicitly maintained by the control strategy of Prolog (the mechanism which allows backtracking).
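For readers unfamiliar with this style of representation, the operator structure used by the ALSO architecture (section 4.2) can be sketched roughly as follows. This is an illustrative Java fragment (the actual implementation is in Visual Prolog) and all names are hypothetical: an operator is applicable when its PRECONDITION predicates hold in the current state, and applying it removes the DELETE predicates and inserts the ADD predicates.

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    /** Illustrative sketch of an ALSO-style operator with PRECONDITION, ADD and DELETE lists.
        The state is modeled as a set of ground predicates. */
    public class Operator {
        final String name;
        final List<String> precondition; // predicates that must hold to apply the operator
        final List<String> add;          // predicates made true by the operator
        final List<String> delete;       // predicates made false by the operator

        public Operator(String name, List<String> precondition, List<String> add, List<String> delete) {
            this.name = name;
            this.precondition = precondition;
            this.add = add;
            this.delete = delete;
        }

        public boolean applicable(Set<String> state) {
            return state.containsAll(precondition);
        }

        /** Frame axioms are implicit: predicates not in ADD or DELETE are left unchanged. */
        public Set<String> apply(Set<String> state) {
            Set<String> next = new HashSet<>(state);
            next.removeAll(delete);
            next.addAll(add);
            return next;
        }
    }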
The application is written in JDK 1.4. We develop an experiment for an agent that moves in a rectangular maze, illustrating the theoretical results obtained in section 4.3.

5.10 An Agent for word clustering

In this section we describe an Intelligent Agent [SER03b] that learns to cluster [MAN99] a set of words, starting from a set of contexts and from a set of words having maximum frequency in the given contexts. We propose an experiment for the Romanian language; we ran the algorithm on 10000 contexts and 200 words, with very good results. We mention that this application is used as part of a Question Answering (QA) system for the Romanian language [SER03f].

CHAPTER VI A PROGRAMMING INTERFACE FOR REINFORCEMENT LEARNING SIMULATIONS

In this chapter we propose an original programming interface for reinforcement learning simulations in known environments [SER03c]. Using this interface, simulations are possible both for reinforcement learning based on state utilities and for learning based on action values (Q-learning). The interface is implemented in JDK 1.4 and is meant to facilitate software development for reinforcement learning in known environments. There are three basic objects: agents, environments and simulations. The agent is the learning agent and the environment is the task it interacts with. The simulation manages the interaction between agent and environment, collects data and manages the display, if any. Generally, the inputs of the agent are perceptions of the environment (in our case, states of the environment), its outputs are actions, and the environment offers rewards after the agent interacts with it. Figure 6.1 illustrates the interaction between the agent and the environment in a reinforcement learning task.

Figure 6.1. Interaction between the Agent and the Environment (reward, perception, action)

The reward is a number; the environment, the actions and the perceptions are instances of classes derived from the IEnvironment, IAction and IState interfaces, respectively. The implementation of actions and perceptions may be arbitrary as long as they are properly understood by agent and environment; the agent and the environment obviously have to be chosen to be compatible with each other. The interaction between agent and environment is handled in discrete time. We assume we are working with simulations; in other words, there are no real-time constraints enforced by the interface: the environment waits for the agent while the agent is selecting its action, and the agent waits for the environment while the environment is computing its next state. We assume that the agent's environment is a finite Markov Decision Process. To use the interface, the user has to define the specialized object classes HisState, HisEnvironment and HisAgent and create instances of each. The agent and the environment are then passed to a simulation object (CSimulation), which initializes and interconnects them. Then, CSimulation::init() will initialize and run the simulation. If the agent learns the states' utilities, it has to be derived from the AgentUtility class; if it learns the actions' values (Q-learning), it has to be derived from the AgentQValues class.
We illustrate the use of the interface on a concrete example: the motion problem of an embodied agent whose goal is to find the way out of a maze with obstacles. The UML class diagram of the interface is given below.
Interface IEnvironment
  + isValid(s: IState): Boolean
  + initial(): IState
  + final(s: IState): Boolean
  + next(s: IState, a: IAction, r: Integer): IState
  + value(s: IState): Real

Class CSimulation
  a: RLAgent
  m: IEnvironment
  + CSimulation(a: RLAgent, m: IEnvironment)
  + init(alpha: Real, gamma: Real, epsilon: Real, episoade: Integer): void
  + policy(ps: PrintStream)

Abstract class RLAgent
  # a: IList
  # l: IList
  + abstract actions(): void
  + abstract choose(e: IElement, r: Real, epsilon: Real, m: IEnvironment): IElement
  + abstract next(s: IState, m: IEnvironment): QValues
  + abstract initial(s: IState): IElement
  + learning(alpha: Real, gamma: Real, epsilon: Real, episodes: Integer): void
  # notVisited(s: IState): Boolean
  # value(e: IElement, m: IEnvironment): Real
  appear(e: IElement): Boolean
  add(e: IElement, m: IEnvironment): void
  init(s: IState): void
  update(e: IElement, eurm: IElement, r: Integer, alpha: Real, gamma: Real): void

Interface IList
  + add(o: Object): void
  + delete(i: Integer): void
  + dim(): Integer
  + elem(i: Integer): Object

Class CList
  list: ArrayList

Interface IState
  + toString(): String
  + equal(s: IState): Boolean

Classes AgentUtility and AgentQValues
  next(s: IState, r: Integer, epsilon: Real, m: IEnvironment): IElement

Abstract class IElement
  # o: Boolean
  # s: IState
  # a: IAction
  + IElement(s: IState, a: IAction)
  + state(): IState
  + action(): IAction
  + value(): Real
  + retO(): Boolean
  + setValue(v: Real): void
  + setAction(a: IAction): void
  + setO(o: Boolean): void
  + toString(): String
  + abstract equal(e: IElement): Boolean

Interface IAction
  + toString(): String
  + equal(a: IAction): Boolean

Class Utility
  + Utility(s: IState)
  + Utility(s: IState, a: IAction)

Class QValues
  + QValues(s: IState)
  + QValues(s: IState, a: IAction)

User-defined example classes: MyState, MyAction, MyAgent.
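To suggest how these pieces fit together, the following fragment sketches how a user might instantiate the interface for the maze example. This is a hypothetical wiring sketch: the bodies of HisEnvironment and HisAgent are not reproduced here, and only the classes and signatures listed above are assumed.

    /** Hypothetical wiring of the interface for the maze example.
        HisEnvironment and HisAgent are user classes assumed to implement IEnvironment
        and extend AgentUtility, as described in Chapter VI. */
    public class MazeSimulationDemo {
        public static void main(String[] args) {
            IEnvironment maze = new HisEnvironment();   // the maze with obstacles (user-defined)
            RLAgent agent = new HisAgent();             // utility-learning agent (derived from AgentUtility)

            // the simulation object interconnects the agent and the environment
            CSimulation simulation = new CSimulation(agent, maze);

            // alpha = learning rate, gamma = discount factor, epsilon = exploration rate,
            // followed by the number of training episodes (values are illustrative)
            simulation.init(0.1, 0.9, 0.1, 1000);

            // print the learned policy
            simulation.policy(System.out);
        }
    }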
CONCLUSIONS

The aim of this work is to support the idea that the field of agents, and especially of intelligent agents, is an important research and development area in Computer Science and Artificial Intelligence. On the other hand, learning problems represent an important research direction in Artificial Intelligence; machine learning is the study of system models that, based on an input data set (training set), improve their performance through experience and by learning specific knowledge. Reinforcement Learning, seen as a method for solving machine learning problems based on the acquisition of new knowledge [BER91], is one of the ways that may form the basis for improving machines' performance, being at the same time a field open to future research. Because of its potential to eliminate the manual encoding of control strategies, reinforcement learning is one of the most active research areas in machine learning; reinforcement learning in unknown environments is a current research topic in this field. The aim of the present work is to present the main results related to reinforcement learning, as well as the design and implementation of intelligent agents, in order to offer support for understanding:
The work presents some new approaches in Intelligent Systems (new learning algorithms), together with possible generalizations and open problems, opening in this way new research directions in Machine Learning. The study of learning was focused in this thesis on the model of systems with a single agent, in known environments. As future research we will consider the case of unknown environments and systems with several agents (multi-agent systems).