Faculty of Mathematics and Computer Science
INTRODUCTION

This PhD thesis is the result of my research in the field of Artificial Intelligence, particularly in Machine Learning, begun in 1998. The purpose of this abstract is to present an overview of the most important aspects of the thesis, including the original results.

Artificial Intelligence, one of the most recently established disciplines of computer science, tries to understand intelligent entities. But, unlike philosophy and psychology, which also study intelligence, the aim of Artificial Intelligence is to build intelligent entities as well as to understand them. Although nobody can predict the future in detail, computers endowed with a degree of human intelligence (or more) would have a great impact on everyday life and on the future course of civilization. The field of machine learning thus represents one of the most recent challenges of Artificial Intelligence, opening new perspectives for research and, of course, a way towards achieving intelligent systems. In our opinion, the most challenging field of Machine Learning is Reinforcement Learning, which deals with the problem of improving the behavior of an artificial agent based on feedback about its performance. On the other hand, we consider that one of the most suitable fields for exploiting intelligence is Natural Language Processing. This is why our research is directed especially towards the fields of Reinforcement Learning and Natural Language Processing.

The thesis contains 133 references and is structured in six chapters, as follows.

Chapter I, Intelligent Systems, discusses general aspects of Artificial Intelligence. The field of Intelligent Agents is presented, a domain of Artificial Intelligence that offers a new problem solving method and a new way of interaction between user and computer. Some of the most important (in our opinion) abstract and concrete architectures for intelligent agents are presented, as well as the new programming paradigm, agent-oriented programming. The penultimate section of the first chapter discusses logic-based representation and reasoning in agent-based systems, together with aspects related to Software Engineering and the formal treatment of such systems. The last section of the first chapter addresses the problem of machine learning, an extremely important research direction in Artificial Intelligence. The general model of a learning agent, aspects related to learning a domain, and the most important learning strategies are presented.

Chapter II, Mathematical Models for Learning Intelligent Agents, presents three mathematical models for agents that learn: Markov Decision Processes, Partially Observable Markov Decision Processes and Hidden Markov Models. In our opinion, the mathematical models described in this chapter play a very important role in modeling and designing agent-based systems, especially systems that learn by reinforcement (by interaction with the environment).

Chapter III, Reinforcement Learning, addresses the extremely complex problem of reinforcement learning, a learning approach that combines two disciplines in order to solve problems that neither one could solve independently: Dynamic Programming and Supervised Learning. This chapter presents the particularities of Reinforcement Learning.
The chapter discusses: the problem of learning based on a utility function, passive and active learning in known or unknown environments, learning based on temporal differences (the most interesting variant of reinforcement learning), and the problem of Q-learning. The most important action selection mechanisms, as well as the SARSA algorithm, the most efficient variant of Q-learning, are discussed comparatively.

Chapter IV, New Approaches in Intelligent Systems, presents a few new learning algorithms (approaches to Reinforcement Learning and learning algorithms that may be used for Natural Language Processing) as well as a new logic architecture for intelligent agents (showing the importance of logical approaches for intelligent agents). The learning algorithms described in this chapter are: an algorithm for word sense disambiguation, a real-time learning algorithm, and a reinforcement learning algorithm with application to motion problems for robots in environments with obstacles. Each of the original approaches in this chapter is compared with existing approaches, and the advantages of the new approaches are illustrated.

Chapter V, Examples of Intelligent Systems, presents practical applications (concrete architectures for intelligent agents that improve their behavior by learning), i.e. implementations of the theoretical aspects presented in chapters II and IV. For each such application we developed experiments that illustrate the efficiency of the approaches. We also suggest directions for generalization and for future research.

Chapter VI, A Programming Interface for Reinforcement Learning Simulations, presents a standard programming interface for reinforcement learning simulations in a known environment; both the simulation of reinforcement learning based on a utility function and the simulation of Q-learning are possible. The use of the interface is illustrated on an example of the movement of an embodied robotic agent in an environment with obstacles.

Among the original elements presented throughout this thesis we mention the following:
* * *

I would like to express my gratitude to my supervisor, Prof. Dr. Militon FRENŢIU, for his competent guidance, suggestions and encouragement. I would like to express my thanks to Assoc. Prof. Dr. Doina TĂTAR from the Department of Computer Science of the Faculty of Mathematics and Computer Science in Cluj-Napoca for her support during the entire elaboration of the thesis and for our extremely good cooperation. I also want to thank Assoc. Prof. Dr. Horia POP from the same department for all his competent recommendations. I am grateful to all the members of the Computer Science Department for their helpful suggestions.

CHAPTER I INTELLIGENT SYSTEMS

Artificial Intelligence (AI) tries to understand intelligent entities. Therefore, a reason for studying this field is to learn about ourselves. One question that AI has to answer is the following: how is it possible for a small, slow, biological or electronic brain to perceive, understand, predict and manipulate a world much larger and more complicated than itself? All researchers should look at themselves in the mirror in order to see an example of an intelligent system.

There are two fundamental research directions in Artificial Intelligence [DUM99]. On one hand, the construction of machines capable of reproducing the cognitive functions of the human mind, the so-called cognitive modeling. Here, cognitive psychology is one of the cognitive sciences whose methods are taken over by Artificial Intelligence, in fields such as Natural Language Processing, learning and perception. Such intelligent machines have been called "artificial brains" in the literature, the basic idea being to build intelligent machines starting from models of neuronal activity. Such models are known as artificial neural networks. These research directions define the field of Computational Intelligence, whose main characteristic is the numerical representation of knowledge. On the other hand, the construction of programs that allow the symbolic processing of information (knowledge) using logical methods, the so-called logical-symbolic approach (rational thinking). The logical-symbolic tradition of Artificial Intelligence tries to achieve intelligent systems using deduction mechanisms to obtain the solution of a problem, based on the problem's description in a logical formalism. This field of traditional Artificial Intelligence is based on a symbolic representation of knowledge and on logical inferences over knowledge.

All AI problems can be gathered under the concept of intelligent agent. The goal of AI is to describe and build agents that receive percepts from the environment and execute actions. An agent is an entity that perceives its environment through sensors and acts on this environment through actions. An agent is described by:
The goal of AI is to achieve the program part of the agent: a function that implements the mapping from percepts to actions. The abstract architectures of intelligent agents described in this chapter are purely reactive agents and agents with state. The concrete architectures presented are logic-based architectures, reactive architectures, Belief-Desire-Intention architectures and layered architectures.

For many AI researchers, the idea of programming computer systems in terms of mental notions such as belief, desire and intention is the key component of agent-based computing. The concept was articulated by Yoav Shoham in his agent-oriented programming (AOP) proposal [SHO93]. Shoham proposes "a new programming paradigm, based on a new point of view on the computational process". AOP may be regarded as a kind of 'post-declarative' programming: as in declarative programming, we state our goals and let the built-in control mechanism figure out what to do in order to achieve them. It follows that, when specifying such systems, we have to use logical statements that may be formalized using a logic of knowledge. "Intelligent agents are ninety-nine percent computer science and one percent AI" [ETZ96]. The technology of intelligent agents and multi-agent systems seems set to radically alter the way complex, distributed, open systems are conceptualized and implemented. Logic-based representation and reasoning in agent-based systems is an important topic of the field, with formal methods playing an important role [SER02b]. There are two problems related to the formalization of agent-based systems:
The field of intelligent agents is related to another field of AI, machine learning. Machine learning represents the study of system models that, based on a set of training data, improve their performance through experience and by learning specific knowledge. The attempt to model human reasoning leads to the concept of intelligent reasoning [FLA94]. Reasoning is the process of deducing conclusions; intelligent reasoning is the kind of reasoning accomplished by humans. Most AI systems are deductive: they are able to make inferences (draw conclusions) given their initial or supplied knowledge, without being able to acquire or generate new knowledge.

CHAPTER II MATHEMATICAL MODELS FOR INTELLIGENT LEARNING AGENTS

Formally, an agent-based system is represented as a 4-tuple (S, A, h, U), where S is the space of the agent's states (the environment), A is the space of the agent's actions (the agent's behavior), h is the transition function between states, and U is an array representing the utility function of the agent (U(s) being the utility of state s). Most of the computational models of reinforcement learning are based on the assumption that the environment the agent interacts with can be modeled as a Markov Decision Process, defined as follows:
We say that the environment is a Markov Decision Process (satisfies the Markov property) if the environment's response (signal) at moment t+1 depends only on the state and action at moment t. Markov Decision Processes are very important in reinforcement learning theory (they underlie roughly 90% of modern reinforcement learning). In many situations the environment cannot be "labeled" beforehand with appropriate states, and the environment's states are not completely accessible to the agent; the environment is then continuous and partially observable. Partially Observable Markov Decision Processes were developed specifically to deal with this problem.

The use of mathematical and statistical methods is a leading topic in the field of Artificial Intelligence. In the last section of this chapter we present how Hidden Markov Models may be used as a mathematical tool for modeling the environment of intelligent agents [SER01a]. The Hidden Markov Model (HMM) is a generalization of Markov decision processes in which several transitions (arrows) out of a state are possible for a given action. Therefore, in an HMM we may have several paths covered by the same sequence of actions, which implies that P(a1 a2 ... an) (the probability of observing an input sequence made up of n actions a1, a2, ..., an) is computed as the sum of the probabilities over all possible paths. The probability of a given path is computed by multiplying the probabilities of each segment (transition) along that path.
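As a minimal illustration of this computation (an illustrative Java sketch, not code from the thesis; all class and method names are hypothetical), the probability of an action sequence can be obtained by branching over every arrow labeled with the current action and summing the products of the transition probabilities along each path:

    import java.util.ArrayList;
    import java.util.List;

    /** Illustrative sketch: the probability of an action sequence is the sum, over all
        compatible state paths, of the product of the transition probabilities along each path. */
    public class SequenceProbability {

        /** One transition: from a state, for an action, to a next state, with probability p. */
        static class Transition {
            final int from; final char action; final int to; final double p;
            Transition(int from, char action, int to, double p) {
                this.from = from; this.action = action; this.to = to; this.p = p;
            }
        }

        private final List<Transition> transitions = new ArrayList<>();

        public void add(int from, char action, int to, double p) {
            transitions.add(new Transition(from, action, to, p));
        }

        /** P(actions[i..] | state): branch over every arrow labeled actions[i] leaving state. */
        public double prob(int state, char[] actions, int i) {
            if (i == actions.length) return 1.0;              // whole sequence consumed
            double total = 0.0;
            for (Transition t : transitions)
                if (t.from == state && t.action == actions[i])
                    total += t.p * prob(t.to, actions, i + 1);
            return total;
        }
    }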
A transition is defined as a 4-tuple (si, sj, a, p), representing the passage from state si to state sj for input (action) a, the transition being evaluated as having probability p. As for a given input sequence there may be several possible paths, the sequence of states passed through is not deducible from the input, but hidden (which gives the model its name). In the last section of this chapter we present the way intelligent agents can be modeled using a Hidden Markov Model and the way they can be trained using the Viterbi algorithm [JUR00] and the Baum-Welch algorithm [JUR00]. In our opinion, Hidden Markov Models are very appropriate for fields such as Natural Language Processing, handwriting and speech recognition, and also for motion problems for robots in environments with perturbations [SER01a].

CHAPTER III REINFORCEMENT LEARNING

Reinforcement learning is learning a mapping from situations to actions in order to maximize a scalar reward or reinforcement signal. The learning agent does not know a priori which actions to perform, as in most learning settings, but has to discover by itself which actions lead to the maximum reward by trying them. In the most interesting and challenging cases, actions may affect not only the immediate reward but all subsequent situations, and consequently all subsequent rewards. These two characteristics - trial-and-error search and delayed rewards - are the most important particularities of reinforcement learning.

Reinforcement Learning can be considered, at the same time, both a new and a very old topic in Artificial Intelligence. The term was first used by Minsky (1961) [MIN61] and independently in control theory by Waltz and Fu (1965) [WAL65]. The first relevant learning research in this field is Samuel's checkers player (1959) [SAM59], which uses learning based on temporal differences to handle the rewards. Of course, learning and reinforcement have been studied by psychologists for more than a century, and that research has had a great impact on AI and engineering. In this chapter, we present the way agents can learn in much less generous environments, where the agent receives no examples and starts with no model of the environment and no utility function.
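As a brief illustration of this setting (a hedged sketch under generic assumptions, not the thesis's own algorithm), an agent can improve its utility estimates purely from interaction, with no model of the environment, using the temporal-difference update U(s) <- U(s) + alpha * (r + gamma * U(s') - U(s)). The Java names below are illustrative:

    import java.util.HashMap;
    import java.util.Map;

    /** Minimal TD(0) utility-update sketch: the agent improves its utility estimates
        from observed transitions (s, r, s') only, without a model of the environment.
        Names and parameters are illustrative, not taken from the thesis. */
    public class TdLearner {
        private final Map<String, Double> utility = new HashMap<>();
        private final double alpha;   // learning rate
        private final double gamma;   // discount factor

        public TdLearner(double alpha, double gamma) {
            this.alpha = alpha;
            this.gamma = gamma;
        }

        /** One observed transition: state s, reward r, successor state next. */
        public void update(String s, double r, String next) {
            double u = utility.getOrDefault(s, 0.0);
            double uNext = utility.getOrDefault(next, 0.0);
            // temporal-difference update: move U(s) towards r + gamma * U(next)
            utility.put(s, u + alpha * (r + gamma * uNext - u));
        }

        public double utilityOf(String s) {
            return utility.getOrDefault(s, 0.0);
        }
    }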
Reinforcement Learning [HAR00] is an approach to machine intelligence that combines two disciplines to successfully solve problems that neither discipline can address individually. Dynamic Programming is a field of mathematics that has traditionally been used to solve problems of optimization and control; however, traditional dynamic programming is limited in the size and complexity of the problems it can address. Supervised learning is a general method for training a parameterized function approximator, such as a neural network, to represent functions; however, supervised learning requires sample input-output pairs from the function to be learned - in other words, a set of questions with the right answers. Unfortunately, we often do not know the correct answers that supervised learning requires. For these reasons there has recently been much interest in a different approach known as reinforcement learning (RL). Reinforcement learning is not a type of neural network, nor is it an alternative to neural networks; rather, it is an orthogonal approach that addresses a different, more difficult question. Reinforcement learning combines the fields of dynamic programming and supervised learning to yield powerful machine-learning systems. Reinforcement learning appeals to many researchers because of its generality: the computer is simply given a goal to achieve, and it then learns how to achieve that goal by trial-and-error interactions with its environment. In a reinforcement learning problem, the agent receives feedback, known as reward or reinforcement; the reward may be received at the end, in a terminal state, or in any other state in which the agent has precise information about what it did well or badly. A reinforcement learning problem has three fundamental parts [HAR00]:
CHAPTER IV NEW APPROACHES IN INTELLIGENT SYSTEMS

In this chapter we present some original approaches in Intelligent Systems: new learning algorithms (for reinforcement learning, and learning algorithms usable in Natural Language Processing) and a new logic architecture for Intelligent Agents (showing the importance of logic-based approaches in this field).

4.1 A new real-time learning algorithm (RTL)

All Artificial Intelligence problems require some sort of search [WEI99], so search is an important issue in Artificial Intelligence. Search algorithms are useful for problem solving by intelligent (single or multiple) agents. In this section we propose an original algorithm (RTL) [SER02a], which extends the Learning Real-Time A* (LRTA*) algorithm [KOR90] used for solving path-finding problems. The RTL algorithm preserves the completeness and the real-time character of LRTA*, while providing a better exploration of the search space. LRTA* is a learning version of the A* algorithm [KOR92], a standard search algorithm in Artificial Intelligence whose major drawback is its large computational complexity. LRTA* has two major problems:
RTL proposes alternatives for solving these problems. The proposed solutions are:
The idea of our algorithm is the following:
Theorem 4.1.1. [SER02a] If the values of the heuristic function (the initial h' values) are admissible (i.e. they do not overestimate the real values), then the RTL algorithm is convergent.

At the end of this section we give some directions for future research concerning the RTL algorithm.

4.2 A new Logic Architecture for Intelligent Agents (ALSO)

The logic approach is a topic of Symbolic Artificial Intelligence and has its own importance for intelligent agents, even if the controversy between the traditional approach and computational intelligence is well known. Moreover, the only intelligence requirement we generally make of agents is that [WOO97] they can make an acceptable decision about what action to perform next in their environment, in time for this decision to be useful. Other requirements for intelligence will be determined by the domain in which the agent is applied: not all agents will need to be capable of learning, for example. In such situations, a logic architecture is very appropriate and offers, in our opinion, a simple and elegant representation of the agent's environment and desired behavior. The most important disadvantage of the traditional logic architecture is the large complexity of the computational process.

In this section we present a new logic architecture for intelligent agents (ALSO - a logic architecture based on stacks of goals) [SER02c]. This architecture combines the traditional logic architecture with a planning architecture [RIC91]. We consider the case of an agent whose goal is to solve a given problem (pass from an initial to a final state), based on a set of operators (rules) that may be applied at a given moment. In an ALSO architecture we use a declarative representation of knowledge. Each operator is described by a list of new predicates that are made true by the operator and a list of old predicates that are made false by the operator; the two lists are ADD and DELETE, respectively. Moreover, each operator has a third list, PRECONDITION, which contains all the predicates that must be true in order to apply the operator. The frame axioms are implicitly specified in ALSO: each predicate not included in the ADD or DELETE lists of a certain operator is not affected by that operator. The idea of the algorithm is to use a stack of goals (a single stack that contains both goals and the operators proposed for solving those goals). The problem solver is also based on a database that describes the current situation (state) and on the set of operators described by their PRECONDITION, ADD and DELETE lists. As a case study we illustrate the use of the ALSO algorithm on a maze-searching problem. The logic architecture based on a stack of goals improves the traditional logic architecture for intelligent agents in the following directions:
At the end of this section we give some possible extensions of the ALSO architecture.

4.3 A new reinforcement learning algorithm (URU)

In this section we propose an original algorithm (URU - Utility-Reward-Utility) [SER03a], namely a temporal-difference reinforcement learning algorithm. This algorithm may be used for path-finding problems in a rectangular environment. We also prove the convergence of this algorithm. An important problem in robotics is to generate a path between an initial and a final configuration such that the robot avoids some obstacles; this is called the robot path-finding problem. Many approaches to this problem have been proposed in Artificial Intelligence and Computational Geometry [BRA82], [TOR90]. All these approaches are based on planning, and their time complexity grows exponentially with the number of degrees of freedom of the robot [CAN88]. We consider that heuristic approaches (section 4.1) or learning approaches (section 4.3) are important, because they contribute to reducing the time complexity of the problem. We consider the environment represented as a space of states (each state is characterized by its position in the environment - two coordinates specifying the X-coordinate and the Y-coordinate of the current position). The goal of the agent is to learn to move from an initial to a final state on a shortest path (measured as the number of transitions between states). The notations used in what follows are:
To solve this problem, we propose a reinforcement learning algorithm based on learning the states' utilities (values), in which the agent receives rewards from its interactions with the environment. The algorithm, based on Temporal Differences, works on the following idea:
We introduce the following notations:
We make the following specifications:
Case study. If we consider the reward function defined as r(s) = -1 if s ≠ sf, and r(s) = 0 otherwise, it is obvious that the goal of the learning process is to minimize the number of transitions from the initial to the final state (the agent learns to move on a shortest path). We consider that the environment has the following characteristics:
In what follows, we state an original theorem that gives the convergence conditions of the URU algorithm. Theorem 4.3.1. [SER03a] Let us consider the learning problem described above, satisfying the following conditions:
where d(s1,s2) represents the Manhattan distance between the states s1 and s2;
In this case, the URU algorithm is convergent (the states' utilities converge after the training sequence). In what follows we consider the estimated utility of each state i at the n-th training episode. The proof of Theorem 4.3.1 is based on the following lemmas:

Lemma 1. [SER03a] At the n-th training episode of the agent the following inequality holds:

Lemma 2. [SER03a] At the n-th training episode of the agent the following inequality holds for each transition from i (i ≠ sf) to j made by the agent in the current training sequence:

Lemma 3. [SER03a] The states' utilities increase from one training episode to the next.

Theorem 4.3.2 gives the equilibrium equation of the states' utilities after the URU algorithm has been run.

Theorem 4.3.2. [SER03a] In our learning problem, the equilibrium equation of the states' utilities is the following, where the exact utility of state i is the value obtained after the URU algorithm has been run and card(M) denotes the number of elements of the set M:

Compared with the classical TD algorithm, we make the following remarks (which follow naturally from the theoretical results above):
Further work is planned in the following directions:
4.4 A new algorithm for word sense disambiguation

In this section we introduce an original algorithm (BA) [SER01c] which combines elements of Yarowsky's algorithm [YAR99, RES98, YAR01, YAR95] with the Naive Bayes Classifier (NBC). The algorithm preserves the advantages of Yarowsky's principles (one sense per discourse and one sense per collocation) together with the known high performance of NBC algorithms. The algorithm learns to make predictions based on local contexts, starting from a small set of contexts labeled with senses and from a larger set of unlabeled contexts. The two important characteristics of the BA algorithm are:
CHAPTER V IMPLEMENTATIONS OF INTELLIGENT SYSTEMS

5.1 A reinforcement learning Intelligent Agent based on Adaptive Dynamic Programming

In this section we present an Intelligent Agent [SER00b] that learns by reinforcement using an Adaptive Dynamic Programming method (section 3.2). We consider a simple Markov decision process with n^2 states. The state space can be visualized using an n x n grid, each square representing a state. The reinforcement function is constant -1 (i.e., the agent receives a reinforcement of -1 on each transition). There are 4 possible actions in each state: North, South, East and West. The goal states are the upper left corner and the lower right corner. The program that implements the agent with the functionality described above has been realized in two versions: in Borland Pascal and in Visual Prolog.
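To make the setting concrete, the utilities such an agent converges to can be computed by iterating the Bellman update U(s) = -1 + max over actions of U(s'), with the two corner goal states kept at utility 0. The following is a minimal Java sketch under these assumptions (the thesis implementations were written in Borland Pascal and Visual Prolog; all names below are illustrative, and the transition model is assumed known here for brevity):

    /** Minimal value-iteration sketch for the n x n grid described above:
        reward -1 per transition, goal states in the upper-left and lower-right corners.
        Moves that would leave the grid keep the agent in place (an assumption of this sketch). */
    public class GridUtilities {

        public static double[][] computeUtilities(int n, int iterations) {
            double[][] u = new double[n][n];
            int[][] moves = {{-1, 0}, {1, 0}, {0, -1}, {0, 1}}; // North, South, West, East
            for (int it = 0; it < iterations; it++) {
                for (int i = 0; i < n; i++) {
                    for (int j = 0; j < n; j++) {
                        if ((i == 0 && j == 0) || (i == n - 1 && j == n - 1))
                            continue;                           // goal states keep utility 0
                        double best = Double.NEGATIVE_INFINITY;
                        for (int[] m : moves) {
                            int ni = Math.min(n - 1, Math.max(0, i + m[0]));
                            int nj = Math.min(n - 1, Math.max(0, j + m[1]));
                            best = Math.max(best, u[ni][nj]);
                        }
                        u[i][j] = -1.0 + best;                  // Bellman update with reward -1
                    }
                }
            }
            return u;
        }

        public static void main(String[] args) {
            double[][] u = computeUtilities(5, 50);
            for (double[] row : u) {
                for (double v : row) System.out.printf("%6.1f", v);
                System.out.println();
            }
        }
    }

After convergence, each non-goal square holds the negative of its distance to the nearer goal corner, which is exactly the shortest-path behavior the agent is expected to learn.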
The agent implementation has been written in Microsoft Visual C++ 6.0. For the proposed experiment, some directions for future work are:
This section aimed to emphasize another way of approaching problems of training intelligent agents.

5.3 A game-playing Reinforcement Learning Intelligent Agent ("Agent 21")

In this section we design an Intelligent Agent [SER01b] that learns (by reinforcement) to play "Game 21", a simplified version of Black-Jack. The algorithm used for training the agent is the SARSA algorithm (section 3.6). We propose a game with two players: a dealer (who deals the cards) and an agent, in fact the computer (which uses the SARSA algorithm to learn the game's rules and plays against the dealer). The agent tries to win by obtaining a score (the sum of the cards' values) less than or equal to 21, but greater than the dealer's score. The probabilistic nature of this game makes it an interesting problem for testing learning algorithms, even if learning a good playing strategy is not obvious. Learning from a teacher (supervised learning) is not useful in this game, as the correct output for a given state is unknown. The agent has to explore different actions and to develop a strategy by selecting and maintaining the actions that maximize its performance. The purpose of this application is to study the behavior of an agent that initially knows neither the rules nor the goal of the game and gradually assimilates them. After learning, the agent (the computer) wins about 54% of the rounds (Figure 5.2); this is due to the fact that "Game 21" is a game of chance, and the agent tries to manage, as well as it can, the cards received from the dealer. As a remark, even if the game has a strong probabilistic character, after the training rounds the agent plays approximately as well as, or maybe even better than, a human player with a random playing strategy.
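The SARSA update used for such training can be sketched as follows (an illustrative Java fragment under generic assumptions; the state and action encodings used for "Game 21" in the thesis are not reproduced here). The rule is Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a)), where a' is the action actually chosen in s', for example epsilon-greedily:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Random;

    /** Minimal SARSA sketch: on-policy update of action values from the observed
        quintuple (s, a, r, s', a'). All names are illustrative. */
    public class SarsaLearner {
        private final Map<String, Double> q = new HashMap<>(); // key: state + "|" + action
        private final double alpha, gamma, epsilon;
        private final Random random = new Random();

        public SarsaLearner(double alpha, double gamma, double epsilon) {
            this.alpha = alpha; this.gamma = gamma; this.epsilon = epsilon;
        }

        private String key(String s, String a) { return s + "|" + a; }

        public double q(String s, String a) { return q.getOrDefault(key(s, a), 0.0); }

        /** Epsilon-greedy action selection over the available actions. */
        public String choose(String s, String[] actions) {
            if (random.nextDouble() < epsilon)
                return actions[random.nextInt(actions.length)];  // explore
            String best = actions[0];
            for (String a : actions)
                if (q(s, a) > q(s, best)) best = a;              // exploit
            return best;
        }

        /** SARSA update for the observed transition (s, a, r, s', a'). */
        public void update(String s, String a, double r, String s2, String a2) {
            double old = q(s, a);
            q.put(key(s, a), old + alpha * (r + gamma * q(s2, a2) - old));
        }
    }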
Because of the strong logical formalism of the "Know-How" concept, the application was written in Visual Prolog.

5.5 A real-time learning Intelligent Agent

In this section we describe an Intelligent Agent [SER02a] that learns to find the minimum number of transitions needed for moving (in a space of states) between an initial and a final configuration, using the RTL algorithm (section 4.1). The use of the A* algorithm for solving such problems is well known; however, A*, an informed search algorithm, has the major disadvantage of an exponential execution time. Thus, a learning version is very suitable, since it reduces the algorithm's complexity. We propose an experiment in order to illustrate the advantages of the RTL algorithm. The application is written in Borland C and implements the behavior of an Intelligent Agent whose aim is to find the way out of a maze on a shortest path. For comparison, we have also used the LRTA* algorithm, and we noticed that, because of its insufficient exploration of the environment, the agent fails.

5.6 An agent for word sense disambiguation

In this section we describe an Intelligent Agent [SER03e] that learns to disambiguate a polysemous word (to determine the meaning of the word in a set of contexts). The application is written in JDK 1.4 and uses the disambiguation algorithm BA described in section 4.4. We performed two experiments: one for English and one for Romanian. The second experiment is particularly significant, because no resource similar to WordNet exists for Romanian, and the BA algorithm only requires information that can be extracted from an untagged corpus. Our aim is to use the BA algorithm for Romanian to disambiguate the word "poarta" in a set of contexts obtained with an on-line corpus tool (we use a Romanian corpus). We make the following statements:
As an evaluation measure of the algorithm we use the accuracy (5.1):

    A = (number of contexts correctly solved) / (total number of contexts to be solved)    (5.1)

The accuracy of the BA algorithm in the proposed experiment is 60%. We carried out an experimental comparison between the Naive Bayes Classifier and the BA algorithm, with and without the relation introduced in section 4.4. The comparative experimental results are shown in Figure 5.3, which gives a graphical representation of the accuracy per context for each algorithm: more precisely, for a given algorithm, at the i-th context we plot the accuracy (5.1) of the algorithm over the first i contexts. From Figure 5.3 it is obvious that the most efficient is the BA algorithm with the relation (the algorithm's accuracy grows with the number of contexts).

Figure 5.3. Experimental comparison between the NBC and BA algorithms

We make the following remarks:
In conclusion, if the number of words used as contextual features for disambiguation and the number of contexts grow, the accuracy of the BA algorithm grows too.

5.7 A Web Agent to display a tree structure

In this section we present an Intelligent Agent [SER01e] designed to browse a structure of remote files. It is an agent whose environment is a software environment: its sensors are software functions and its actions are software actions.

5.8 A Logic-based Agent (ALSO)

In this section we describe the implementation of a logic agent based on a stack of goals [SER02c], whose architecture was presented in section 4.2. Because this architecture is based on logic and because the algorithm described in section 4.2 needs backtracking in order to find all the solutions, the implementation was written in Visual Prolog. Declarative programming languages (such as Prolog) have a built-in control mechanism which allows finding all the solutions of a problem. We note that the stack (stacks) of goals that have to be created for applying the algorithm are implicitly maintained by the control strategy of Prolog (the mechanism which allows backtracking).
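For readers unfamiliar with this style of representation, the operator structure used by the ALSO architecture (section 4.2) can be sketched roughly as follows. This is an illustrative Java fragment (the actual implementation is in Visual Prolog) and all names are hypothetical: an operator is applicable when its PRECONDITION predicates hold in the current state, and applying it removes the DELETE predicates and inserts the ADD predicates.

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    /** Illustrative sketch of an ALSO-style operator with PRECONDITION, ADD and DELETE lists.
        The state is modeled as a set of ground predicates. */
    public class Operator {
        final String name;
        final List<String> precondition; // predicates that must hold to apply the operator
        final List<String> add;          // predicates made true by the operator
        final List<String> delete;       // predicates made false by the operator

        public Operator(String name, List<String> precondition, List<String> add, List<String> delete) {
            this.name = name;
            this.precondition = precondition;
            this.add = add;
            this.delete = delete;
        }

        public boolean applicable(Set<String> state) {
            return state.containsAll(precondition);
        }

        /** Frame axioms are implicit: predicates not in ADD or DELETE are left unchanged. */
        public Set<String> apply(Set<String> state) {
            Set<String> next = new HashSet<>(state);
            next.removeAll(delete);
            next.addAll(add);
            return next;
        }
    }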
The application is written in JDK 1.4. We develop an experiment for an agent that moves in a rectangular maze, illustrating the theoretical results obtained in section 4.3.

5.10 An Agent for word clustering

In this section we describe an Intelligent Agent [SER03b] that learns to cluster [MAN99] a set of words, starting from a set of contexts and from a set of words having maximum frequency in the given contexts. We propose an experiment for the Romanian language; we ran the algorithm on 10000 contexts and 200 words, with very good results. We mention that this application is used as part of a Question Answering (QA) system for the Romanian language [SER03f].

CHAPTER VI A PROGRAMMING INTERFACE FOR REINFORCEMENT LEARNING SIMULATIONS

In this chapter we propose an original programming interface for reinforcement learning simulations in known environments [SER03c]. Using this interface, simulations are possible both for reinforcement learning based on state utilities and for learning based on action values (Q-learning). The interface is implemented in JDK 1.4 and is meant to facilitate software development for reinforcement learning in known environments. There are three basic objects: agents, environments and simulations. The agent is the learning agent and the environment is the task it interacts with. The simulation manages the interaction between agent and environment, collects data and manages the display, if any. Generally, the inputs of the agent are perceptions of the environment (in our case, states of the environment), its outputs are actions, and the environment offers rewards after the agent interacts with it. Figure 6.1 illustrates the interaction between the agent and the environment in a reinforcement learning task.

Figure 6.1. Interaction between the Agent and the Environment (reward, perception, action)

The reward is a number; the environment, the actions and the perceptions are instances of classes derived from the IEnvironment, IAction and IState interfaces, respectively. The implementation of actions and perceptions may be arbitrary as long as they are properly understood by agent and environment; the agent and the environment obviously have to be chosen to be compatible with each other. The interaction between agent and environment is handled in discrete time. We assume we are working with simulations; in other words, there are no real-time constraints enforced by the interface: the environment waits for the agent while the agent is selecting its action, and the agent waits for the environment while the environment is computing its next state. We assume that the agent's environment is a finite Markov Decision Process. To use the interface, the user has to define the specialized object classes HisState, HisEnvironment and HisAgent and create instances of each. The agent and the environment are then passed to a simulation object (CSimulation), which initializes and interconnects them. Then, CSimulation::init() will initialize and run the simulation. If the agent learns the states' utilities, it has to be derived from the AgentUtility class; if it learns the actions' values (Q-learning), it has to be derived from the AgentQValues class.
We illustrate the use of the interface on a concrete example: the motion problem of an embodied agent whose goal is to find the way out of a maze with obstacles. The UML class diagram of the interface is given below.
Interface IEnvironment
  + isValid(s: IState): Boolean
  + initial(): IState
  + final(s: IState): Boolean
  + next(s: IState, a: IAction, r: Integer): IState
  + value(s: IState): Real

Class CSimulation
  a: RLAgent
  m: IEnvironment
  + CSimulation(a: RLAgent, m: IEnvironment)
  + init(alpha: Real, gamma: Real, epsilon: Real, episoade: Integer): void
  + policy(ps: PrintStream)

Abstract class RLAgent
  # a: IList
  # l: IList
  + abstract actions(): void
  + abstract choose(e: IElement, r: Real, epsilon: Real, m: IEnvironment): IElement
  + abstract next(s: IState, m: IEnvironment): QValues
  + abstract initial(s: IState): IElement
  + learning(alpha: Real, gamma: Real, epsilon: Real, episodes: Integer): void
  # notVisited(s: IState): Boolean
  # value(e: IElement, m: IEnvironment): Real
  appear(e: IElement): Boolean
  add(e: IElement, m: IEnvironment): void
  init(s: IState): void
  update(e: IElement, eurm: IElement, r: Integer, alpha: Real, gamma: Real): void

Interface IList
  + add(o: Object): void
  + delete(i: Integer): void
  + dim(): Integer
  + elem(i: Integer): Object

Class CList
  list: ArrayList

Interface IState
  + toString(): String
  + equal(s: IState): Boolean

Classes AgentUtility and AgentQValues
  next(s: IState, r: Integer, epsilon: Real, m: IEnvironment): IElement

Abstract class IElement
  # o: Boolean
  # s: IState
  # a: IAction
  + IElement(s: IState, a: IAction)
  + state(): IState
  + action(): IAction
  + value(): Real
  + retO(): Boolean
  + setValue(v: Real): void
  + setAction(a: IAction): void
  + setO(o: Boolean): void
  + toString(): String
  + abstract equal(e: IElement): Boolean

Interface IAction
  + toString(): String
  + equal(a: IAction): Boolean

Class Utility
  + Utility(s: IState)
  + Utility(s: IState, a: IAction)

Class QValues
  + QValues(s: IState)
  + QValues(s: IState, a: IAction)

User-defined example classes: MyState, MyAction, MyAgent.
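To suggest how these pieces fit together, the following fragment sketches how a user might instantiate the interface for the maze example. This is a hypothetical wiring sketch: the bodies of HisEnvironment and HisAgent are not reproduced here, and only the classes and signatures listed above are assumed.

    /** Hypothetical wiring of the interface for the maze example.
        HisEnvironment and HisAgent are user classes assumed to implement IEnvironment
        and extend AgentUtility, as described in Chapter VI. */
    public class MazeSimulationDemo {
        public static void main(String[] args) {
            IEnvironment maze = new HisEnvironment();   // the maze with obstacles (user-defined)
            RLAgent agent = new HisAgent();             // utility-learning agent (derived from AgentUtility)

            // the simulation object interconnects the agent and the environment
            CSimulation simulation = new CSimulation(agent, maze);

            // alpha = learning rate, gamma = discount factor, epsilon = exploration rate,
            // followed by the number of training episodes (values are illustrative)
            simulation.init(0.1, 0.9, 0.1, 1000);

            // print the learned policy
            simulation.policy(System.out);
        }
    }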
CONCLUSIONS

The aim of this work is to support the idea that the field of agents, and especially of intelligent agents, is an important research and development area in Computer Science and Artificial Intelligence. On the other hand, learning problems represent an important research direction in Artificial Intelligence; machine learning is the study of system models that, based on an input data set (training set), improve their performance through experience and by learning specific knowledge. Reinforcement Learning, seen as a method for solving machine learning problems based on the acquisition of new knowledge [BER91], is one of the ways that may form the basis for improving machines' performance, being at the same time a field open to future research. Because of its potential to eliminate the manual encoding of control strategies, reinforcement learning is one of the most active research areas in machine learning; reinforcement learning in unknown environments is a current research topic in this field. The aim of the present work is to present the main results related to reinforcement learning, as well as the design and implementation of intelligent agents, in order to offer support for understanding:
The work presents some new approaches in Intelligent Systems (new learning algorithms), together with possible generalizations and open problems, opening in this way new research directions in Machine Learning. The study of learning was focused in this thesis on the model of systems with a single agent, in known environments. As future research we will consider the case of unknown environments and systems with several agents (multi-agent systems).