A. Pastuhov, A. Zaboleeva-Zotova
USING NATURAL LANGUAGE CONSTRUCTIONS IN PROGRAMMING LANGUAGES
Today the main problem in programming is code quality. Debugging takes a lot of time and money, yet it is still not fully effective. New programming languages that are closer to natural language can help reduce the number of bugs in a program. The programmer gets tools that match his normal way of thinking, and because the code is close to natural language text it is easier to read and understand, and therefore easier to write without mistakes.
We are developing a new language that uses several natural language phenomena. We tried to find the “tools” that allow program text to be close to natural text while keeping the system from becoming too complicated.
First of all, every system that deals with natural language must have a lexicon, but it is impossible to create one universal lexicon for all programmers. On the other hand, it would be too hard for a programmer to create a lexicon by himself, since this is not an easy process. We decided to use a construction-oriented lexicon rather than a word-oriented one. Its description is very close to the declarations of an ordinary programming language, but phrases and synonyms can be used to describe a construction. This way the programmer concentrates on the program architecture rather than on particular words. It is still possible for the system to obtain the general meaning (or just the area of usage) of a word.
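As a rough illustration of a construction-oriented lexicon, the following Python sketch maps several phrases and synonyms onto one construction. The construction names, the phrases and the `find_construction` helper are hypothetical and are not taken from the language described here.

```python
# Hypothetical construction-oriented lexicon: each construction is declared
# once and lists the phrases (including synonyms) that may denote it.
LEXICON = {
    "delete_item": ["delete", "remove", "get rid of"],
    "create_list": ["create a list", "make a list"],
}


def find_construction(phrase):
    """Return the construction a phrase denotes, or None if it is unknown."""
    # Normalize case and whitespace so that spelling variants still match.
    normalized = " ".join(phrase.lower().split())
    for construction, phrases in LEXICON.items():
        if normalized in phrases:
            return construction
    return None


print(find_construction("Remove"))        # delete_item
print(find_construction("make  a list"))  # create_list
```

The lookup is keyed by whole phrases rather than single words, which is the point of the sketch: the programmer declares constructions, and the wording used for each one stays open to extension.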
The main task was to make the text of the methods look as close to natural text as possible and as needed. From all natural language phenomena we chose only the most important ones: descriptive references, anaphora, coordination, synonyms, variable word forms and non-strict word order.
The last two phenomena are important for the Russian language. Russian uses variable word forms, so it is morphological agreement that matters rather than word order. The text must therefore support both variable word forms and non-strict word order to look natural. Non-strict word order leads to non-strict token order, so the sentence cannot be parsed in a strict “left-to-right” fashion.
We use the term “descriptive reference” for referring to an object by specifying its most important properties. Normally a person does not give an object a name. The name is usually unimportant; what matters is the type of the object and its most relevant “properties”. This principle can be used in programming languages too. In a program, especially when working with an array or a list of objects, it is not the name or the index but the object’s properties that carry the information you need. With a descriptive reference there is no need for indexes and loops when working with an array. All you need is a description of the object you want to access. Using this principle you can even access several objects simultaneously. This tool helps to shorten the text and makes it easier to read and understand.
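The idea of a descriptive reference can be approximated in an ordinary language. The Python sketch below selects objects by type and properties instead of by name or index. The `Shape` class and the `refer` function are illustrative assumptions, not part of the language described here.

```python
from dataclasses import dataclass


@dataclass
class Shape:
    kind: str
    color: str
    size: int


def refer(objects, kind=None, **properties):
    """Return every object that matches the described type and properties."""
    return [
        obj
        for obj in objects
        if (kind is None or obj.kind == kind)
        and all(
            hasattr(obj, name) and getattr(obj, name) == value
            for name, value in properties.items()
        )
    ]


shapes = [
    Shape("circle", "red", 3),
    Shape("square", "red", 5),
    Shape("circle", "blue", 3),
]

# "the red circle": one object, found without any index
print(refer(shapes, kind="circle", color="red"))
# "everything red": several objects accessed at once
print(refer(shapes, color="red"))
```

The caller never writes a loop or an index; the description alone decides how many objects are addressed.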
Anaphora is another phenomenon often used in natural language to reduce redundancy. It consists of the nomination of an object or objects (this part of the construction is called the antecedent) and an anaphoric reference (the anaphor). The anaphoric reference addresses the object (or objects) by means of a special expression that describes it. There are three main types of anaphoric expression: this, same and other (see Figure 1), with corresponding forms for multiple objects. The first addresses the same object that was nominated earlier. The second means that you must access a different object with the same properties as the original one. The last means that you must find an object different from the first one. An anaphoric expression may also include additional information for a more specific object description. In fact it uses the descriptive reference mechanism for this, so it can be viewed as just another kind of descriptive reference, one in which the type is not written explicitly but is derived from previous information.
[Figure: three panels, each pairing an Antecedent with an anaphor: “it”, “same”, “other”.]
Figure 1. Anaphora examples
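The three anaphoric expressions can be sketched in Python as follows. This is a minimal illustration under stated assumptions: objects are plain dictionaries, "same properties" means dictionary equality, and "other" returns every candidate that is not the antecedent. The `resolve_anaphor` function is hypothetical.

```python
def resolve_anaphor(expression, antecedent, objects):
    """Resolve 'this', 'same' or 'other' relative to a known antecedent."""
    if expression == "this":
        # The very object that was nominated earlier.
        return [antecedent]
    if expression == "same":
        # A different object whose properties equal those of the antecedent.
        return [o for o in objects if o is not antecedent and o == antecedent]
    if expression == "other":
        # Any object that is not the antecedent itself.
        return [o for o in objects if o is not antecedent]
    raise ValueError(f"unknown anaphoric expression: {expression!r}")


files = [
    {"type": "file", "ext": "txt"},
    {"type": "file", "ext": "txt"},
    {"type": "file", "ext": "log"},
]
first = files[0]

print(resolve_anaphor("this", first, files))   # the antecedent itself
print(resolve_anaphor("same", first, files))   # the second .txt file
print(resolve_anaphor("other", first, files))  # the two remaining files
```

Identity (`is`) separates "this" from "same": both share properties with the antecedent, but only "this" is the original object.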
There are two types of nomination in natural language: explicit and implicit (see Figure 2). In the first case there is an explicit description of the object, or a reference to it, in the text. This information is easily obtained from text analysis and is located before the anaphoric reference. In the second case there is no direct mention of an object, and the anaphoric reference can be applied only to a consequence of some previous information. With this type of nomination it is harder to identify the original object (the quasiantecedent). In programming languages, these two types of nomination can be treated as an object reference (by name, index or descriptive reference) in the first case and as an object reference inside a function call in the second.
[Figure: explicit nomination: Object reference (Antecedent), Anaphoric expression (Anaphor); implicit nomination: Function call, Object reference (Quasiantecedent), Anaphoric expression (Anaphor).]
Figure 2. Explicit and implicit nomination.
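The difference between the two kinds of nomination can be sketched with a small context tracker. The `Discourse` class below is hypothetical, and it rests on one interpretive assumption: the quasiantecedent of a function call is taken to be the object that the call yields.

```python
class Discourse:
    """Remembers the most recent (quasi)antecedent for later anaphors."""

    def __init__(self):
        self._antecedent = None

    def mention(self, obj):
        # Explicit nomination: the object is referenced directly.
        self._antecedent = obj
        return obj

    def call(self, func, *args, **kwargs):
        # Implicit nomination: the object only appears as the outcome
        # of a call (assumption: the result is the quasiantecedent).
        result = func(*args, **kwargs)
        self._antecedent = result
        return result

    def it(self):
        if self._antecedent is None:
            raise LookupError("no antecedent available")
        return self._antecedent


context = Discourse()
parts = ["nut", "bolt", "gear"]

context.mention(parts)
print(context.it() is parts)  # the explicitly mentioned list

ordered = context.call(sorted, parts)
print(context.it() is ordered)  # the list produced by the call
```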
None of this can be easily achieved without coordination. This is another feature of natural language that has not been widely used in formal languages, although it plays a great role in everyday speech. It helps to avoid unnecessary duplication of information. Usually, when you want to call different methods of one object, or a common method of several objects, you have to specify the method or the object in every call. Operators like with…do help to avoid repetitive object naming but do nothing about repetitive method calls. The real solution, widely used in natural language, is coordination of constituents. Just as in natural language, you specify the common component only once and then name all the other constituents. This is very useful for references, descriptive or anaphoric. You can write descriptions using commas and conjunctions of all types (and, or, either…or) to produce a full specification.
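Coordination of constituents can be imitated in Python by naming the objects once and the methods once. The `coordinate` helper is a hypothetical illustration, and it assumes that every listed method takes no arguments.

```python
def coordinate(objects, method_names):
    """Call every named method on every object; one result row per object."""
    return [
        [getattr(obj, name)() for name in method_names]
        for obj in objects
    ]


# "upper-case and lower-case 'Hello' and 'World'":
# objects and methods are each listed only once.
print(coordinate(["Hello", "World"], ["upper", "lower"]))
# [['HELLO', 'hello'], ['WORLD', 'world']]
```

Without coordination the same request needs four separate calls, each repeating either the object or the method.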
We are implementing all these features in our new programming language, based on natural Russian. This language is too complicated to be described by a simple context-free grammar, so we use a three-level grammar for the full description. The first level is for the natural language used in the system (in our case, Russian). Since we do not work with exact word meanings, the semantic component of the grammar is rather small; we include only universal information concerning conjunctions, numbers and so on. The third level is for the formal programming language. The middle level is for universal phenomena such as descriptive reference or conjunction and is used to translate these constructions into sentences of simple formal operators. An island-driven algorithm is used for parsing. We tried to minimize the use of the first level because it is the most uncertain part of the language description; for this reason additional tools, such as statistical text analysis and analysis of user “behavior”, were added.
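The general idea of island-driven parsing is to start from tokens that are recognized reliably, wherever they stand in the sentence, and to grow the analysis outward in both directions. The Python sketch below is a toy illustration of that idea only, not the parser used in this system; the anchor and modifier word sets and both function names are assumptions.

```python
def find_islands(tokens, anchors):
    """Return the positions of tokens that are recognized with confidence."""
    return [i for i, token in enumerate(tokens) if token in anchors]


def grow_island(tokens, start, accepts):
    """Extend an island left and right while neighbouring tokens fit."""
    left = right = start
    while left > 0 and accepts(tokens[left - 1]):
        left -= 1
    while right < len(tokens) - 1 and accepts(tokens[right + 1]):
        right += 1
    return tokens[left:right + 1]


ANCHORS = {"delete"}                    # hypothetical reliable keywords
MODIFIERS = {"all", "red", "circles"}   # hypothetical attachable words

for sentence in ("please delete all red circles now",
                 "all red circles delete"):
    tokens = sentence.split()
    for position in find_islands(tokens, ANCHORS):
        print(grow_island(tokens, position, lambda t: t in MODIFIERS))
```

Both word orders yield an island containing the same four words, which shows why this style of parsing suits a language where token order is not strict.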
This new language uses several natural language constructions to make program text easier to read and understand. This speeds up work and improves quality, since the program is easier to analyze and debug. The natural language and programming language descriptions are isolated from each other, so the system can be adapted to any natural language without changing the formal programming language constructions, and vice versa. It thus becomes possible to create a truly multilingual application development environment.