Journal of information, knowledge and research in computer engineering




C. SQL constructing component

The SQL constructing component consists of three parts: the SQL generator, the database adaptor, and the SQL executor.



1) SQL generator

The SQL generator maps the elements of the natural language query to the elements of the SQL statement. It uses four routines, each of which controls only one specific part of the query. The overall SQL statement is constructed by concatenating the output of the four routines [16]. The first routine selects the part of the natural language query that corresponds to the appropriate DML command with the attributes' names (i.e. the SELECT * clause). The second routine selects the part of the query that would be mapped to a table's name or a group of tables' names to construct the FROM clause. The third routine selects the part of the query that would be mapped to the WHERE clause (the condition). The fourth routine selects the part of the natural language query that corresponds to the order in which the result is displayed (the ORDER BY clause with the name of the column).
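The four-routine construction can be illustrated with a minimal sketch. The `ParsedQuery` structure and all function names are assumptions made for illustration; the paper does not specify how the linguistic component hands its result to the generator.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedQuery:
    """Hypothetical output of the linguistic component (not from the paper)."""
    attributes: list[str] = field(default_factory=list)  # empty means "*"
    tables: list[str] = field(default_factory=list)
    condition: str = ""   # e.g. "salary > 1000"; empty means no WHERE clause
    order_by: str = ""    # column name; empty means no ORDER BY clause


def select_part(q: ParsedQuery) -> str:
    # Routine 1: DML command plus the attribute names.
    return "SELECT " + (", ".join(q.attributes) if q.attributes else "*")


def from_part(q: ParsedQuery) -> str:
    # Routine 2: one table name or a group of table names.
    return "FROM " + ", ".join(q.tables)


def where_part(q: ParsedQuery) -> str:
    # Routine 3: the optional condition.
    return f"WHERE {q.condition}" if q.condition else ""


def order_part(q: ParsedQuery) -> str:
    # Routine 4: the optional ordering of the result.
    return f"ORDER BY {q.order_by}" if q.order_by else ""


def generate_sql(q: ParsedQuery) -> str:
    # Concatenate the non-empty outputs of the four routines.
    parts = [select_part(q), from_part(q), where_part(q), order_part(q)]
    return " ".join(p for p in parts if p)
```

Keeping each clause in its own routine means a change to, say, the ordering logic does not touch the code that builds the other clauses.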



2) Database adaptor

Many different database management systems (DBMSs) exist today, and a database adaptor can be used to handle the variety of interfaces and techniques among them. Database connections, constraints, data types, and SQL formats are examples of such variety.
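As one illustration of differing SQL formats, the sketch below hides the row-limiting syntax behind a common interface. The class and method names are hypothetical, and the rewrite assumes a plain SELECT statement without DISTINCT; it is not the adaptor described in [16].

```python
from abc import ABC, abstractmethod


class DatabaseAdaptor(ABC):
    """Hypothetical adaptor interface; the paper does not specify one."""

    @abstractmethod
    def limit_rows(self, select_sql: str, n: int) -> str:
        """Return select_sql rewritten to yield at most n rows."""


class SQLiteAdaptor(DatabaseAdaptor):
    def limit_rows(self, select_sql: str, n: int) -> str:
        # SQLite (like MySQL and PostgreSQL) appends a LIMIT clause.
        return f"{select_sql} LIMIT {n}"


class SQLServerAdaptor(DatabaseAdaptor):
    def limit_rows(self, select_sql: str, n: int) -> str:
        # SQL Server places TOP n right after the SELECT keyword.
        # Assumption: a plain SELECT without DISTINCT.
        head, rest = select_sql.split(" ", 1)
        if head.upper() != "SELECT":
            raise ValueError("expected a SELECT statement")
        return f"{head} TOP {n} {rest}"
```

The rest of the system can then call `limit_rows` without knowing which DBMS is in use.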



3) SQL executor

The task of the SQL executor is to obtain the required results from the database in use. To achieve this, the generated SQL statement is first tested to verify its correctness, then applied to the database, and the result is presented to the user.
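The paper does not say how correctness is verified. One possible approach, sketched here with Python's standard `sqlite3` module, is to let the database compile the statement before running it. The function name `execute_checked` and the restriction to SELECT statements are assumptions of this sketch.

```python
import sqlite3


def execute_checked(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Verify a generated SELECT statement, then run it and return the rows."""
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only SELECT statements are accepted in this sketch")
    try:
        # EXPLAIN makes SQLite compile the statement and return its bytecode
        # listing instead of running the query, so syntax errors and unknown
        # tables or columns surface here.
        conn.execute("EXPLAIN " + sql)
    except sqlite3.Error as exc:
        raise ValueError(f"generated SQL rejected: {exc}") from exc
    return conn.execute(sql).fetchall()
```

A rejected statement raises an error that the interface can turn into a message for the user, rather than failing inside the database call.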



D. Syntactical knowledge base

The syntactical knowledge base of the GINLIDB system is used by the linguistic component to determine the accepted words, to provide word alternatives (in the spelling-correction process), and to verify the grammar of the natural language query.
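A minimal sketch of the word check and the spelling alternatives follows. It uses the similarity matching in Python's standard `difflib` module, which is an assumption; the paper does not name the spelling-correction method. The vocabulary is hypothetical.

```python
import difflib

# Hypothetical vocabulary; a real syntactical knowledge base would be larger.
ACCEPTED_WORDS = ["list", "show", "employee", "department", "salary", "name"]


def check_word(word: str, vocabulary: list[str] = ACCEPTED_WORDS) -> tuple[bool, list[str]]:
    """Return (accepted, alternatives) for one word of the query."""
    w = word.lower()
    if w in vocabulary:
        return True, []
    # Rank vocabulary entries by similarity and keep the closest few.
    return False, difflib.get_close_matches(w, vocabulary, n=3, cutoff=0.7)
```

For example, `check_word("sallary")` reports the word as not accepted and offers `"salary"` as an alternative.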



E. Semantic knowledge base

This knowledge base consists of the English semantic grammar (grammar rules) and the schema of the database in use. The semantic knowledge base is used to replace words and phrases with semantically equivalent words and phrases that our system recognizes (according to the system's capabilities).
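The replacement step can be pictured as a lookup from user wording to system wording. The rules and the `normalize` function below are hypothetical and only illustrate the idea; the actual rule format of GINLIDB is not given in this section.

```python
import re

# Hypothetical rules: phrase in the user's query -> phrase the system recognizes.
SEMANTIC_RULES = {
    "earn more than": "salary >",
    "workers": "employee",
    "staff": "employee",
    "display": "list",
}


def normalize(query: str, rules: dict[str, str] = SEMANTIC_RULES) -> str:
    """Replace known words and phrases by their recognized equivalents."""
    result = query.lower()
    # Apply longer phrases first so multi-word rules win over single words.
    for phrase in sorted(rules, key=len, reverse=True):
        pattern = rf"\b{re.escape(phrase)}\b"
        # A function replacement avoids backslash interpretation in the target.
        result = re.sub(pattern, lambda _m, r=rules[phrase]: r, result)
    return result
```

With these rules, "Display workers who earn more than 1000" becomes "list employee who salary > 1000", which uses only terms the later stages are assumed to know.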



F. Knowledge extension

This component extends the syntactical knowledge base by adding new words and the semantic knowledge base by adding new rules. The component enlarges the system to accommodate various domains and to strengthen the terminology and rules of existing domains.



5.1 Design of the GINLIDB System

The system is designed and implemented using object-oriented (OO) techniques. The Unified Modeling Language (UML), an evolutionary, general-purpose, broadly applicable, tool-supported, and industry-standardized modeling language [17], is used to design the system. The following UML diagrams are used:

Use case diagrams conceptualize the functionality of the system through use cases that represent different overall system scenarios.

Sequence diagrams show the interactions among different elements of the system in the form of messages passed between objects. Sequence diagrams depict the internal behavior of the GINLIDB system.

The class diagram describes the static view of the system by describing the classes and the relationships among them.

Activity diagrams are used to capture the flow from one activity to the next.



6. THE ADVANTAGES AND DISADVANTAGES OF NATURAL LANGUAGE INTERFACE TO DATABASES

This section discusses the advantages and disadvantages of natural language interfaces to databases (NLIDBs) [1, 5].



6.1 The Advantages

a) No Artificial Language

One advantage of NLIDBs is that they allow a naïve user, who has no knowledge of an artificial communication language, to query the database without learning one. Formal query languages like SQL are difficult to learn and master, at least for non-computer-specialists.



b) Simple, easy to use

Consider a database with a query language or a form designed to display the query. While an NLIDB system requires only a single input, a form-based interface may contain multiple inputs (fields, scroll boxes, combo boxes, radio buttons, etc.), depending on the capability of the form. In the case of a query language, a question may need to be expressed using multiple statements that contain one or more subqueries connected by join operations.



c) Better for Some Questions

Some kinds of questions (e.g. questions involving negation or quantification) can be easily expressed in natural language but seem difficult, or at least tedious, to express using graphical or form-based interfaces [1, 5]. Examples are “Which department has no programmers?” (negation) and “Which company supplies every department?” (universal quantification). Such questions can be expressed in a query language, but they require large, complex queries that only computer experts can write.
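To give a sense of that complexity, the sketch below writes both example questions in SQL and runs them against an in-memory SQLite database. The schema, table names, and sample rows are invented for illustration and do not come from the paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data, used only to make the queries runnable.
conn.executescript("""
CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee   (id INTEGER PRIMARY KEY, dept_id INTEGER, job TEXT);
CREATE TABLE company    (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE supplies   (company_id INTEGER, dept_id INTEGER);
INSERT INTO department VALUES (1, 'Sales'), (2, 'IT');
INSERT INTO employee   VALUES (1, 2, 'programmer'), (2, 1, 'clerk');
INSERT INTO company    VALUES (1, 'Acme'), (2, 'Bolt');
INSERT INTO supplies   VALUES (1, 1), (1, 2), (2, 1);
""")

# "Which department has no programmers?" (negation)
NO_PROGRAMMERS = """
SELECT d.name FROM department d
WHERE NOT EXISTS (
    SELECT 1 FROM employee e
    WHERE e.dept_id = d.id AND e.job = 'programmer')
"""

# "Which company supplies every department?" (universal quantification),
# phrased as: no department exists that the company does not supply.
SUPPLIES_ALL = """
SELECT c.name FROM company c
WHERE NOT EXISTS (
    SELECT 1 FROM department d
    WHERE NOT EXISTS (
        SELECT 1 FROM supplies s
        WHERE s.company_id = c.id AND s.dept_id = d.id))
"""

no_programmers = conn.execute(NO_PROGRAMMERS).fetchall()
supplies_all = conn.execute(SUPPLIES_ALL).fetchall()
```

The universal question needs a doubly nested NOT EXISTS because SQL has no direct "for all" operator, which is the kind of construction a non-specialist is unlikely to produce.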



d) Fault tolerance

Most NLIDB systems tolerate minor grammatical errors to some degree. In a computer query language, by contrast, the syntax and rules of the language must be obeyed, and any error causes the input to be rejected automatically by the system.



e) Easy to Use for Multiple Database Tables

Queries that involve multiple database tables, such as “list the address of the farmers who got bonus greater than 10000 rupees for the crop of wheat”, are more difficult to form in a graphical user interface than in a natural language interface.



6.2 The Disadvantages

a) Linguistic coverage is not obvious

Currently, all NLIDB systems can handle only a subset of a natural language, and it is not easy to define this subset. Some NLIDB systems cannot even answer certain questions that belong to their own subset, which is not the case with a formal language [5]. The coverage of a formal language is obvious, and any statement that follows the given rules is guaranteed to give the corresponding answer.



b) Linguistic vs. conceptual failures

When an NLIDB system fails, it usually does not explain what caused the failure, so the user cannot tell whether the question was outside the system's linguistic coverage or outside its conceptual coverage. Some users try to rephrase the question or just leave it unanswered. Most of the time, the user is left to work out the cause of the error.



c) False expectations

People can be misled by an NLIDB system’s ability to process natural language: they may assume that the system is intelligent [5]. As a result, rather than asking precise questions of the database, users may be tempted to ask questions that involve complex ideas, judgments, or reasoning capabilities, for which an NLIDB system cannot be relied upon.



7. CONCLUSIONS

Research on natural language interfaces has been carried out over the last few decades. With advances in hardware processing power, many natural language interfaces to databases have achieved promising results. The system accepts an English-language request, which is interpreted and translated into an SQL command using a semantic grammar technique. In addition, the system requires a knowledge base that consists of a database and its schema. The results of a number of experiments, in the form of trials in a user-friendly environment, have been very successful and satisfactory. To improve the system's performance in natural language processing, several issues must be considered, such as enriching the knowledge sources of the system to increase its efficiency and researching methods to improve the coherence and fluency of output texts. The experiments show that it is possible to translate a natural language query into SQL and that a probabilistic approach may be promising. So far, our NLDBI system considers selection and a few simple aggregations. The next step of our research is to optimize the PCFG to accommodate more complex queries.



8. REFERENCES

  1. Natural Language Interface using Shallow Parsing, Rajendra Akerkar and Manish Joshi. International Journal of Computer Science and Applications, Vol. 5, No. 3, pp 70 – 90

  2. Miikkulainen R., “Natural language processing with subsymbolic neural networks”, Neural Network Perspectives on Cognition and Adaptive Robotics.

  3. Shashtri L., “A model of rapid memory formation in the hippocampal system”, Proceeding of Meeting of cognitive Science Society, Stanford

  4. Natural language Interface for Database: A Brief review, Mrs. Neelu Nihalani, Dr. Sanjay Silakari, Dr. Mahesh Motwani. IJCSI International Journal of Computer Science Issues, Vol. 8, Issue 2, March 2011 ISSN (Online): 1694-0814

  5. Androutsopoulos, G.D. Ritchie, and P. Thanisch, Natural Language Interfaces to Databases – An Introduction, Journal of Natural Language Engineering 1 Part 1 (1995), 29–81

  6. Johnson Mark. PCFG Models of Linguistic Tree Representations. 24(4): 613-631, 1998.

  7. A probabilistic corpus-driven model for lexical-functional analysis. In Proc. COLING-ACL’98.

  8. Dan Klein, Christopher D. Manning: Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency. ACL 2004: 478-485.

  9. M-C.de Marneffe, B. MacCartney, and C. D. Manning. “Generating Typed Dependency Parses From Phrase Structure Parses”. In Proceedings of the IEEE /ACL 2006 Workshop on Spoken Language Technology. The Stanford Natural Language Processing Group. 2006.

  10. Dan Klein and Christopher D. Manning. 2003. Fast Exact Inference with a Factored Model for Natural Language Parsing. In Advances in Neural Information Processing Systems 15 (NIPS 2002), Cambridge, MA: MIT Press, pp. 3-10.

  11. Marie-Catherine de Marneffe, Bill MacCartney, and Christopher D. Manning. Generating Typed Dependency Parses from Phrase Structure Parses. In LREC 2006.

  12. Interactive Natural Language Interface, Faraj A. El-Mouadib, Zakaria Suliman Zubi, Ahmed A. Almagrous, I. El-Feghi. WSEAS Transactions on Computers, ISSN: 1109-2750 661 Issue 4, Volume 8, April 2009

  13. Hendrix, G., Sacerdoti, E., Sagalowicz, D. and Slocum, J. (1978). Developing a natural language interface to complex data. ACM Transactions on Database Systems, Volume 3, No. 2, USA, Pages 105 – 147.

  14. Woods, W. (1973). An experimental parsing system for transition network grammars. In Natural Language Processing, R. Rustin, Ed., Algorithmic Press, New York.

  15. Woods, W., Kaplan, R. and Webber, B. (1972). The Lunar Sciences Natural Language Information System. Bolt Beranek and Newman Inc., Cambridge, Massachusetts Final Report. B. B. N. Report No 2378.

  16. Generic Interactive Natural Language Interface to Databases (GINLIDB) Faraj A. El-Mouadib, Zakaria S. Zubi, Ahmed A. Almagrous, and Irdess S. El-Feghi. International Journal of Computers Issue 3, Volume 3, 2009

  17. K. Hamilton and R. Miles, “Learning UML 2.0”, O'Reilly, ISBN-10: 0-596-00982-8, 2006.

  18. Wermter S., “Hybrid approaches to neural network-based language processing”, Technical Report TR-97-030, International Computer Science Institute.

  19. Bei-Bei Huang, Guigang Zhang, Phillip C-Y Sheu, A Natural Language Database Interface Based on a Probabilistic Context Free Grammar, IEEE International Workshop on Semantic Computing and Systems, 2008, DOI 10.1109/WSCS.2008.14

ISSN: 0975 – 6760| NOV 10 TO OCT 11 | VOLUME – 01, ISSUE - 02 Page

