Assaf Solomovitch Tsahi Talmor



Date: 30.04.2017

Design Principles



Our leading design principles:

    • UI – maximum transparency, minimal obstruction to workflow, intuitive user interaction.

    • Generic – make our application as generic and scalable as possible through modular design and implementation, and separation of interfaces from implementations.

    • Extendable – provide infrastructure that makes plugging into and extending Phoneme™ simple.


Class Diagram




Class Diagram overview

The Phoneme class is the heart of our application. Three modules are attached to it, each handling a different aspect of the architecture:

  1. The LowLevelKeyboard Hook and Win32API classes are responsible for interaction with the Windows system, installation of the hook and interception of keyboard presses.

The Phonetic translator is a generic class that represents the translation of one string into another. It is responsible for the other two modules:

  2. The String trie class is our database, which holds the list of words in our language’s vocabulary.

  3. The Hebrew translator, Heb2EnMap and PredefinedTranslation are all language-specific code, in charge of correct translation to Hebrew.

Extending the Phoneme project should be fairly easy this way:

  • Migrating to a new language should require replacing only module 3, as we describe later on.

  • Changing the database or data structure is modular and requires changing only module 2.

  • Migrating the Phoneme application to another platform should require replacing only module 1.

For a complete description, see Appendix B – Phoneme class documentation.
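The separation into three replaceable modules can be illustrated with a minimal Python sketch. All names here (WordStore, LanguageTranslator, KeySource, PhoneticTranslator, SetStore, TableLanguage) are hypothetical and chosen for illustration; they are not the classes documented in Appendix B, and the report does not state which language Phoneme is written in.

```python
from typing import Dict, Iterable, List, Protocol


class KeySource(Protocol):
    """Module 1 (assumed interface): platform-specific keyboard interception."""

    def words(self) -> Iterable[str]: ...


class WordStore(Protocol):
    """Module 2 (assumed interface): the vocabulary database."""

    def contains(self, word: str) -> bool: ...


class LanguageTranslator(Protocol):
    """Module 3 (assumed interface): language-specific candidate generation."""

    def candidates(self, latin: str) -> List[str]: ...


class PhoneticTranslator:
    """Generic core that depends only on the two interfaces it is given."""

    def __init__(self, store: WordStore, language: LanguageTranslator) -> None:
        self.store = store
        self.language = language

    def translate(self, latin: str) -> List[str]:
        # Keep only the candidates that the vocabulary recognizes.
        return [w for w in self.language.candidates(latin) if self.store.contains(w)]


class SetStore:
    """Toy word store backed by a Python set (stand-in for the trie)."""

    def __init__(self, words: Iterable[str]) -> None:
        self._words = set(words)

    def contains(self, word: str) -> bool:
        return word in self._words


class TableLanguage:
    """Toy language module backed by a fixed lookup table."""

    def __init__(self, table: Dict[str, List[str]]) -> None:
        self._table = table

    def candidates(self, latin: str) -> List[str]:
        return self._table.get(latin, [])
```

With this shape, swapping SetStore for a trie or TableLanguage for another language module leaves PhoneticTranslator unchanged, which is the property the list above describes.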

Data structure

We had to choose an adequate data structure for storing the Hebrew wordlist. Two major considerations guided the choice:



  1. We perform many word searches in the database at runtime, so its find() operation must be computationally efficient, ideally O(1).

  2. Our Hebrew wordlist contains about 500,000 words, so its in-memory representation must be as compact as possible.

After some research, we decided on the trie data structure, which is commonly used for alphabet-based data. For example, it is used in IP routing over the internet and in indexing services such as Google Desktop Search.

The trie is built like a k-ary tree, where k is the number of letters in the alphabet. Each node has up to k children, and each node holds a boolean flag indicating whether a word ends at that node (see the example in the figure).

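Finding a word in a trie takes O(L) time, where L is the length of the word being looked up. The cost does not depend on the number of words stored, and since word lengths are bounded by a small constant, the lookup is practically O(1).

Figure: trie data structure holding the valid words “ace” and “aces”.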
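A minimal Python sketch of such a trie follows. It is an illustration under stated assumptions, not the String trie class of the project: the names TrieNode, Trie, insert and find are hypothetical, and each node stores its children in a dictionary rather than a fixed array of k slots, which keeps sparse nodes small.

```python
class TrieNode:
    """One trie node: child links plus the end-of-word flag."""

    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}    # letter -> TrieNode (assumption: dict, not a k-slot array)
        self.is_word = False  # True if some word ends exactly at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """Add a word, creating nodes along its path as needed."""
        node = self.root
        for letter in word:
            node = node.children.setdefault(letter, TrieNode())
        node.is_word = True

    def find(self, word):
        """Return True if the word was inserted; takes O(len(word)) steps."""
        node = self.root
        for letter in word:
            node = node.children.get(letter)
            if node is None:
                return False
        return node.is_word


trie = Trie()
for w in ("ace", "aces"):
    trie.insert(w)
```

In this example "ace" and "aces" share the path a-c-e, and the end-of-word flag is what distinguishes the stored word "ace" from the mere prefix "ac".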



Hebrew translation heuristics

Translating phonetic Hebrew written in Latin characters into “real” Hebrew written in Hebrew characters was a major part of the design and implementation of Phoneme.

Among the various translation heuristics, we note two major ones here; both are feasible solutions to this problem.


  • One possible solution is to define writing rules that the user must learn for the system to understand the input fully. For example, to distinguish between the letters TAF and TET, we would require the user to write a single “t” for TAF and a double “tt” for TET.

This solution is very easy to implement efficiently. Its major downside is that the user must learn a complete set of rules before starting to use Phoneme.

You can find an example of such heuristics in Appendix A.
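As a rough illustration of why the rule-based approach is simple to implement, the sketch below applies a rule table with greedy longest-match scanning. Only the two rules mentioned above (“t” for TAF, “tt” for TET) are included; the full rule set of Appendix A is not reproduced here, and the names RULES and apply_rules are hypothetical.

```python
# Rule table: only the two rules named in the text. Anything else is
# passed through unchanged (an assumption made for this sketch).
RULES = {
    "tt": "ט",  # TET
    "t": "ת",   # TAF
}


def apply_rules(text, rules=RULES):
    """Translate text by always consuming the longest matching rule first."""
    longest = max(len(key) for key in rules)
    out = []
    i = 0
    while i < len(text):
        for size in range(longest, 0, -1):
            chunk = text[i:i + size]
            if chunk in rules:
                out.append(rules[chunk])
                i += len(chunk)
                break
        else:
            # No rule matched at this position: copy the character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)
```

Trying the longer rule first is what lets “tt” be read as one TET rather than two TAFs.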



  • One of our main design goals was an easy-to-use system, so we chose a different solution. We do not force the user to learn any rules; instead, we try to make the translation as intuitive as possible. First we made a few reasonable assumptions:


Heuristics 2 assumptions:

    Latin    Hebrew
    h        ה
    ch       ח
    t        ט, ת
    tz       צ
    sh       ש
    c        ק
    k        כ
    v        ו, ב רפה (bet without dagesh)
    a        א, ע

These assumptions are reasonable, and they can be removed in future versions.

Nevertheless, a few basic ambiguities remain. For example, one cannot intuitively know whether “DAVAR” should be translated as דבר or דוור. Other ambiguities are the translation of ‘t’ to “tet” or “taf”, and of ‘a’ to “alef” or “ayin”.

To resolve these ambiguities, we resort to a second heuristic: searching a complete Hebrew wordlist for each possible permutation of the translation. For example, we translate “TAMID” to both תמיד and טמיד, then search for both translations in the wordlist, determine that the second is not a valid word, and automatically choose the first as the correct one.
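The generate-then-filter idea can be sketched in Python as follows. This is an illustration, not the project's Hebrew translator: the function names are hypothetical, a plain set stands in for the trie, and the table goes beyond the assumptions listed earlier in two labeled ways (entries for m, i and d, and an empty option for ‘a’ on the assumption that the vowel may be left unwritten).

```python
from itertools import product

LETTERS = {
    # Taken from the assumptions table in the text.
    "h": ["ה"], "ch": ["ח"], "t": ["ת", "ט"], "tz": ["צ"], "sh": ["ש"],
    "c": ["ק"], "k": ["כ"], "v": ["ו", "ב"],
    # ASSUMPTION: the empty option (vowel not written) is not in the source table.
    "a": ["א", "ע", ""],
    # ASSUMPTION: hypothetical entries added only so the example runs.
    "m": ["מ"], "i": ["י"], "d": ["ד"],
}


def tokenize(latin):
    """Split Latin input into table keys, preferring two-letter keys."""
    tokens = []
    i = 0
    while i < len(latin):
        pair = latin[i:i + 2]
        if len(pair) == 2 and pair in LETTERS:
            tokens.append(pair)
            i += 2
        elif latin[i] in LETTERS:
            tokens.append(latin[i])
            i += 1
        else:
            raise ValueError("no mapping for %r" % latin[i])
    return tokens


def candidates(latin):
    """Every combination of the per-token Hebrew options."""
    options = [LETTERS[token] for token in tokenize(latin.lower())]
    return ["".join(combo) for combo in product(*options)]


def valid_translations(latin, wordlist):
    """Keep only the candidates that appear in the wordlist."""
    return [word for word in candidates(latin) if word in wordlist]


# Toy wordlist containing only the word "tamid", built letter by letter.
WORDLIST = {"ת" + "מ" + "י" + "ד"}
```

With this toy wordlist, valid_translations("TAMID", WORDLIST) keeps only the TAF spelling, because the TET spelling and the spellings with an explicit alef or ayin are not in the list. The number of candidates grows multiplicatively with the number of ambiguous letters, which is one reason a fast find() matters.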

A more complex scenario arises when both candidates are valid words, as in the translation of DAVAR. In this case we have no alternative but to ask the user what was meant. We present a GUI that lists all translation options and prompts the user to choose the appropriate one.
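The decision logic described here can be summarized in a short Python sketch. The name resolve and the ask_user callback are hypothetical; the callback stands in for the GUI prompt, and returning None when no candidate is valid is an assumption, since the text does not say what happens in that case.

```python
from typing import Callable, List, Optional


def resolve(valid: List[str], ask_user: Callable[[List[str]], str]) -> Optional[str]:
    """Pick a translation from the candidates found in the wordlist."""
    if not valid:
        # ASSUMPTION: the source does not specify this case.
        return None
    if len(valid) == 1:
        # Exactly one valid word: choose it automatically.
        return valid[0]
    # Several valid words: defer to the user (the GUI in the real application).
    return ask_user(valid)


# Usage: a stand-in chooser that always picks the first option.
davar_options = ["דבר", "דוור"]
chosen = resolve(davar_options, lambda options: options[0])
```

Passing the prompt in as a callback keeps the decision logic independent of any particular GUI toolkit.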

In future versions we will implement a “learning” mechanism, which records the user’s choices and tries to guess the correct translation. This will require a scoring mechanism that aggregates several properties (such as the frequency of the word in the Hebrew language).
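Since this mechanism is future work, the following Python sketch is only one possible shape for it, not a description of Phoneme. It assumes a score that adds a word's relative frequency to a weighted count of the user's past choices for the same input; the class name ChoiceLearner, the weighting scheme and the frequency values are all hypothetical.

```python
from collections import Counter


class ChoiceLearner:
    """Ranks translation options by word frequency plus past user choices."""

    def __init__(self, frequency, history_weight=1.0):
        self.frequency = frequency            # word -> relative frequency (assumed input)
        self.history_weight = history_weight  # assumed weight of one past choice
        self.choices = Counter()              # (latin, word) -> times chosen

    def record(self, latin, word):
        """Remember that the user picked this word for this Latin input."""
        self.choices[(latin, word)] += 1

    def score(self, latin, word):
        history = self.choices[(latin, word)]
        return self.frequency.get(word, 0.0) + self.history_weight * history

    def rank(self, latin, options):
        """Return the options ordered from highest to lowest score."""
        return sorted(options, key=lambda w: self.score(latin, w), reverse=True)
```

The top-ranked option could be offered as the default in the selection dialog, while the user remains free to pick another.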




