Towards Automated Language Classification: A Clustering Approach
Armin Buch, David Erschler, Gerhard Jäger, and Andrei Lupas




Figure 8. Symmetrized Cebuano-Danish alignment.

Fig. 9.
We would like to achieve symmetrization nonetheless, and therefore devise a general strategy. If two words are mutually linked, or not linked at all, no action needs to be taken, as the pair is already symmetric. Every unidirectional link is either deleted or turned into a bidirectional one. A simple criterion decides: keep the link if and only if it is the only link connecting at least one of the two words involved. This minimizes the number of unaligned as well as multiply aligned words, which is meant to capture the intuition that one-to-one alignments are linguistically desirable (an intuition that also underlies GIZA++). It leads to the above-mentioned correction of the Cebuano-Danish example. For the other example, the result is much less chaotic and linguistically sounder; see Fig. 10.

Figure 9. Malagasy-Esperanto alignment.



Figure 10. Symmetrized Malagasy-Esperanto alignment.

In the latter example, a certain notion of transitivity is violated: neither instance of "ny" connects with "super", although the two are indirectly connected (disregarding the fact that this alignment is linguistically undesirable; as usual, GIZA++ has difficulties with articles). Other criteria for when to keep a link and when to delete it might resolve this situation (and others) differently. For the present purposes, the criterion described above suffices.
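The keep-or-delete criterion described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the text: links are represented as (source index, target index) pairs, the function name `symmetrize` is hypothetical, and "the only link" is counted over the union of both directional alignments, with all decisions taken simultaneously rather than one after another. It is one possible reading of the rule, not necessarily the implementation used here.

```python
from collections import Counter


def symmetrize(src_to_tgt, tgt_to_src):
    """Symmetrize two directional word alignments.

    Both arguments are iterables of (source_index, target_index) pairs;
    the pair layout and the function name are assumptions of this sketch.
    """
    forward = set(src_to_tgt)
    backward = set(tgt_to_src)
    union = forward | backward
    mutual = forward & backward  # already bidirectional: always kept

    # Number of links touching each word, counted over the union
    # of both directions (assumption: decisions are simultaneous).
    src_degree = Counter(i for i, _ in union)
    tgt_degree = Counter(j for _, j in union)

    kept = set(mutual)
    for i, j in union - mutual:  # unidirectional links
        # Keep the link iff it is the only link of at least one of
        # the two words it connects; otherwise delete it.
        if src_degree[i] == 1 or tgt_degree[j] == 1:
            kept.add((i, j))
    return kept


# Toy example with invented indices (not taken from the figures).
forward = {(0, 0), (1, 1), (0, 1), (2, 1)}
backward = {(0, 0), (1, 1)}
result = symmetrize(forward, backward)
```

In this toy input, the unidirectional link (0, 1) is deleted because both of its words have other links, while (2, 1) is kept and made bidirectional because source word 2 would otherwise remain unaligned.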




