Towards Automated Language Classification: A Clustering Approach Armin Buch, David Erschler, Gerhard Jäger, and Andrei Lupas




Figure 3. Geography of the language sample.

In this way, features that carry much information about the genetic affiliation of languages receive a high weight, and features that carry little such information receive a low weight. This decision was motivated by the hope of extracting a deep genetic signal from the WALS data.
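The excerpt does not show how the weights were computed; that part of the paper precedes this page. As a minimal sketch, assuming that "information about genetic affiliation" is measured as the mutual information between a feature's values and the languages' family labels, weights could be derived as below. The function names, the `None` convention for missing WALS values, and the normalization of weights to sum to 1 are hypothetical choices, not the authors' method.

```python
import math
from collections import Counter


def mutual_information(values, families):
    """Mutual information (in bits) between one feature's values and family labels.

    Hypothetical helper: languages whose value is None (missing in WALS) are skipped.
    """
    pairs = [(v, f) for v, f in zip(values, families) if v is not None]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)                 # counts of (value, family) pairs
    p_val = Counter(v for v, _ in pairs)   # marginal counts of feature values
    p_fam = Counter(f for _, f in pairs)   # marginal counts of families
    mi = 0.0
    for (v, f), c in joint.items():
        # p(v, f) * log2( p(v, f) / (p(v) * p(f)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (p_val[v] * p_fam[f]))
    return mi


def feature_weights(table, families):
    """Hypothetical weighting: each feature's MI with family, normalized to sum to 1.

    `table` maps a feature name to a list of values, one per language,
    in the same order as `families`.
    """
    raw = {feat: mutual_information(vals, families) for feat, vals in table.items()}
    total = sum(raw.values())
    return {feat: (w / total if total > 0 else 0.0) for feat, w in raw.items()}


families = ["A", "A", "B", "B"]
table = {
    "f1": ["x", "x", "y", "y"],  # perfectly predicts family
    "f2": ["x", "y", "x", "y"],  # independent of family
}
print(feature_weights(table, families))  # {'f1': 1.0, 'f2': 0.0}
```

In this toy table, `f1` receives all of the weight because its values coincide with the family split, while `f2` receives none. Raw mutual information tends to favor features with many distinct values, so a normalized variant may be preferable on real data.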

The resulting cluster map (see Fig. 2) shows a circular structure. There are two large clusters of languages at opposite sides of the circle (shown in gray and black), and a third, smaller cluster (shown in white) in between. The other languages are arranged somewhere on the circle between these three regions without forming distinct groups.

The map in Fig. 3 shows the geographic distribution of the respective languages (the colors on the map match those in Fig. 2).²

A manual inspection of this outcome reveals that the cluster map captures a strong typological signal and a somewhat weaker areal signal, but no usable information about genetic affiliation. The cluster shown in gray contains languages with head-initial basic word order (SVO or VSO), small phoneme inventories, and a lack of case marking. The black cluster, by contrast, is characterized by head-final word order, nominative-accusative alignment for both pronouns and full NPs, large case inventories (mostly more than six cases), and predominantly dependent marking. Figure 2 shows that these groupings are neither genetically nor areally motivated.
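The conclusion above was reached by manual inspection. One simple way to quantify it, offered here only as an illustrative sketch and not as a measure used in the text, is cluster purity: the share of languages that belong to the most frequent family (or area) of their cluster. The function name and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def purity(clusters, labels):
    """Share of items whose label equals the majority label of their cluster.

    `clusters` and `labels` are parallel lists (one entry per language);
    labels could be family names or geographic areas.
    """
    by_cluster = defaultdict(list)
    for c, lab in zip(clusters, labels):
        by_cluster[c].append(lab)
    # For each cluster, count how many members carry its most frequent label.
    majority_total = sum(Counter(labs).most_common(1)[0][1] for labs in by_cluster.values())
    return majority_total / len(labels)


# Toy example: the "gray" and "black" clusters each mix two families.
print(purity(["gray", "gray", "black", "black"], ["A", "B", "A", "C"]))  # 0.5
```

A purity close to the share of the largest family in the whole sample would indicate that the clusters are not aligned with genetic groupings. Purity rises trivially as clusters get smaller, so it is only meaningful for a small number of clusters such as the gray, black, and white groups discussed here.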

This agrees well with the findings of Greenhill et al. (2011) and Donohue et al. (2011): the distribution of morphosyntactic features does not reflect genetic relationships between languages sufficiently well.




