The lower the value of p_i, the higher the chance that the similarity between v_i and w_i is non-accidental. Assuming that similarities among different pairs of potential cognates are independent, we take the product of the p_i's over all meanings (out of the 40) for which we have data. Let P denote this product.
Now, we define the similarity S_L'L'' between languages L' and L'' as -log(P) (the minus sign serves to make the value positive). The values S_L'L'' serve as the input for CLANs.
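The computation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name and the sample p_i values are hypothetical. Note that summing logarithms is numerically safer than multiplying many small probabilities and taking the logarithm afterwards, while being mathematically equivalent to -log(P).

```python
import math

def word_similarity(p_values):
    """Aggregate per-meaning chance probabilities p_i into a
    language-pair similarity score S = -log(prod p_i).

    Summing the logs avoids floating-point underflow when many
    p_i are small; low p_i values (unlikely-by-chance matches)
    push the similarity score up.
    """
    return -sum(math.log(p) for p in p_values)

# Hypothetical p_i values for three of the 40 meanings with data:
S = word_similarity([0.01, 0.2, 0.05])
```

A pair of closely related languages accumulates many small p_i's and hence a large S; unrelated languages mostly contribute p_i's near 1, whose logs are near 0.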
Figure 5. Indo-European language cluster with respect to the Word Similarity measure.
The method we use might look suspiciously similar to Greenberg's (1987) 'mass comparison', justly criticized by many authors; for a detailed discussion and references see, for example, Campbell and Poser (2008). The crucial difference between our approach and Greenberg's mass comparison is that, unlike in Greenberg's work, the similarity between words is established by an algorithm and not by a human. That makes the results considerably more reproducible (provided the same initial dataset is used).