Towards Automated Language Classification: a clustering Approach Armin Buch, David Erschler, Gerhard Jäger, and Andrei Lupas

Date	05.05.2018

Thus, the similarity is 1 if the words are identical and 0 if they are totally different.

Now consider the similarity value for a specific potential cognate pair, that is, two words with the same meaning. By itself, this value is not very telling. What we want to estimate is how likely it is for a random pair of words from the two languages to have the same or a higher similarity value. We estimate this probability as the number of pairs whose similarity is greater than or equal to that value, divided by the overall number of pairs.
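The estimate described above can be illustrated with a minimal Python sketch. The function names `similarity` and `chance_probability` are hypothetical, and the similarity measure is a stand-in (the ratio from Python's `difflib`), not the measure defined in this paper. It only shares the property of being 1 for identical words and 0 for words with nothing in common. The sketch also assumes that "all pairs" means every combination of a word from the first language with a word from the second.

```python
from difflib import SequenceMatcher
from itertools import product


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1] (assumption, not the paper's measure):
    1.0 for identical words, 0.0 for words sharing no characters."""
    return SequenceMatcher(None, a, b).ratio()


def chance_probability(pair, words_a, words_b, sim=similarity):
    """Fraction of all cross-language word pairs that are at least as
    similar as the given candidate pair."""
    threshold = sim(*pair)
    # Score every word of language A against every word of language B.
    # If the candidate pair occurs in the lists, it is counted as well.
    scores = [sim(x, y) for x, y in product(words_a, words_b)]
    return sum(s >= threshold for s in scores) / len(scores)


# Toy word lists, invented purely for illustration.
lang_a = ["hand", "foot"]
lang_b = ["hand", "xyz"]
print(chance_probability(("hand", "hand"), lang_a, lang_b))
```

A small value means that few random pairs reach the similarity of the candidate pair, so the observed similarity is unlikely to be due to chance.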
