Clustering Semantically Equivalent Words into Cognate Sets in Multilingual Lists

Word lists have become available for most of the world’s languages, but only a small fraction of such lists contain cognate information. We present a machine-learning approach that automatically clusters words in multilingual word lists into cognate sets. Our method incorporates a number of diverse word similarity measures and features that encode the degree of affinity between pairs of languages. The output of the classification algorithm is then used to generate cognate groups. The results of the experiments on word lists representing several language families demonstrate the utility of the proposed approach.
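The two-stage idea in the abstract (score word pairs, then turn pairwise decisions into cognate groups) can be sketched as follows. This is a minimal illustration, not the method described here: it assumes a single normalized edit-distance score with a fixed threshold in place of the trained classifier and its multiple similarity and language-affinity features, and it assumes single-linkage grouping (connected components) for the clustering step. The function names, the `threshold` value, and the sample words are all hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Normalized similarity in [0, 1]; a stand-in for a classifier score."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def cluster_cognates(words, threshold=0.5):
    """Group languages whose words for one meaning look related.

    words: dict mapping language code -> word form for a single meaning.
    Pairs scoring at or above the (assumed) threshold are linked, and
    linked pairs are merged with union-find into connected components.
    """
    parent = {lang: lang for lang in words}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for l1, l2 in combinations(words, 2):
        if similarity(words[l1], words[l2]) >= threshold:
            parent[find(l1)] = find(l2)

    groups = {}
    for lang in words:
        groups.setdefault(find(lang), []).append(lang)
    return sorted(sorted(g) for g in groups.values())
```

Called on a toy list for the meaning "night", `cluster_cognates({"en": "night", "de": "nacht", "nl": "nacht", "fi": "yö"})` groups the English, German, and Dutch forms together and leaves the Finnish form on its own. Single linkage is the simplest choice, but it can chain unrelated words together through intermediate forms, so a real system would likely need a more conservative grouping criterion.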

Acknowledgements

We thank Eric Holman, Søren Wichmann, and other members of the ASJP project for sharing their cognate-annotated data sets. We also thank Shane Bergsma for insightful comments. Format conversion of the Comparative Indo-European Database was performed by Qing Dou. This research was partially funded by the Natural Sciences and Engineering Research Council of Canada.
