Research Post
Word lists have become available for most of the world’s languages, but only a small fraction of such lists contain cognate information. We present a machine-learning approach that automatically clusters words in multilingual word lists into cognate sets. Our method incorporates a number of diverse word similarity measures and features that encode the degree of affinity between pairs of languages. The output of the classification algorithm is then used to generate cognate groups. The results of the experiments on word lists representing several language families demonstrate the utility of the proposed approach.
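The two-stage idea described above (score pairs of words, then turn the pairwise decisions into groups) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single similarity measure (normalized edit distance), a hand-set threshold standing in for the trained classifier, and connected-components grouping. The function names, the threshold value, and the sample words are all hypothetical choices made for illustration.

```python
from __future__ import annotations

from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized edit-distance similarity in [0, 1] (one possible measure)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def cluster_cognates(words: dict[str, str], threshold: float = 0.5) -> list[set[str]]:
    """Group languages whose words for one meaning are judged similar.

    `words` maps a language name to its word for a single meaning.
    The threshold test is a hypothetical stand-in for a trained
    pairwise classifier; groups are the connected components of the
    resulting "similar" relation, tracked with a small union-find.
    """
    parent = {lang: lang for lang in words}

    def find(x: str) -> str:
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for l1, l2 in combinations(words, 2):
        if similarity(words[l1], words[l2]) >= threshold:
            parent[find(l1)] = find(l2)

    groups: dict[str, set[str]] = {}
    for lang in words:
        groups.setdefault(find(lang), set()).add(lang)
    return list(groups.values())


# Illustrative input: words for "night" in five languages.
night = {
    "English": "night",
    "German": "nacht",
    "Dutch": "nacht",
    "Spanish": "noche",
    "Finnish": "yö",
}
groups = cluster_cognates(night)
```

With these five words and the default threshold, the four Indo-European forms end up in one group and the Finnish word stays on its own. Note that Spanish joins the group only through its similarity to the German and Dutch forms, which shows how grouping by connected components can merge items that are not directly similar to each other.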
Acknowledgements
We thank Eric Holman, Søren Wichmann, and other members of the ASJP project for sharing their cognate-annotated data sets. We also thank Shane Bergsma for insightful comments. Format conversion of the Comparative Indo-European Database was performed by Qing Dou. This research was partially funded by the Natural Sciences and Engineering Research Council of Canada.
Feb 14th 2022
Read this research paper, co-authored by Amii Fellows and Canada CIFAR AI Chairs Osmar Zaïane and Lili Mou: Non-Autoregressive Translation with Layer-Wise Prediction and Deep Supervision
Feb 14th 2022
Read this research paper, co-authored by Amii Fellow and Canada CIFAR AI Chair Lili Mou: Search and Learn: Improving Semantic Coverage for Data-to-Text Generation
Feb 14th 2022
Read this research paper, co-authored by Amii Fellow and Canada CIFAR AI Chair Lili Mou: Generalized Equivariance and Preferential Labeling for GNN Node Classification