"using name and residence location to predict aboriginal ethnicity in canada"
Kai Wong is a PhD student in the School of Public Health at the University of Alberta. He is supervised by Amii's Yutaka Yasui.
Abstract: Ethnicity is an important variable in epidemiological research. Canada is an ethnically diverse country, yet a lack of ethnicity data impedes research and policy development in public health. This data gap particularly affects Aboriginal Canadians, whose health inequalities may remain hidden. Automated name- and location-based ethnicity classification has shown potential, but its applicability in the Canadian context is largely unknown.
Methods: Our study applied machine learning, using regularized logistic regression (LR), support vector machine (SVM), and decision tree (DT) classifiers, to predict Aboriginal ethnicity at several levels: Aboriginal (all inclusive); First Nations, Métis, and Inuit; and major Aboriginal language and tribal groups. The predictors were name and residence location features derived from the 1901 Canadian Census. Name features included entire names, substrings, double metaphones, and various name-entity characteristics. The data were randomly split into training and validation sets. Classification performance was evaluated on accuracy, ROC, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV).
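To make the substring and location features more concrete, here is a minimal sketch of how such features might be built. It is an illustration only, not the study's actual pipeline: the function names, the `^`/`$` boundary markers, the n-gram lengths, and the feature-key prefixes are all assumptions, and double-metaphone encoding is left out because it would need a third-party library.

```python
from collections import Counter


def name_substrings(name: str, n_min: int = 2, n_max: int = 3) -> Counter:
    """Count character n-grams of a name.

    '^' and '$' are hypothetical markers for the start and end of the
    name, so that prefixes and suffixes become distinct features.
    """
    padded = "^" + name.strip().lower() + "$"
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


def name_location_features(full_name: str, district: str) -> dict:
    """Combine whole-name, substring, and location features in one dict.

    The 'name=', 'sub=', and 'loc=' prefixes are illustrative; they keep
    the three feature families from colliding.
    """
    features = {"name=" + full_name.strip().lower(): 1}
    for gram, count in name_substrings(full_name).items():
        features["sub=" + gram] = count
    features["loc=" + district.strip().lower()] = 1
    return features


# Placeholder inputs, not records from the census data.
example = name_location_features("Example Name", "Example District")
```

A sparse dictionary like `example` could then be vectorized and passed to any of the classifiers named above.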
Results: The highest performance was obtained for the Aboriginal (all inclusive), First Nations, Algonquian, and Kootenay groups, with accuracy of 0.99-1.00, ROC of 0.99-1.00, sensitivity of 0.63-0.65, specificity of 1.00, PPV of 0.78-0.86, and NPV of 0.99-1.00 in the validation sets. Classification performance for the remaining Aboriginal identities varied widely, primarily because most subgroups had small sample sizes. In general, residence location appeared to be an important feature, in addition to the name features, for predicting Aboriginal ethnicity in Canada.
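The reported metrics all follow from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the counts are invented for illustration and are not the study's data. They were chosen only to show how accuracy can be very high while sensitivity stays moderate when the positive class is rare.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero; a ZeroDivisionError is raised
    otherwise.
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts: 100 true positives-class records out of 10,000.
metrics = classification_metrics(tp=64, fp=16, fn=36, tn=9884)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these made-up counts, the many correctly classified negatives dominate accuracy, specificity, and NPV, while sensitivity and PPV depend only on the small positive class.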
Conclusions: This is the first study to examine automated Aboriginal ethnicity classification using both name and residence location features. We have shown that certain Aboriginal ethnic groups in Canada can be identified with reasonable accuracy using name features alone or name and location features together. Effective implementation of this approach could give rise to new research, program, and policy developments targeting the grave inequality Aboriginal Canadians suffer.
ai seminar series
Fridays at noon, Amii and the Department of Computing Science host AI Seminars, engaging presentations on topics in the broad field of artificial intelligence. With speakers from the University of Alberta and other world-leading groups, the talks give AI enthusiasts a friendly way of engaging with the latest trends and topics in research and development.
Seminars are open to the public, and no registration is required, though seating is limited and available on a first-come, first-served basis. Topics range from foundational theoretical work to innovative applications of artificial intelligence technologies.
If you would like to present at an upcoming AI Seminar, please contact Colin Bellinger.
Join the AI Seminar mailing list to stay up to date on all the latest presentations.