Quantifying Depressed Social Media During COVID-19: Information Retrieval With ML & NLP

Abstract

The ongoing pandemic continues to disrupt the normal functioning of society in numerous ways, and symptoms of depression are on the rise. In this work, we explored how analysis of social media, specifically Twitter and Reddit, can reveal changes in the number of authors presenting depressive symptoms.

We first assessed the level of depressive symptoms expressed in a large set of tweets. Existing efforts to identify depressive symptoms in tweets are limited in scope and typically do not account for contemporary online discourse surrounding the experience of depression. To ensure that our assessment accounted for contemporary discourse, we extracted recent posts from /r/Depression, where symptoms and experience are a main topic of discussion. To further ensure that our assessment accounted for language that expresses depressive symptoms in a variety of contexts, rather than only when explicitly discussing the experience of depression, we also extracted all of the other Reddit posts of users who posted in /r/Depression. Together, these posts cover everything that /r/Depression authors posted anywhere on Reddit in November and December 2019 (the most recent two months available in their entirety on Pushshift).

We then trained a GloVe word embedding on the posts made by users across Reddit who post in /r/Depression. Using the resulting word vectors, we trained author representations with the usr2vec method for both our /r/Depression authors and a sampled set of users who act as a contrast against our archetypal example. This produces a high-dimensional representation of each user, based on a composite of the word representations we trained previously. Then, we used a linear-kernel support vector machine (SVM) to find a separating hyperplane between the representations of users who post in /r/Depression and those of the control set not active in /r/Depression. From here, we could use the SVM to directly classify unseen user representations; however, this is prone to bias, the classifications are challenging to explain, and training a representation for every new user is computationally expensive. We instead extracted vocabulary strongly associated with users who post in /r/Depression by computing the cosine similarity between every word representation in the vocabulary of our word embedding and the decision direction the SVM produces. We took the most aligned words and used them to form a query for retrieving content written by depressed users. These words can be visualized and reviewed, mitigating bias and improving explainability. We call this method "Archetype-based Information Retrieval" (AIR); our work is an example of using AIR to find depression-associated content, based on a similar approach for finding posts about substance abuse.
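The vocabulary-extraction step can be illustrated with a minimal sketch. It assumes the word vectors and the SVM decision direction (the weight vector of the trained linear SVM) are already available; the toy vectors, the `svm_direction` values, and the function names are hypothetical and not taken from this work.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_u * norm_v)


def most_aligned_words(word_vectors, direction, k):
    """Return the k words whose vectors are most aligned with `direction`."""
    scored = [(word, cosine(vec, direction)) for word, vec in word_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:k]]


# Toy 2-dimensional embedding (hypothetical values, for illustration only).
word_vectors = {
    "hopeless": [0.9, 0.1],
    "tired": [0.6, 0.4],
    "football": [-0.7, 0.7],
    "recipe": [-0.2, -0.9],
}

# Stand-in for the normal vector of the SVM's separating hyperplane.
svm_direction = [1.0, 0.0]

query_terms = most_aligned_words(word_vectors, svm_direction, k=2)
print(query_terms)
```

Because the output is a plain ranked word list, it can be inspected by hand before it is used as a query, which is the explainability property described above.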

We created a query from the 200 most closely aligned words and used BM25 to assign a score to tweets from the Mega-COV and official Twitter COVID-19 datasets. We took the top-scoring quartile of tweets from our search as posts that indicate depressive symptoms. We sorted the tweets by the time they were posted and looked for changes over time in the frequency of high-scoring matches to our query. We then ran topic models (Latent Dirichlet Allocation, Contextual) on tweets grouped by the month in which they were posted and looked for consistencies and changes over time in the topics discovered by these automated approaches.
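The scoring and thresholding step can be sketched as follows. This is an illustrative Okapi BM25 implementation, not the one used in this work: the parameters `k1=1.5` and `b=0.75`, the non-negative IDF variant, the whitespace-style tokenization, and the reading of "top-scoring quartile" as the highest-scoring quarter of all scored tweets are all assumptions.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(d) for d in docs) / n_docs or 1.0  # avoid dividing by zero

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for d in docs:
        doc_freq.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            n_t = doc_freq[term]
            # Assumed IDF variant that stays non-negative for common terms.
            idf = math.log(1.0 + (n_docs - n_t + 0.5) / (n_t + 0.5))
            length_norm = k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def top_quartile(scores):
    """Indices of the highest-scoring quarter of documents (at least one)."""
    keep = max(1, len(scores) // 4)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:keep]


# Toy tokenized tweets (invented for illustration).
tweets = [
    ["feeling", "hopeless", "and", "tired", "again"],
    ["great", "football", "match", "tonight"],
    ["so", "tired", "after", "work"],
    ["new", "recipe", "for", "dinner"],
]
query = ["hopeless", "tired"]

scores = bm25_scores(query, tweets)
flagged = top_quartile(scores)
print(flagged)
```

Counting the flagged tweets per posting date would then give the frequency series examined for changes over time.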

Future work will explore ties between social media metrics and traditional, offline metrics. We intend to group tweets by geotags and look for corresponding trends; it is an open question whether the local, municipal, provincial, federal or international situation regarding COVID-19 forms the primary stressors on individuals. This study lays the foundation for AIR as a tool for investigating COVID-19 impacts on mental health.
