Psychidemic Hackathon: Estimation of Mental Health Patterns in COVID19 Tweets and Emojis
During this pandemic, when in-person communication is difficult, people use a wide variety of emojis and text to convey their emotions, sentiment, fear, and the psychological impact they experience. Ignoring these emojis when determining a person's mental health status from their published tweets could be misleading. The pandemic has also changed how some emojis are used on Twitter. For instance, the folded-hands emoji (🙏) conventionally means “please” or “thank you”; in the pandemic context, however, it is also used to express a health crisis. Hence, a semantic fusion of these two modalities, text and emoji, is required to precisely identify user behavior and patterns in mental health status. This challenge seeks innovative methods that not only determine the semantic association between tweet text and emojis in the COVID-19 context but also use that association to label each tweet with an appropriate mental health condition.
The purpose of this challenge is to solicit submissions that show the effective use of hierarchical knowledge structures (e.g., EmojiNet, Wikidata, DBpedia) as supplementary sources for comprehending noisy social media data. The task is challenging because online content is ambiguous, noisy, redundant, and sparse in information. For example, a purely statistical technique would struggle when the text and the emojis in a tweet express different sentiments, and this difficulty is exacerbated by its lack of semantic knowledge about the entities and relationships implicitly contained in the data. Such knowledge is available in Knowledge Graphs (KGs) but is not amenable to statistical derivation. Furthermore, these resources can add semantic fusion capabilities to traditional statistical learning methods by exploiting the different modalities of data in online written communication. Specifically, this challenge focuses on the interplay of semantics in text and emojis, both to robustly predict the emojis associated with mental-health-related online content and to identify the most suitable mental health condition.
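One simple way to give a text-only model access to emoji semantics is to append each emoji's senses, looked up in a sense inventory such as EmojiNet, to the tweet before vectorizing it. The sketch below assumes a tiny hand-written dictionary, `EMOJI_SENSES`, standing in for the knowledge graph; its entries are illustrative and are not taken from EmojiNet, and `fuse_text_and_emoji` is a hypothetical helper rather than part of any challenge tooling.

```python
# Hypothetical stand-in for a sense inventory such as EmojiNet; these entries
# are illustrative only and are NOT copied from EmojiNet.
EMOJI_SENSES = {
    "😊": ["smile", "happy", "blush"],
    "🙏": ["please", "thank you", "pray", "hope"],
}


def fuse_text_and_emoji(tweet: str, senses: dict = EMOJI_SENSES) -> str:
    """Append the senses of every known emoji in the tweet as extra text.

    Characters are inspected one code point at a time, so this only handles
    single-code-point emoji (no ZWJ sequences or skin-tone modifiers).
    """
    extra = [sense for ch in tweet if ch in senses for sense in senses[ch]]
    if not extra:
        return tweet
    # A marker token separates the original text from the KG-derived senses.
    return f"{tweet} [EMOJI_SENSES] {' '.join(extra)}"


print(fuse_text_and_emoji("Stay safe everyone 🙏"))
```

A downstream classifier then sees both the original wording and the emoji's possible meanings; deciding which sense actually applies in a given tweet is the harder problem this challenge targets.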
Examples of Knowledge Graphs used in similar tasks:
❏ ConceptNet: ConceptNet is a freely available semantic network designed to help computers understand the meanings of the words people use. [http://conceptnet.io/c/en/download]
❏ EmojiNet: EmojiNet is the largest machine-readable emoji sense inventory; it links Unicode emoji representations to their English meanings extracted from the Web. The dataset is hosted as an open service with a REST API and is available at [https://www.kaggle.com/rtatman/emojinet]
❏ DBpedia: The DBpedia project extracts structured information from Wikipedia, a gigantic source of knowledge, and makes this information accessible on the Web. [https://wiki.dbpedia.org/develop/datasets, https://www.dbpedia-spotlight.org/api]
❏ Wikidata: Wikidata acts as the central storage for the structured data of its Wikimedia sister projects, including Wikipedia, Wikivoyage, Wiktionary, Wikisource, and others. It is a free and open knowledge base that can be read and edited by both humans and machines. [https://dumps.wikimedia.org/wikidatawiki/entities/]
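As an example of using one of these resources programmatically, the sketch below queries ConceptNet's public REST API for a term and keeps the English neighbours of that concept. It assumes the ConceptNet 5 JSON response shape (an `edges` list whose items carry `rel`, `start`, `end`, and `weight`); the helper names and the `min_weight` threshold are hypothetical, and the public endpoint's availability may vary.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "http://api.conceptnet.io"  # public ConceptNet 5 API; availability may vary


def fetch_concept(term: str, lang: str = "en") -> dict:
    """Download the JSON node for a term, e.g. /c/en/anxiety."""
    slug = urllib.parse.quote(term.strip().lower().replace(" ", "_"))
    with urllib.request.urlopen(f"{API_ROOT}/c/{lang}/{slug}", timeout=10) as resp:
        return json.load(resp)


def _is_node(concept_id: str, node_id: str) -> bool:
    # ConceptNet ids may carry a part-of-speech suffix, e.g. /c/en/anxiety/n.
    return concept_id == node_id or concept_id.startswith(node_id + "/")


def related_terms(node: dict, lang: str = "en", min_weight: float = 1.0) -> list:
    """Return (relation, neighbour label, weight) triples, strongest first."""
    node_id = node.get("@id", "")
    triples = []
    for edge in node.get("edges", []):
        start, end = edge["start"], edge["end"]
        # The neighbour is whichever endpoint is not the queried concept.
        other = end if _is_node(start.get("@id", ""), node_id) else start
        if other.get("language") == lang and edge.get("weight", 0.0) >= min_weight:
            triples.append((edge["rel"]["label"], other["label"], edge["weight"]))
    return sorted(triples, key=lambda t: t[2], reverse=True)


# Usage (needs network access): related_terms(fetch_concept("anxiety"))[:10]
```

The neighbouring terms returned this way could, for instance, be used to expand the vocabulary associated with each mental health category; the threshold is a tunable assumption, not a recommended value.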
The challenge is targeted toward undergraduate and graduate students, data science practitioners in industry/non-profits, and public health researchers interested in the broad areas of the semantic web (e.g., knowledge graphs), natural language processing/understanding, information retrieval, machine learning, and social computing. Student researchers in these areas, especially those working on social-good applications, should find that the outcomes contribute fundamentally to a combined semantic understanding of text and emojis in messy online discourse. Furthermore, the methods built for these tasks can be integrated into AI assistant models to avoid misclassifying potential cases of mental health affliction.
Description of the Dataset and Challenge
The dataset consists of ~50,000 unique tweets containing emojis, extracted from the 8M COVID-19-related tweets. Along with the tweet text, the dataset includes features such as the emojis present in each tweet. We have divided the dataset into train and test sets with an 80/20 split. In addition to the features mentioned above, the test set includes the Depression, Addiction, and Anxiety labels associated with each tweet. For easy integration into a machine learning pipeline, resources such as emoji2vec can be used to obtain vector representations of the emojis.
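The sketch below shows one way to turn the emojis in a tweet into a single feature vector by averaging per-emoji embeddings, as one might with emoji2vec. The regular expression is an approximation covering the main emoji blocks, and `vectors` is assumed to be any mapping from emoji to a sequence of floats; with the pre-trained emoji2vec release this could be a gensim `KeyedVectors` object (the file name in the comment is an assumption). The 2-dimensional `toy_vectors` are made up for illustration.

```python
import re

# Approximate emoji matcher over the main pictograph and symbol blocks. It
# ignores variation selectors and ZWJ sequences, so composite emoji are split.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def extract_emojis(tweet: str) -> list:
    """Return the emoji characters in the tweet, in order of appearance."""
    return EMOJI_RE.findall(tweet)


def average_emoji_vector(tweet: str, vectors, dim: int) -> list:
    """Mean of the embeddings of the known emojis in the tweet (zeros if none).

    `vectors` maps an emoji to a sequence of `dim` floats. With emoji2vec it
    could be, e.g., gensim's
    KeyedVectors.load_word2vec_format("emoji2vec.bin", binary=True)
    (path assumed).
    """
    found = [e for e in extract_emojis(tweet) if e in vectors]
    total = [0.0] * dim
    for e in found:
        for i, x in enumerate(vectors[e]):
            total[i] += float(x)
    return [x / len(found) for x in total] if found else total


toy_vectors = {"😊": [1.0, 0.0], "🙏": [0.0, 1.0]}  # hypothetical 2-d embeddings
print(extract_emojis("My life is the absolute worst 😊"))
print(average_emoji_vector("Stay safe 🙏😊", toy_vectors, dim=2))
```

Averaging discards emoji order and repetition weighting beyond counts; it is only a baseline that can be concatenated with a text representation of the tweet.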
Challenge Task 1: Using and understanding text and emoji sentiments together to predict the mental-health behavior of social media users
Definition: The task requires analyzing a dataset of text and emojis from social media to predict users' mental-health behavioral categories. User behavior is classified into the categories Depression, Anxiety, and Addiction, and a tweet can also fall under a mixture of these categories. There has been prior work on mining opinions; here, it is essential to distinguish an actual case of one of these categories from views expressed about it.
Input: “<My life is the absolute worst 😊, Depression>”
Output: “Depression: 1.0, Anxiety: 0.0, Addiction: 0.0”
Evaluation: Medical practitioners will be involved to check the predicted categories in the output, alongside the standard metrics of Precision, Recall, and F1-score for each label category.
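To make the Task 1 input/output format concrete, the sketch below parses an example in the `<tweet text, label>` form shown above, formats a prediction in the label order of the example output, and computes per-label Precision, Recall, and F1 when a tweet may carry several labels. The helper names are hypothetical, and treating the three labels as independent scores (rather than a softmax over them) is an assumption that follows from the mixture-category note.

```python
LABELS = ("Depression", "Anxiety", "Addiction")  # order of the example output


def parse_example(line: str):
    """Split '<tweet text, label>' into (text, label) at the last comma."""
    body = line.strip().strip("“”\"<>")
    text, label = body.rsplit(",", 1)
    return text.strip(), label.strip()


def format_prediction(scores: dict) -> str:
    """Render independent per-label scores; missing labels default to 0.0."""
    return ", ".join(f"{lab}: {float(scores.get(lab, 0.0))}" for lab in LABELS)


def per_label_prf(y_true, y_pred, labels=LABELS) -> dict:
    """Per-label (precision, recall, F1); y_true/y_pred are lists of label sets."""
    pairs = list(zip(y_true, y_pred))
    results = {}
    for lab in labels:
        tp = sum(lab in t and lab in p for t, p in pairs)
        fp = sum(lab not in t and lab in p for t, p in pairs)
        fn = sum(lab in t and lab not in p for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[lab] = (precision, recall, f1)
    return results


print(parse_example("“<My life is the absolute worst 😊, Depression>”"))
print(format_prediction({"Depression": 1.0}))
```

scikit-learn's `precision_recall_fscore_support` with `average=None`, applied to label sets binarized by `MultiLabelBinarizer`, should give matching per-label numbers under its default `zero_division` behaviour (it also emits a warning when a label has no predictions or no true instances).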
Challenge Task 2: Ranking tweet texts using emoji sentiment data within mental-health behavior contexts
Definition: Each emoji carries a sentiment that it conveys in social media data. This sentiment is not always discernible from the emoji alone and requires additional context. Therefore, text and emoji Knowledge Graphs can help capture an emoji's contextual meaning in order to retrieve and rank the tweet texts that align with the emoji's contextual sentiment.
For example, for “My life is the absolute worst 😊”, the model needs to understand that the contextual sentiment of the emoji is negative, not positive, in order to retrieve text with negative behavior. If the tweet is written by a depressed user, the dataset label for the emoji “😊” will be negative, which can confuse a purely statistical learning approach.
Input: “<😊 , negative>”
Output: “My life is the absolute worst 😊”: <high_score, if depression user>
Evaluation: Standard Precision, Recall, and F1-score metrics.
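The sketch below illustrates the Task 2 idea on toy data: the emoji's prior polarity is blended with the polarity of its textual context, tweets containing the query emoji are ranked by agreement with the query sentiment, and the ranking is scored with Precision, Recall, and F1 at a cutoff k (one common way to apply these metrics to a ranked list). The word lexicon, emoji priors, and blending weight `alpha` are hypothetical stand-ins for what a KG-informed model would supply.

```python
import re

# Toy polarity scores (hypothetical): a real system would derive these from
# text/emoji knowledge graphs or a trained sentiment model.
TEXT_LEXICON = {"worst": -1.0, "awful": -1.0, "alone": -0.5,
                "happy": 1.0, "thankful": 0.8}
EMOJI_PRIOR = {"😊": 1.0, "😢": -1.0, "🙏": 0.5}


def text_polarity(text: str) -> float:
    """Sum the lexicon scores of the words in the text."""
    return sum(TEXT_LEXICON.get(w, 0.0) for w in re.findall(r"[a-z']+", text.lower()))


def contextual_emoji_polarity(text: str, emoji: str, alpha: float = 0.8) -> float:
    """Blend the emoji's prior with its context; alpha weights the context."""
    return alpha * text_polarity(text) + (1 - alpha) * EMOJI_PRIOR.get(emoji, 0.0)


def rank_tweets(tweets, emoji: str, sentiment: str) -> list:
    """Rank tweets containing `emoji` by agreement with the query sentiment."""
    sign = 1.0 if sentiment == "positive" else -1.0
    scored = [(sign * contextual_emoji_polarity(t, emoji), t) for t in tweets if emoji in t]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def prf_at_k(ranked, relevant, k: int):
    """Precision, recall and F1 of the top-k ranked items against a relevant set."""
    top_k = ranked[:k]
    hits = sum(item in relevant for item in top_k)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


tweets = ["My life is the absolute worst 😊", "So happy and thankful today 😊",
          "Feeling alone 😢"]
ranking = rank_tweets(tweets, "😊", "negative")
print([t for _, t in ranking])
print(prf_at_k([t for _, t in ranking], {"My life is the absolute worst 😊"}, k=1))
```

In this toy setup, any `alpha` above 0.5 lets the negative sentence outweigh the nominally positive 😊 in the first tweet, which is exactly the case an emoji-only signal gets wrong.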
To access the dataset, participants need to fill in their details in the G-Form.
After submitting the form, you will receive an email with the training dataset (unlabeled) and the test dataset (labeled).
Optional: here is the link to 8 million Twitter conversations on COVID-19 for training a problem-specific embedding model. [8M GDrive]
❏ Challenge Code Submission: 25th September 2020
❏ Results announcement: 1st October 2020
❏ Challenge Paper Submission: 20th October 2020
Submissions must contain Python code, executables, and data, and can be made using this Dropbox link. Please do not submit Python notebooks. The top 2 results from the challenge will be selected for publication in the KGSWC proceedings.
If you have any questions or concerns, please reach out to any of the following:
Vedant Khandelwal: email@example.com
Vishal Pallagani: firstname.lastname@example.org
Kaushik Roy: email@example.com
Manas Gaur: firstname.lastname@example.org
- Speer, Robert, and Catherine Havasi. “Representing General Relational Knowledge in ConceptNet 5.” In LREC, pp. 3679-3686. 2012.
- Vrandečić, Denny, and Markus Krötzsch. “Wikidata: a free collaborative knowledgebase.” Communications of the ACM 57, no. 10 (2014): 78-85.
- Lehmann, Jens, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N. Mendes, Sebastian Hellmann, et al. “DBpedia–a large-scale, multilingual knowledge base extracted from Wikipedia.” Semantic web 6, no. 2 (2015): 167-195.
- Wijeratne, Sanjaya, Lakshika Balasuriya, Amit Sheth, and Derek Doran. “Emojinet: Building a machine-readable sense inventory for emoji.” In International conference on social informatics, pp. 527-541. Springer, Cham, 2016.
- Karthik, Valmeekam, Dheeraj Nair, and J. Anuradha. “Opinion Mining on Emojis using Deep Learning Techniques.” Procedia computer science 132 (2018): 167-173.
- Felbo, Bjarke, et al. “Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm.” arXiv preprint arXiv:1708.00524 (2017).
- Eisner, Ben, et al. “emoji2vec: Learning emoji representations from their description.” arXiv preprint arXiv:1609.08359 (2016).