From Knoesis wiki

EmojiNet is the largest machine-readable emoji sense inventory that links Unicode emoji representations to their English meanings extracted from the Web. The dataset consists of (i) 12,904 sense labels over 2,389 emoji, which were extracted from the Web and linked to machine-readable sense definitions in BabelNet; (ii) context words associated with each emoji sense, which are inferred for each emoji sense definition through word embedding models trained on the Google News corpus and a Twitter message corpus; and (iii) for a selected set of emoji, the most likely platform-specific emoji sense, which accounts for discrepancies in how emoji are presented on different platforms. The dataset is hosted as an open service with a REST API and is available at http://emojinet.knoesis.org/.
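
The three components listed above can be pictured as one record per emoji. The following is a minimal sketch only: the class and field names (`EmojiSense`, `EmojiEntry`, `babelnet_id`, `platform_senses`), the sample values, and the placeholder BabelNet ID are assumptions made for illustration. They do not reflect the actual EmojiNet schema or the responses of its REST API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EmojiSense:
    """One sense of an emoji (hypothetical layout, not the EmojiNet schema)."""
    label: str          # sense label, e.g. "joy"
    pos: str            # part of speech, e.g. "noun", "verb", "adjective"
    babelnet_id: str    # placeholder for the linked BabelNet sense ID
    definition: str     # text of the sense definition
    context_words: List[str] = field(default_factory=list)


@dataclass
class EmojiEntry:
    """All senses recorded for a single Unicode emoji."""
    code_point: str     # e.g. "U+1F602"
    name: str
    senses: List[EmojiSense] = field(default_factory=list)
    # platform name -> label of the most likely sense on that platform
    platform_senses: Dict[str, str] = field(default_factory=dict)

    def senses_for(self, pos: str) -> List[EmojiSense]:
        """Return only the senses with the given part of speech."""
        return [s for s in self.senses if s.pos == pos]


# Made-up sample data, for illustration only.
entry = EmojiEntry(
    code_point="U+1F602",
    name="face with tears of joy",
    senses=[
        EmojiSense("laugh", "verb", "bn:PLACEHOLDER", "to express amusement",
                   ["funny", "hilarious"]),
        EmojiSense("joy", "noun", "bn:PLACEHOLDER", "a feeling of great happiness",
                   ["happy", "delight"]),
    ],
)

noun_labels = [s.label for s in entry.senses_for("noun")]
print(noun_labels)
```

Keeping part of speech on each sense, rather than on the emoji, reflects the point that the same emoji can carry several senses with different grammatical roles.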


With the rise of social media, pictographs, better known as 'emoji', have become an extremely popular form of communication. Their popularity may be explained by the typically short text format of social media, where an emoji can express rich content in a single character. Emoji are also a powerful way to express emotions or a subtle, hard-to-write notion effectively [1]. Emoji are used by many Internet users, irrespective of their age. Emogi, an Internet marketing firm, reports that over 92% of all online users have used emoji [2]. They further report that emoji use is not simply a millennial fad, as over 65% of frequent and 28% of occasional Internet users over the age of 35 use emoji. The creators of the SwiftKey Keyboard for mobile devices report that they process 6 billion messages per day that contain emoji [3]. Moreover, business organizations have adopted and now accept the use of emoji in professional communication. For example, Appboy, an Internet marketing company, reports a 777% year-over-year increase and a 20% month-over-month increase in emoji usage for marketing campaigns by business organizations in 2016 [4]. These statistics leave little doubt that emoji are a significant and important aspect of electronic communication across the world.

In the same way that natural language is processed with sophisticated machine learning techniques and technologies [5] for many important applications, including text similarity [6] and word sense disambiguation [7], so too should emoji be subject to evaluation. Yet the graphical nature of emoji, the fact that the same emoji may be used in different contexts to express different senses [8], and the fact that emoji are used in languages all over the world make it especially difficult to apply traditional NLP techniques to them [9]. Indeed, when emoji were first introduced, they were defined with no rigid semantics attached, which allowed people to develop their own use and interpretation [10]. Thus, similar to words, emoji can take on different meanings depending on context and part-of-speech (POS). Consequently, just as words must be disambiguated in natural language processing, machines also need to disambiguate the meaning, or 'sense', of an emoji. As a first step toward achieving this goal, we created EmojiNet, the first machine-readable sense inventory for emoji [11]. EmojiNet is a resource that enables systems to link emoji with their context-specific meanings. It is constructed by integrating multiple emoji resources with BabelNet, which is the most comprehensive multilingual sense inventory available to date. The ultimate goal of the EmojiNet project is to improve the machine understandability of emoji by providing machine-processable emoji meanings. Please refer to our papers at ICWSM '17 and SocInfo '16 to understand more about how the resource is built and how it could be used to tackle emoji sense disambiguation and emoji similarity problems. EmojiNet is hosted as an open service with a REST API and is available at http://emojinet.knoesis.org/.
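
To make the disambiguation task concrete, here is a minimal sketch of one simple approach: choose the sense whose context words overlap most with the words of the message. This is a simplified overlap heuristic chosen for illustration and is not necessarily the method used in the EmojiNet papers. The function name `pick_sense` and the tiny sense inventory `toy_senses` are hypothetical and are not drawn from the EmojiNet data.

```python
def pick_sense(message, senses):
    """Return the sense label whose context words overlap most with the message.

    `senses` maps a sense label to a set of context words.
    Ties are resolved in favour of the sense listed first.
    """
    words = set(message.lower().split())
    best_label, best_overlap = None, -1
    for label, context_words in senses.items():
        overlap = len(words & set(context_words))
        if overlap > best_overlap:
            best_label, best_overlap = label, overlap
    return best_label


# Made-up context words for two candidate senses of one emoji.
toy_senses = {
    "laugh": {"funny", "joke", "hilarious", "lol"},
    "cry": {"sad", "tears", "miss", "hurt"},
}

print(pick_sense("that joke was hilarious", toy_senses))  # laugh
print(pick_sense("i miss you so much", toy_senses))       # cry
```

A practical system would also need tokenization that handles punctuation and a principled fallback when no context word matches; here, the first listed sense is returned in that case.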


Faculty: Amit Sheth, Derek Doran
Graduate Students: Sanjaya Wijeratne, Lakshika Balasuriya




Press Coverage

Common Emoji Mistakes and How to Use Them the Right Way | dlvr.it Blog Article

Related Projects

Concurrent Projects

Prior Projects


We are grateful to Nicole Selken, the designer of The Emoji Dictionary, and Jeremy Burge, the founder of Emojipedia, for giving us permission to use their web resources for our research. We are thankful to Scott Duberstein for helping us set up Amazon Mechanical Turk tasks. We acknowledge partial support from the National Science Foundation (NSF) award CNS-1513721: "Context-Aware Harassment Detection on Social Media", the National Institute on Drug Abuse (NIDA) Grant No. 5R01DA039454: "Trending: Social Media Analysis to Monitor Cannabis and Synthetic Cannabinoid Use", and the National Institute of Mental Health (NIMH) award 1R01MH105384-01A1: "Modeling Social Behavior for Healthcare Utilization in Depression". Points of view or opinions in this document are those of the authors and do not necessarily represent the official position or policies of the NSF, NIDA, or NIMH.

Contact: Sanjaya Wijeratne