Before I get too far: I don’t actually analyze taco emojis. At least not yet. I do, however, give you the tools to start parsing them from tweets, text, or anything else you can get into Python.
This past month Apple released iOS 9.1 and the OS X 10.11.1 El Capitan update. That update included a bunch of new emojis. I’ve written a quick primer on how to handle emoji analysis in Python. Then, when Apple released an update to their emojis to include the diversity modifiers, I updated my small Python class for emoji counting to include the newest emojis. I also looked at what is actually happening with the Unicode when diversity modifier patches are used.
Click for Updated socialmediaparse Library
With this latest update, Apple and the Unicode Consortium didn’t really introduce any new concepts, but I did update the Python class to include the newest emojis. In my GitHub repository, the data folder includes a text file with all the emojis delimited by ‘\n’. The class uses this file to find any emojis in a Unicode string that has been passed to the add_emoji_count() method.
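To make the idea concrete, here is a minimal sketch of matching a newline-delimited emoji list against a string. It is not the library's actual code: the helper names load_emojis and count_emojis are hypothetical, the three-emoji sample stands in for the real data file, and it is written for Python 3 rather than the Python 2 used elsewhere in this post.

```python
from collections import Counter


def load_emojis(lines):
    """Return the non-empty, stripped entries of a newline-delimited emoji list."""
    return [line.strip() for line in lines if line.strip()]


def count_emojis(text, emojis):
    """Count how many times each known emoji occurs in text."""
    counts = Counter()
    for emoji in emojis:
        n = text.count(emoji)
        if n:
            counts[emoji] = n
    return counts


# Hypothetical stand-in for the contents of the emoji text file:
# taco (U+1F32E), pizza (U+1F355), grinning face (U+1F600).
sample_file = "\U0001F32E\n\U0001F355\n\U0001F600\n"
emojis = load_emojis(sample_file.split("\n"))

tweet = "Taco Tuesday \U0001F32E\U0001F32E and pizza \U0001F355"
counts = count_emojis(tweet, emojis)
print(counts)
```

Plain substring counting like this would count an emoji twice if the list held both a single emoji and a longer sequence that contains it, so a real implementation probably needs to handle that case.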
Building off of the diversity emoji update, I added a skin_tones_dict property to the EmojiDict class. This property returns a dictionary with the number of unique human emojis per tweet and their skin tones. This property will not catch multiple human emojis if they are written in the same execution of the add_emoji_count() method.
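To show what a skin tone looks like at the code-point level, here is a small sketch that counts the five Unicode emoji modifiers (U+1F3FB through U+1F3FF) in a string. This is my own illustration, not how the class does it: count_skin_tones and SKIN_TONES are hypothetical names, and the code assumes Python 3, where each emoji is a single code point.

```python
# Emoji modifiers based on the Fitzpatrick scale, U+1F3FB through U+1F3FF.
SKIN_TONES = {
    "\U0001F3FB": "type 1-2",
    "\U0001F3FC": "type 3",
    "\U0001F3FD": "type 4",
    "\U0001F3FE": "type 5",
    "\U0001F3FF": "type 6",
}


def count_skin_tones(text):
    """Return a dict mapping each modifier code point to its count in text."""
    return {modifier: text.count(modifier) for modifier in SKIN_TONES}


# A toned emoji is the base emoji followed by a modifier, so it is stored as
# two code points: thumbs up (U+1F44D) plus a type-4 modifier (U+1F3FD).
thumbs_up_toned = "\U0001F44D\U0001F3FD"
print(len(thumbs_up_toned))

# The second emoji is a waving hand (U+1F44B) with a type-6 modifier.
text = "nice " + thumbs_up_toned + " bye \U0001F44B\U0001F3FF"
tones = count_skin_tones(text)
for modifier, n in tones.items():
    print("U+{:X} ({}): {}".format(ord(modifier), SKIN_TONES[modifier], n))
```

Because the modifier is a separate code point that follows the base emoji, counting skin tones comes down to counting those five characters.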
    import socialmediaparse as smp  # loads the package

    counter = smp.EmojiDict()  # initializes the EmojiDict class

    # goes through list of unicode objects calling the add_emoji_count method for each string
    # the method keeps track of the emoji count in the attributes of the instance
    for unicode_string in collection:
        counter.add_emoji_count(unicode_string)

    # output of the instance
    print counter.dict_total  # dict of the absolute total count of the emojis in corpus
    print counter.dict  # dict of the count of strings with the emoji in corpus
    print counter.baskets  # list of lists, emoji in each string. one list for each string.
    print counter.skin_tones_dict  # dictionary of unique human emojis and their skin tones, aggregated by the counter.

    # print counter.skin_tones_dict output
    # {'human_emoji': 4, '\\U0001f3fe': 1, '\\U0001f3fd': 1, '\\U0001f3ff': 0, '\\U0001f3fc': 2, '\\U0001f3fb': 1}

    counter.create_csv(file='emoji_out.csv')  # method for creating csv
Above is an example of how to use the new attribute. It is a dictionary, so you can work it into your analysis however you like. I will eventually create better methods and outputs to make this feature more robust and useful.
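As one example of working the dictionary into an analysis, the sketch below turns the skin_tones_dict output printed in the example into each modifier's share of all modifiers seen. The dictionary is copied by hand from that output, and I am assuming the real property uses the same key layout. The code is written for Python 3.

```python
# Copied from the example output; the keys are escaped strings as printed.
skin_tones = {
    'human_emoji': 4,
    '\\U0001f3fe': 1,
    '\\U0001f3fd': 1,
    '\\U0001f3ff': 0,
    '\\U0001f3fc': 2,
    '\\U0001f3fb': 1,
}

# Separate the modifier counts from the count of human emojis.
tone_counts = {k: v for k, v in skin_tones.items() if k != 'human_emoji'}
total = sum(tone_counts.values())

# Share of each modifier among all modifiers; empty if none were seen.
shares = {k: v / total for k, v in tone_counts.items()} if total else {}

for key in sorted(shares):
    print("{}: {:.0%}".format(key, shares[key]))
```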
The full code and class I used in this post can be found on my GitHub.