Scraping Twitter for Ophelia

I think Ophelia might just be the word of the year in Ireland. Outside a select group of Shakespeare aficionados and the three people in Ireland who know someone actually called Ophelia, it’s not a word that crossed many lips prior to October 2017.

The breeze that brought us together

There were more significant news stories this year, no doubt, but few can be summarised right down to just a single word, as Ophelia can. Certain stories are popular with specific demographics – bitcoin was the talk of the town for the more tech-savvy, for example, but it probably didn’t penetrate everyone’s information bubble. If the Irish population are unified by anything though, it’s the weather. In short, we love a good gust.

Ophelia – or, better known by her full name, Hurricane Ophelia – devastated Ireland earlier this year. Storms have come and gone but she will always stand out, being best remembered as the one for which we all got a day off. I wanted to measure the impact of this story in Ireland so I looked to Twitter (naturally).

Ophelia making landfall in Ireland as an extratropical cyclone on 16 October. (Courtesy of NASA)

Tweet tweet

My goal here was to make a plot of the number of tweets that mentioned Ophelia over time. The first task was to pull all tweets from Twitter that include the word Ophelia. Twitter provides an API to make our lives a little easier, but I actually went with a better piece of software for the job, written by Ahmet Taspinar and available through GitHub, called Twitter Scraper. To find and save every tweet over all time that mentioned Ophelia would take way too long, so I limited my search to October, since that’s when the storm was a brewin’. I’m running Ubuntu 16.04, so to install the software and search Twitter for Ophelia between 01 October 2017 and 31 October 2017, I ran the following in my terminal:

$ sudo pip install twitterscraper
$ twitterscraper ophelia -bd 2017-10-01 -ed 2017-10-31

When the query finished, it produced a file called tweets.json which contained all of the delicious data. I used Python – specifically a Jupyter Notebook, which I think is the best way to use Python. I’m not going to post all of the code here but I will mention a few things. The first point I’ll address is how to import the data. These three lines, as stated in the Twitter Scraper documentation, do the trick:

import codecs, json
with codecs.open('tweets.json', 'r', 'utf-8') as f:
    tweets = json.load(f)

The other thing worth mentioning is how to make the plot look pretty. The standard plotting library in Python is Matplotlib and, while vanilla Matplotlib looks a little dull, there is huge room for customisation. I could play around with all of the possibilities but there is a shortcut: Seaborn. Seaborn is a Python visualization library based on Matplotlib that “provides a high-level interface for drawing attractive statistical graphics”. This module changes the default Matplotlib plotting parameters so that graphs look nice from the start. Using it requires minimal input:

import seaborn as sns
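To show what that one import buys you, here is a rough sketch of a styled bar chart in the same vein as the final figure. This is a hypothetical illustration, not the real Ophelia plot – the dates and counts are made-up placeholders – and it assumes Matplotlib and Seaborn are both installed:

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import seaborn as sns

sns.set()  # apply Seaborn's default styling to all Matplotlib figures

# Placeholder data, purely for illustration.
days = ['2017-10-15', '2017-10-16', '2017-10-17']
counts = [20000, 120000, 60000]

fig, ax = plt.subplots()
ax.bar(range(len(days)), counts)
ax.set_xticks(range(len(days)))
ax.set_xticklabels(days, rotation=45)
ax.set_ylabel('Tweets mentioning Ophelia')
fig.tight_layout()
fig.savefig('ophelia_tweets.png')
```

In a Jupyter Notebook you would skip the Agg backend line and let the figure display inline.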

With the tweets in Python, it was then just a matter of doing whatever you wanted to the data. If you’d mined Twitter for Trump, for example, you could look at what fraction of the tweets were positive versus negative (by analysing the other text in the tweets) in order to gauge his popularity. For me and Ophelia, I just took the timestamps from each tweet and found the total number of tweets per day.
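The per-day tally can be done in a few lines with the standard library. A minimal sketch, assuming each tweet dictionary carries a 'timestamp' field holding an ISO-style string (the three tweets below are stand-ins for the parsed tweets.json contents – adjust the parsing format to whatever your file actually contains):

```python
from collections import Counter
from datetime import datetime

# Stand-in for the parsed tweets.json contents.
tweets = [
    {'timestamp': '2017-10-16T09:12:41'},
    {'timestamp': '2017-10-16T18:03:02'},
    {'timestamp': '2017-10-17T07:45:19'},
]

# Truncate each timestamp to its date and tally tweets per day.
counts = Counter(
    datetime.strptime(t['timestamp'], '%Y-%m-%dT%H:%M:%S').date()
    for t in tweets
)

for day in sorted(counts):
    print(day, counts[day])
```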

The final result is shown below. The hurricane formed on 09 October (where the plot shows a marked rise versus the previous days) and dissipated on 20 October. On 16 October, the number of tweets jumped to more than 120,000. This was the day it made landfall in Ireland, after being demoted to an extratropical cyclone. The country basically shut down for the day, so people stayed at home tweeting. Without further ado…

This shows the number of tweets per day which contained the word Ophelia in October 2017.