Have the recent conflicts in the Middle East and North Africa affected or been affected by sentiments about natural resources, such as oil? What is the relationship? This visualization attempts to study whether or not there exists a correlation between prices of commodities and global sentiments about the crisis in the region.
The specific questions we seek to answer with our visualization are:
We do not plan to assess a causal relationship between oil prices and Twitter mentions, but we feel the data can nonetheless provide interesting and useful insights.
Our target audience includes anyone with an interest in political economy, the region we cover, or just unique applications of social media data. Further, if there turns out to be a temporal connection between changes in Twitter mentions and changes in commodity prices, the graphic may be of interest to commodity traders and financial actors significantly affected by energy prices.
Few researchers have investigated the connection between Tweets and financial markets. Bollen, et al. have aimed to connect moods on Twitter to stock market fluctuations (they are consulting for a $40 million hedge fund based on the idea). However, we haven’t been able to find research examining whether there exists a correlation between the volume of Twitter traffic on an issue and the price volatility of commodities closely related to the issue.
The idea of connecting changes in oil price to changes in Twitter mentions of Middle Eastern nations emerges from the literature on oil market volatility. An article from the Council on Foreign Relations argues that geopolitics has a major effect on oil market volatility. A recent article from The Economist extends this idea, describing how perceptions of unrest affect the oil market: “Two factors determine the price of a barrel of oil: the fundamental laws of supply and demand, and naked fear. Both are being tested by the violence that is tearing through Libya, the world’s 13th-largest oil exporter.” If the day-by-day changes in the volume of Tweets on Middle Eastern countries serve as a measure of perceptions of geopolitical instability, then we may find a connection between Twitter mentions and oil prices. This visualization attempts to find whether that connection exists.
A few existing visualizations track the relationship between Twitter and the Middle East, but none that we know of track Twitter, the Middle East, and oil market volatility together. Those that track Twitter and the Middle East tend to focus on relating mentions of Al Jazeera to developments in the Middle East. One notable exception is Al Jazeera’s own Twitter Dashboard, which does track specific mentions of country names but does not attempt to draw any connections to volatility in the commodities market.
Our visualization is composed of two parts. The first visualization at the top of the page tracks changes in both Twitter mentions and oil prices over time. These two datasets are shown on the same chart using separate axes scaled such that the two can be compared visually for shape. The purpose of this graph is to highlight any trends that may emerge over time; if the shapes follow each other (or follow each other with a lag), we would find it encouraging to investigate further whether a connection potentially exists between changes in Twitter mentions and changes in oil prices.
The second visualization compares changes in Twitter mentions against changes in oil prices. The purpose of this graph is to highlight more prominently whether there exists a correlation between the two datasets. If we find that increases in Twitter mentions of a country tend to occur on the same days as increases in oil prices and drops in each tend to occur with drops in the other, then we would also find it encouraging to investigate the relationship further.
Our window of analysis is January 3 to April 14, 2011.
The first dataset consists of the trading prices of Cushing Crude Oil Futures Contract 1. This data was downloaded from the US Department of Energy.
The second consists of data on the frequency of mentions of specific Middle Eastern and North African countries on Twitter. We are grateful to Trendrr and specifically Alex Nagler for giving us access to their data and guiding us through the software. The Trendrr data for some of the countries is limited by when the company began tracking mentions of each country: for some, tracking began well before the beginning of our analysis window; for others, tracking began as late as March 24, when we began receiving data from Trendrr.
The average and OPEC views take the mean number of Twitter mentions of all the countries or OPEC members included in our visualization. If data was not available for a country on a specific day, that country was excluded from the mean calculated for that date.
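The averaging rule can be illustrated with a short Python sketch. This is an illustration under assumptions rather than the submitted script: the function name `daily_mean_mentions`, the use of `None` to mark days without data, and the sample counts are all hypothetical.

```python
from typing import Dict, List, Optional


def daily_mean_mentions(
    mentions_by_country: Dict[str, List[Optional[float]]]
) -> List[Optional[float]]:
    """Average mentions across countries for each day, skipping missing values.

    Every country's list is assumed to cover the same dates, with None
    marking a day on which no data was available for that country.
    """
    n_days = len(next(iter(mentions_by_country.values())))
    means: List[Optional[float]] = []
    for day in range(n_days):
        available = [
            series[day]
            for series in mentions_by_country.values()
            if series[day] is not None
        ]
        # A day with no data for any country has no defined mean.
        means.append(sum(available) / len(available) if available else None)
    return means


# Made-up counts: tracking for Libya and Bahrain starts later than for Egypt.
example = {
    "Egypt": [100.0, 120.0, 90.0],
    "Libya": [None, 80.0, 110.0],
    "Bahrain": [None, None, 40.0],
}
print(daily_mean_mentions(example))  # [100.0, 100.0, 80.0]
```

Because the divisor is the number of countries with data on that day, a country whose tracking starts late does not pull the early averages toward zero.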
All data used in this visualization show percentage change from the previous day rather than absolute values, as this provides a better sense of trends and volatility, our primary interests. For the second visualization, data for weekends and holidays were removed, as markets were not open.
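A minimal Python sketch of these two steps follows. It assumes the change is computed against the immediately preceding entry in the series and that weekend and holiday dates are dropped afterwards; the function names, the `holidays` parameter, and the sample mention counts are hypothetical.

```python
from datetime import date, timedelta


def percent_change(values):
    """Percentage change of each value relative to the entry before it."""
    changes = [None]  # the first entry has no previous day to compare against
    for prev, curr in zip(values, values[1:]):
        if prev is None or curr is None or prev == 0:
            changes.append(None)  # change is undefined without a valid base
        else:
            changes.append((curr - prev) * 100.0 / prev)
    return changes


def drop_non_trading_days(dates, values, holidays=frozenset()):
    """Keep only (date, value) pairs that fall on weekdays and not on holidays."""
    return [
        (d, v)
        for d, v in zip(dates, values)
        if d.weekday() < 5 and d not in holidays
    ]


# Made-up mention counts for Thursday, January 6 through Monday, January 10, 2011.
start = date(2011, 1, 6)
dates = [start + timedelta(days=i) for i in range(5)]
mentions = [200, 250, 300, 150, 180]

changes = percent_change(mentions)
for day, change in drop_non_trading_days(dates, changes):
    print(day, change)
```

In this sketch Monday's change is still measured against Sunday's count, since Twitter data exists for every calendar day; only the rows for the weekend itself are removed.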
The DOE data came in the form of an Excel document, while the Trendrr data came in the form of JSON data extracted through the Trendrr API. We used a combination of Python and Excel to scrape and clean each data set, formatting them into JSON row arrays. Those scripts and files have been included in the submitted code.
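As an illustration of the formatting step, the Python sketch below turns a cleaned series into rows of `[x, y]` pairs, the data shape Flot plots, with dates expressed as millisecond timestamps as Flot's time mode expects. The exact layout of the submitted files is not reproduced here; the function name `to_flot_rows`, the `label` field, the two-decimal rounding, and the sample values are assumptions.

```python
import calendar
import json
from datetime import date


def to_flot_rows(dates, values):
    """Build [[timestamp_ms, value], ...] rows, skipping days without data."""
    rows = []
    for d, v in zip(dates, values):
        if v is None:
            continue  # assumption: missing days are simply left out
        # Midnight UTC on the given date, in milliseconds since the epoch.
        timestamp_ms = calendar.timegm(d.timetuple()) * 1000
        rows.append([timestamp_ms, round(v, 2)])
    return rows


# Made-up percentage changes for three days in the analysis window.
dates = [date(2011, 1, 3), date(2011, 1, 4), date(2011, 1, 5)]
oil_changes = [None, -2.318, 1.052]

series = {"label": "Oil", "data": to_flot_rows(dates, oil_changes)}
print(json.dumps(series))
```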
There are separate JSON files for the data on oil pricing volatility over time, changes in Twitter mentions over time, and oil pricing volatility against Twitter mentions. While there is some redundancy in data storage, we think the slightly larger memory usage can be justified by the speed and responsiveness of the visualization.
We had initially started off using the Google Visualization API but found that it was not flexible enough to accommodate some design decisions we deemed important, such as using two axes on the same graph, allowing for user choice in displaying different views, and controlling both visualizations simultaneously. jQuery and Flot allowed us greater customization of the visualization and included more pre-built plug-ins to further extend Flot’s functionality.
Figure 1: Default view
Upon starting the visualization, we display oil pricing as well as the changes in average and OPEC country-specific Twitter mentions. We feel these provide a good overview of the data and encourage the user to further explore the country-specific data. Users may select and de-select any number of countries to create different views.
Figure 2: Egypt view
We also include interactive features that facilitate greater understanding of specific data points, including hovering to display data values and “cross-hair” views. Upon moving the mouse over a data point, a box pops up that displays the actual value. Further, hovering over the charts triggers cross-hair lines to facilitate greater ease in reading the chart. This is especially helpful for users hoping to understand the specific changes occurring at a given point.
Figures 3 & 4: Cross-hair views
Qualitatively speaking, we notice some similarities in volatility between oil pricing and Twitter mentions. For example, Twitter mentions for the overall average and OPEC seem to peak and drop a few days before oil pricing changes. We are also able to measure the relationship quantitatively: the correlation between oil volatility and Twitter mentions is displayed in the legend of the second graph, though unfortunately none of the correlations exceeds 0.57. However, if future implementations of the project were to allow for time lags, we suspect that we would see an increase in the values of the correlations.
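For readers who want to reproduce a correlation value of this kind, the following Python sketch computes a Pearson correlation coefficient between two series of daily percentage changes. It assumes Pearson's coefficient is the measure in question and that days missing from either series are skipped; the function name and the sample numbers are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series, ignoring missing days."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two days with data in both series")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up daily percentage changes on the same trading days.
mention_changes = [12.0, -8.0, 30.0, None, -15.0, 5.0]
oil_changes = [0.4, -0.9, 1.1, 0.2, -0.3, 0.6]
print(round(pearson(mention_changes, oil_changes), 2))
```

A value near 1 means the two series tend to rise and fall on the same days, a value near -1 means they move in opposite directions, and a value near 0 means no linear relationship.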
We are grateful for the feedback from our peers. Taking their concerns into account, we decided to eliminate the map of the region, which would have served little purpose other than to demonstrate where each country was located geographically, and we added correlation values for the two datasets.
A major strength of our approach is that it allows for two very distinct views of the data. The first privileges the human cognitive ability to observe patterns in the data not easily captured by quantitative measures. By this, we mean that the user may more easily notice whether or not there exists a time lag in the data and whether this varies by country. The second view of the data allows for precise, quantitative conclusions about the relationship between the two datasets.
We also made good use of the Gestalt principles, focusing primarily on the use of color to distinguish between different views and to draw connections between the two visualizations. Lines connect the appropriate data points in the first graph, facilitating understanding of their temporal relationship, while we only use points in the second graph because we’re more interested in the relationship between the oil and mentions rather than the sequence of the points. Further, using lines through the “cross-hairs” to connect points on the graph to the axis helps facilitate movement of the eye when reading the charts.
We also made the visualization interactive, allowing the user to drill down into the data by hovering. We felt we did a good job of adhering to Tufte’s principles of maximizing the data-ink ratio while simultaneously implementing a clean and usable design. The selection boxes further the simplicity by making all the data available for interested users, but without crowding the screen by showing all of the countries’ information at once. The labels are only available on hover, which also keeps the screen clean unless the user wants a specific piece of data.
One weakness of our design is that it uses static data. This was a consequence of our using multiple platforms to parse and clean our data. Future implementations of this project would do well to fully automate this process. In addition, automating the data imports would allow for easier manipulation of the data to discover potential trends that may emerge by adjusting for time lags, for example whether spikes in oil prices occur n days after spikes in a country’s mentions on Twitter.
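A lag adjustment of this kind could be sketched in Python as follows. This is not part of the current implementation. It assumes both series are already aligned to the same trading days with no missing values, and it pairs each day's change in mentions with the change in oil price n days later; the function names and the sample series are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length, complete series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlations(mentions, oil, max_lag):
    """Correlate mentions on day t with oil on day t + lag, for each lag."""
    results = {}
    for lag in range(max_lag + 1):
        # Drop the last `lag` mention values and the first `lag` oil values
        # so that the remaining entries line up with the chosen offset.
        xs = mentions[: len(mentions) - lag]
        ys = oil[lag:]
        results[lag] = pearson(xs, ys)
    return results


# Made-up series in which oil echoes mentions two days later.
mentions = [5.0, -3.0, 8.0, -6.0, 2.0, 7.0, -4.0, 1.0]
oil = [0.5, -0.2] + [m / 10 for m in mentions[:-2]]

by_lag = lagged_correlations(mentions, oil, max_lag=3)
best_lag = max(by_lag, key=by_lag.get)
print(best_lag)
```

A lag of 0 reproduces the same-day comparison. Scanning many lags and reporting only the best one inflates the apparent correlation, so any lag found this way would need to be checked against data outside the window used to choose it.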
For further information linking oil volatility and pricing, see the Council on Foreign Relations’ “Oil Market Volatility” by Toni Johnson or The Economist’s “The Price of Fear” (no author attributed). For further information on studies linking Twitter and the stock market, see “Twitter Mood Predicts The Stock Market” by Johan Bollen, Huina Mao, and Xiao-Jun Zeng, which has found a high level of predictive accuracy between the two.