CS 171 Visualization

Final Project: New York Times Most Emailed Articles

Leslie Vogt

Motivating Questions

My motivating questions stemmed from curiosity about the way certain articles on the New York Times web site spent a long time at the top of the "most emailed" list. I wanted to know what trajectory an article took to reach the top: how quickly does an article ascend the rankings? How long does it stay there? These questions led to two more: which sections are most likely to have articles in the top ten, and when does the list change the most?

I like to check the news on the New York Times site several times a day as a break from my research. Although I have often seen certain headlines throughout the day, there are usually a few articles on the most emailed list that I have not yet read. These articles are usually interesting, and with this visualization I wanted to characterize what makes an article popular.


Observations

It is easy to see that opinion and health articles are the most popular over this time period. It is also clear that the most changes happen during the morning hours of each day. This suggests that online readers maintain the habit of reading the newspaper at the beginning of the day, even when they read it on a computer. Also, only a few articles make it all the way to the top, while the bottom of the list changes considerably.

Data

My data came exclusively from the New York Times page listing the most emailed articles: http://www.nytimes.com/gst/mostemailed.html

I used the BeautifulSoup Python library to parse the rank and article information every 15 minutes. I then used Python to transpose the resulting time-ordered rank lists into a time-ordered rank series for each unique article, using the article link as the identifier. This data is imported into Processing as a tab-delimited file to plot the trajectory of each article.
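A minimal sketch of the transposition step, assuming each 15-minute scrape has already been reduced to a timestamp plus a list of (rank, link, section) tuples. The function names `transpose` and `to_tsv`, the snapshot format, and the column layout of the TSV are hypothetical illustrations, not the original script. An article that is absent from a snapshot gets an empty cell, so its trajectory can start and stop anywhere in the week.

```python
import csv
import io


def transpose(snapshots):
    """Turn time-ordered rank lists into one rank history per article.

    snapshots: list of (timestamp, [(rank, link, section), ...]) in time order
               (hypothetical format).
    Returns {link: {"section": str, "ranks": [rank or None per snapshot]}}.
    """
    n = len(snapshots)
    articles = {}
    for i, (_timestamp, entries) in enumerate(snapshots):
        for rank, link, section in entries:
            # The article link is the unique identifier for each article.
            if link not in articles:
                articles[link] = {"section": section, "ranks": [None] * n}
            articles[link]["ranks"][i] = rank
    return articles


def to_tsv(snapshots, articles):
    """Write one row per article: link, section, then one rank per snapshot."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["link", "section"] + [ts for ts, _ in snapshots])
    for link, info in articles.items():
        # Empty cell when the article was not on the list at that time.
        ranks = ["" if r is None else r for r in info["ranks"]]
        writer.writerow([link, info["section"]] + ranks)
    return out.getvalue()
```

Keeping one row per article (rather than one row per snapshot) means the plotting code can draw each trajectory by reading a single line of the file.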

To visualize the data, I selected a series of equally spaced hues to encode each section of the paper. The dots in each time series grow slightly larger as an article climbs the rankings, subtly drawing the eye to the top-ranked articles. When a section is selected, articles in that section are highlighted to show the trends over the week.
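The two encodings can be sketched in Python, although the original drawing code was written in Processing. This is an illustration under assumptions: `section_colors`, `dot_diameter`, the saturation and brightness values, the 3 to 6 pixel size range, and the default list length of 10 are all hypothetical choices. Equally spaced hues come from dividing the colour wheel into one equal slice per section, and the dot size is interpolated linearly so that rank 1 gets the largest dot.

```python
import colorsys


def section_colors(sections, saturation=0.7, value=0.9):
    """Give each section a hue spaced evenly around the colour wheel (0-255 RGB)."""
    n = len(sections)
    colors = {}
    for i, name in enumerate(sections):
        # Hue i/n spaces the sections 360/n degrees apart.
        r, g, b = colorsys.hsv_to_rgb(i / n, saturation, value)
        colors[name] = (round(r * 255), round(g * 255), round(b * 255))
    return colors


def dot_diameter(rank, max_rank=10, min_d=3.0, max_d=6.0):
    """Linearly scale dot size so rank 1 is largest and max_rank is smallest."""
    fraction = (max_rank - rank) / (max_rank - 1)
    return min_d + (max_d - min_d) * fraction
```

A small size range keeps the effect subtle: the top articles stand out without the dots overlapping neighbouring trajectories.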