CS 171: Homework 4 Write-up
- Part 1: The Place
- Part 2: Data Collection
- Part 3: Registering the Locations to your Map
- Part 4: Tell Your Story
Part 1: The Place
The indoor space I chose to visualize is my dorm, Mather House. I chose the fourth floor in particular because it is the space I am most familiar with; I spend most of my time on the floor and know a little bit about everything that happens there. I also chose it because, to my knowledge, it hasn't really been visualized before. I wanted a way to visualize the activity that happens there, especially since the Mather community is known to be one of the wildest and most vibrant on campus.
In particular, I wanted to visualize student interaction, and I thought that the best way to do this was to examine the Mather-open, our house mailing list. The Mather-open overflows my inbox with all sorts of messages every day, and I wanted to know a little more about who sends what over the list. There are many interesting questions one can explore with data about the mailing list; the specific questions I sought to answer are listed below, and these form the "story" I will be sharing:
- Which suites are the most active contributors to the Mather-open. I wanted to know if it was true that students in the corner suites, which are bigger and often called the "party suites," were actually much more active in the community.
- Whether contribution to the mailing list had any relation to the "geographical" location of the suites students lived in. (Aside from the corner suites, this might not normally make much sense, but because juniors and seniors picked their own rooms, we might see neighboring rooms that contribute in similar ways.)
- Which months saw the most action over the list this year.
- What topics students were talking about the most over the Mather-open.
- Which gender on our floor sends out more e-mails.
In seeking to answer these questions, I made a couple of assumptions. First, I decided to examine contribution to the Mather-open by suite instead of by student; even if only one person in room X was sending e-mails to the list, these appear as contributions from room X, not from the individual. Second, e-mails are associated with a physical location; even if an e-mail over the Mather-open was not physically sent from a room in the suite, it still counts towards that suite. (I understand that associating e-mails with a physical location from which they were not necessarily sent is a bit unconventional, but this makes the visualization more interesting.)
Part 2: Data Collection
First off, I found a print-out of Mather's floorplan and traced it using Adobe Illustrator. I simplified it so that only the essential elements remained, though I deliberately left some of the finer details to give it a more authentic, map-like feel (as opposed to a boring series of boxes representing rooms).
As mentioned above, the "locations within the space" I have chosen are the suites (the rooms) on the 4th floor. I focused on one corner of the 4th floor, so I collected data for 17 different rooms.
The data I wanted to collect were the following:
- Who lived on the fourth floor, what their e-mail addresses are, and the suite they live in (along with the gender of the students in the suite). [Nominal data]
- For each individual in a suite, how many threads in Mather-open he/she contributed to in January, February, March, and April of 2009. (This is to be totaled by suite.) [Quantitative data in four Ordinal sets (for months)]
- The topic (the subject) of the most popular threads of each month the above students contributed to (and how many e-mails it contained). [Nominal and Quantitative data, though later will be Ordinal because these will be "ranked"]
- Photographs of each room
Since I don't delete any posts on the Mather-open, they are all in my Gmail inbox. I found the housing-related information in our handbook. I then ran a number of search queries on my inbox, each looking for Mather-open messages from a particular month to which a particular e-mail address contributed. By the end of these queries, I knew how many e-mail messages Student A contributed to in January, February, March, and April. I then summed these numbers by suite. I found the topics of the most popular threads of each month by searching my inbox over particular time intervals. I took the photographs of the rooms myself, though to respect privacy, I chose artistic angles that don't reveal the identities of the students in the suite.
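The tallying step described above (per-student monthly counts summed by suite) can be sketched in a few lines of Python. This is only an illustration under assumptions: the addresses, suite numbers, and counts below are made up, and the names `student_counts` and `total_by_suite` are hypothetical rather than taken from any script used for this assignment.

```python
from collections import defaultdict

MONTHS = ("Jan", "Feb", "Mar", "Apr")

# Hypothetical sample rows: (e-mail address, suite, per-month counts).
# None of these values come from the real data set.
student_counts = [
    ("student_a@example.edu", "401", {"Jan": 3, "Feb": 1, "Mar": 4, "Apr": 0}),
    ("student_b@example.edu", "401", {"Jan": 2, "Feb": 0, "Mar": 1, "Apr": 1}),
    ("student_c@example.edu", "402", {"Jan": 0, "Feb": 0, "Mar": 1, "Apr": 0}),
]


def total_by_suite(rows):
    """Sum each student's monthly counts into a per-suite total."""
    totals = defaultdict(lambda: dict.fromkeys(MONTHS, 0))
    for _address, suite, counts in rows:
        for month in MONTHS:
            totals[suite][month] += counts.get(month, 0)
    return dict(totals)


suite_totals = total_by_suite(student_counts)
```

Here `suite_totals["401"]` combines the two hypothetical residents of suite 401, which matches the assumption that contributions are attributed to the suite rather than to the individual.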
In retrospect, the way in which I collected this data was rather foolish, since almost all of it could have been automated. If I were to do this over again, I would use a Python script that retrieves Gmail messages from my inbox; I could very easily automate the search queries for the messages and let my program do the counting for me. (I am actually planning to do something similar for my Final Project.) At the time, however, I was also playing around with different searches to see what would yield the most interesting results, and automation requires knowing exactly what is wanted.
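As a rough idea of what the automated version might look like, the sketch below only generates the search strings, one per student per month, using Gmail's `list:`, `from:`, `after:`, and `before:` search operators. It stops short of retrieving any mail. The list address and student address are placeholders, the helper names are hypothetical, and the exact behavior of the date operators at the boundaries should be checked against Gmail before trusting the counts.

```python
from datetime import date


def month_bounds(year, month):
    """Return the first day of the month and the first day of the next month."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end


def build_query(list_address, sender, year, month):
    """Build a Gmail-style search string for one sender in one month."""
    start, end = month_bounds(year, month)
    return (
        f"list:{list_address} from:{sender} "
        f"after:{start:%Y/%m/%d} before:{end:%Y/%m/%d}"
    )


# Placeholder addresses, not the real list or a real student.
queries = [
    build_query("mather-open@example.edu", "student_a@example.edu", 2009, m)
    for m in (1, 2, 3, 4)
]
```

Each string in `queries` could be pasted into the search box by hand or passed to whatever mail client the script ends up using; counting the results per query would replace the manual tallying.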
Part 3: Registering the Locations to your Map
The output file that resulted from registering my data locations with places on the map can be found here. Note that the first column is the room number, followed by the x and y coordinates.
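For illustration, a file with that column order (room number, then x, then y) could be read as follows. The delimiter is an assumption (whitespace or commas), the sample coordinates are made up, and `parse_locations` is a hypothetical helper name.

```python
import re


def parse_locations(text):
    """Map each room number to its (x, y) coordinate on the floorplan."""
    locations = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumed layout: room, x, y separated by whitespace and/or commas.
        room, x, y = re.split(r"[,\s]+", line)[:3]
        locations[room] = (float(x), float(y))
    return locations


# Made-up sample rows in the assumed layout.
sample = """401 120 85
402, 180, 85
"""

room_positions = parse_locations(sample)
```

Keeping the room number as a string avoids treating it as a quantity; only the coordinates are converted to numbers.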
Part 4: Tell Your Story
The visualization is shown below (it does not seem to work in IE, so please try Firefox):
(Note that Rooms 437 and 434 never have any data because they are specially-designated tutor suites.)
Design Principles
The first issue I had to address when brainstorming this visualization was how to effectively encode the data I collected on a map. Because a map already commits position to showing location, I had to think of a good way to represent the data for each "point" at the very location of that point. I thought about bars or even words (where, as in a tag cloud, the size of the word tells the viewer the number of e-mails sent from the room), but I had to discard these ideas because they would be difficult to position at a particular coordinate on the map. I also thought of using a series of points that would each represent an individual e-mail. This was a problem, however, since the points would be too small to distinguish, and even if they were made bigger, it would not be easy to lay them out in each room so that they collectively allow some interpretation of the data. A circle (as part of a matrix plot-like graph), which I eventually chose, is much easier to work with, since it grows outwards from the coordinate point and doesn't look awkward even if it doesn't "fit" in a room (as opposed to a square, which would be at odds with the similarly square rooms). I realized that these circles would make interaction and comparison very easy, since they can easily vary in size and color. (I will discuss some of the perceptual specifics of the encoding in the next section.)
One of the most powerful things about this visualization is the ease with which a variety of data can be compared. First of all, the months listed at the top allow for easy filtering of the data by time. Viewers can look at any combination of months, which allows quite a number of distinct comparisons to be made. (This is important, for example, when you want to know whether vacations in January and March affected the total numbers in any way.) Second, one can compare the circles by explicit value (when one hovers over a circle, the number of corresponding e-mails is shown below the picture), and one can also compare them by relative range. The former is achieved through highlighting, while the latter is achieved through linking and brushing. One can immediately see similar-sized circles, which tell the viewer which suites contribute at approximately the same level. In this way, the visualization reveals the data at several levels of detail. It's also worth noting that the ranges (which change dynamically according to which months are checked off) are labeled with a series of playful descriptors: "Motor mouths," "Heavy hitters," "Nothin' special," and "Dead silent." This playfulness is also an important part of the design, since it allows the statistical data to be closely integrated with the verbal description of the data. (After all, one of the overarching goals of this visualization is to understand the dynamics of the Mather community and the nature of the student interactions within it.)
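One way the dynamic ranges could work is sketched below: totals are recomputed for whichever months are checked, and each suite is labeled relative to the largest total currently shown. The descriptor names are the ones used in the visualization, but the cutoffs (thirds of the maximum, with zero as "Dead silent") are purely an assumption for illustration, as are the sample counts and function names.

```python
def totals_for_months(suite_counts, checked_months):
    """Sum each suite's counts over only the months that are checked."""
    return {
        suite: sum(counts.get(month, 0) for month in checked_months)
        for suite, counts in suite_counts.items()
    }


def descriptor(total, maximum):
    """Label a suite total relative to the largest total currently shown."""
    # Assumed cutoffs: zero, then thirds of the current maximum.
    if total == 0 or maximum == 0:
        return "Dead silent"
    share = total / maximum
    if share >= 2 / 3:
        return "Motor mouths"
    if share >= 1 / 3:
        return "Heavy hitters"
    return "Nothin' special"


def label_suites(suite_counts, checked_months):
    """Recompute the label of every suite for the current month selection."""
    totals = totals_for_months(suite_counts, checked_months)
    maximum = max(totals.values(), default=0)
    return {suite: descriptor(t, maximum) for suite, t in totals.items()}


# Made-up counts for four hypothetical suites.
example = {
    "401": {"Jan": 5, "Mar": 5},
    "402": {"Jan": 0, "Mar": 1},
    "403": {"Jan": 0, "Mar": 0},
    "404": {"Jan": 2, "Mar": 2},
}

labels_all = label_suites(example, ["Jan", "Mar"])
labels_jan = label_suites(example, ["Jan"])
```

Because the maximum is recomputed for each selection, a suite's label can change as months are checked or unchecked, which is the behavior the legend relies on.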
There really are no extraneous elements in the graph, as I chose to use hardly any color outside of the circles; this reduces clutter and improves the data-ink ratio. The fact that this matrix plot is presented on a map contributes to the aesthetics, which are an important part of the subjective dimensions of the visualization. The visualization, a narrative of space and time, is thus much more compelling than an average matrix plot.
Perceptual Principles
In terms of the visual encoding I used, I took care to stick to good perceptual principles where possible. The nominal data of the suites (the room number) is encoded with position (which is good according to Mackinlay, but also an obvious choice since we are using a map). I encoded the nominal data of gender using color (hue), which is the second best way to encode such data. The quantitative data of the number of contributions (for a particular suite, given a particular time interval) is encoded using the area of a circle. This isn't ideal, since we know that human beings have difficulty comparing the sizes of circles. To mitigate this problem, however, I used the legend and linking to make explicit the approximate number of e-mails for each circle (though, in doing so, I converted the data to ordinal data). In addition, I wrote out the exact number of e-mails under the photograph of each room so that more precise comparisons can be made. For me, the aesthetic effect of the circles was also a factor in choosing them over mappings that might have been perceptually more favorable.
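To make the area encoding concrete: if the area of a circle is to be proportional to the number of e-mails, the radius has to grow with the square root of the count, not with the count itself. The sketch below assumes this square-root scaling and a fixed maximum radius; the function name and the example numbers are hypothetical, and the write-up does not state how the radii in the visualization were actually computed.

```python
import math


def radius_for(value, max_value, max_radius):
    """Radius whose circle area is proportional to value.

    The largest value maps to max_radius; zero or negative values map to 0.
    """
    if max_value <= 0 or value <= 0:
        return 0.0
    return max_radius * math.sqrt(value / max_value)


# Example: a suite with a quarter of the maximum count gets half the
# maximum radius, and therefore a quarter of the maximum area.
quarter_radius = radius_for(25, 100, 40)
```

Scaling the radius linearly instead would make a suite with twice the e-mails appear four times as large, which would exaggerate differences between suites.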
One weakness of this design is that very small values cannot be seen. Even with all months checked, rooms like 402 and 405 have barely visible data points which are hard to hover over. I entertained ideas to address this, such as zooming or other scale factors, but ultimately these all made overall comparisons and impressions much more difficult. Since my task was explicitly about comparing different contributions, I decided to sacrifice the ease of hovering for easier, more accurate comparison. (And in terms of design principles, I wanted to observe graphical integrity where possible.)
As for color, I used, as usual, calm colors that aren't too demanding on the eyes but are nonetheless bold enough to be visible even when only a little is used. I chose to make the rooms grey to signify that they are what the visualization is focused on. The only real colors I picked otherwise were those of the circles, and this choice was determined quite easily by the fact that pink and blue are conventionally associated with gender in many cultures. In addition, the yellow highlight creates the "pop out" effect (as per R1 of the "Color Rules") that is desired on a mouseover.
The Story
There are a couple of things that this visualization tells us. First, the corner suites are indeed more active on the Mather-open. While, admittedly, more students occupy these suites, the degree to which they dominate the other rooms is really quite remarkable. (Other correlations between location and contribution, however, are absent.) Second, the average male suite seems to contribute more than its female counterpart, though there are arguably not enough of either to draw a real conclusion. Third, January and March seem to have seen the most posts. In particular, January had one single e-mail thread titled "quiet, please?" that alone probably affected the number of contributions from each suite. March was the month of Housing Day, which explains all the related e-mails that were sent out during that time. Finally, a suite generally stays in the same range of contribution over time; the suites all seem to contribute a proportionally similar number of e-mails from month to month.