Homework 2

Due Date: Wednesday March 12 at noon EST


Overview

The goals of this homework are for you to:
  • explore the online visualization tools of Many-Eyes,
  • become more familiar with the Processing environment and program structure,
  • develop a map visualization tool for looking at geographic data, and
  • analyze different methods for visualizing data on a map.
 

Part 1: Exploring data with Many-Eyes (35 points)

For this part of the assignment you will be looking at several different datasets using the visualization tools developed by the folks at www.many-eyes.com. Some of the datasets will be already available on the site, and others you will upload yourself.

First, go to the Many-Eyes website and register. Registration is necessary for saving and sharing your visualizations, as well as for uploading data.

Second, using this dataset, explore the different visualization tools available. Pick one method that you think effectively visualizes some aspect of the dataset and save the visualization. Using the "share" feature in Many-Eyes, include this visualization in your write-up. In your write-up it should look something like this:

Answer the following questions for your visualization:

    1. What feature of the data are you visualizing?
    2. Why is this method effective?
    3. What other methods might be good for this dataset, and what different types of information do they portray?
    4. Can you suggest any improvements?

The previous dataset was compiled from the World Health Organization database. Using this database, generate another dataset that looks at statistics you find interesting. Comparing morbidity and mortality with access to health care or economic statistics may be a good place to look for interesting features in the data. Alternatively, you can select an interesting dataset from the Dataverse Network at the Institute for Quantitative Social Science at Harvard. Upload this dataset to Many-Eyes, and again explore different visualization methods. Pick one that you feel best depicts some feature in your dataset, place the "share" link in your write-up, and answer the following questions:

    1. What question are you asking with this data?
    2. What does the visualization tell you about that question?
    3. Why did you choose this visualization method?
    4. Can you suggest any improvements?

The next thing you will do with Many-Eyes is generate some sort of text (word) file and use the Tag Cloud and Word Tree to visualize the file. The easiest way to generate a simple text file is either to save a document from an editor as a .txt file, or to cut and paste text from another source, such as a website, into a text editor and save the document as a .txt file. Some ideas for text files are your homework 1 write-up, a compilation of all the emails you sent yesterday, a transcript of President Bush's most recent State of the Union address, etc. Remember that these datasets will be publicly available on the Many-Eyes site, so please consider the content before using personal or sensitive text.

Finally, you will look at the map visualization tool in Many-Eyes. NOTE: you may want to come back to this part after you complete Part 2 of this assignment! Go to this visualization and answer the following questions:

    1. Which visualization method (area or bubbles) do you prefer and why?
    2. What do you like about this mapping tool?
    3. What improvements can you suggest?

 


Part 2: Fun with Maps (60 points)

In this part of the assignment you will build upon the map example in Chapter 3 of the Fry book to develop a mapping visualization tool. You will explore different techniques for displaying geographical information on a map, and also look at color spaces for representing data.

To begin, we have provided a zip file containing all of the example code from the book -- download the zip file here and place it in your Processing sketchbook directory. Read carefully through Chapter 3 in the Fry book and work through the examples. You will need to understand these examples to complete this part of the assignment. Each sketch is named according to the subsection title in the chapter (e.g., "3.1 Drawing a Map" is Drawing_a_Map).

Next, go to the ColorBrewer website. This website was put together by Cindy Brewer at Penn State University as a tool for selecting color schemes for map visualizations. On the left side of the page you'll see several steps for generating color schemes --- read through the "learn more" details for each step and explore the options. You will be selecting color schemes from here for a later step in this part of the assignment.

2.1) Getting Started (15 points)

In this part of the assignment you will be extending and refining the examples in Chapter 3. Remember that you can easily go to the reference page for Processing functions by selecting the function name, right-clicking on the highlighted name, and choosing "Find in Reference" from the pop-up menu.

2.1.1) Map, Norm, and Lerp Functions

An important function in the map visualization tool is (not surprisingly!) the map() function. To better understand what this function does you will write your own map function in two different ways. Download the sketch Ex_2_1_1.pde --- you will see variables for the domain and range of the values you will be mapping from and to. The domain is the set of values you map from, and the range is the set of values you map to.

When you run this sketch you will see in the output window three values --- your goal is to keep these values the same! First, look for the comment // 2.1.1.a. Compute the value of v_prime using a formulaic expression of the map() function (i.e., write an expression using the variables a, b, c, d, and v). Second, look for the comment // 2.1.1.b. Compute the value v_prime using the lerp() and norm() functions.

In your write-up include a link to this sketch. Explain mathematically and descriptively what the lerp() and norm() functions do.
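
As a rough illustration of how these three functions relate, here is a minimal sketch in plain Java (the language Processing is built on). It assumes the domain is [a, b] and the range is [c, d]; the class name RemapSketch and the method names remapFormula and remapComposed are placeholders and are not taken from Ex_2_1_1.pde. The norm and lerp methods follow the documented behavior of the Processing functions of the same name.

```java
// Plain-Java sketch; all names here are illustrative, not from Ex_2_1_1.pde.
class RemapSketch {
    // Normalize v from the domain [a, b] to the unit interval [0, 1].
    static float norm(float v, float a, float b) {
        return (v - a) / (b - a);
    }

    // Linearly interpolate between c and d: t = 0 gives c, t = 1 gives d.
    static float lerp(float c, float d, float t) {
        return c + (d - c) * t;
    }

    // Direct formula: rescale v from [a, b] to [c, d] in one expression.
    static float remapFormula(float v, float a, float b, float c, float d) {
        return c + (d - c) * ((v - a) / (b - a));
    }

    // The same mapping, composed from norm() followed by lerp().
    static float remapComposed(float v, float a, float b, float c, float d) {
        return lerp(c, d, norm(v, a, b));
    }

    public static void main(String[] args) {
        // Map the midpoint of [0, 10] onto [0, 100] both ways.
        System.out.println(remapFormula(5f, 0f, 10f, 0f, 100f));
        System.out.println(remapComposed(5f, 0f, 10f, 0f, 100f));
    }
}
```

Both methods should agree for any v, which is one way to check that the two versions in your own sketch are equivalent.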

2.1.2) ColorBrewer and Squares

Open the "Data_on_a_Map_color" sketch that we provided, and save it as "Ex_2_1_2_a". Then, go to ColorBrewer and choose a sequential color chart that appeals to you. Get the hex values for the two end colors of the chart, and change the two colors in the lerpColor() function in the sketch to your two ColorBrewer colors.

Try the lerpColor() function in both the RGB and HSB color spaces. Add a comparison of these two color spaces in your write-up by doing the following: add noLoop() to the end of the setup function so that draw() is only called once; use saveFrame("RGB.png") and saveFrame("HSB.png") at the end of the draw function to store the results in images; place these images along with your Ex_2_1_2_a applet in your write-up. Describe the differences you observe. Which color space is better for the interpolation? Why? Try changing the "Number of Classes" in the ColorBrewer tool and see how your results differ. What is the effect of this on your visualization?
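
To make the difference between the two color spaces concrete, the following plain-Java sketch interpolates between two colors once per RGB channel and once per hue/saturation/brightness component. It is a simplified illustration, not Processing's own lerpColor() implementation; the class and method names are hypothetical, and colors are assumed to be packed as 0xRRGGBB integers.

```java
import java.awt.Color;

// Plain-Java sketch of two-color interpolation; names are illustrative only.
class ColorLerpSketch {
    static float lerpF(float start, float stop, float t) {
        return start + (stop - start) * t;
    }

    // Interpolate the red, green, and blue channels independently (t in [0, 1]).
    static int lerpRgb(int c1, int c2, float t) {
        int r = Math.round(lerpF((c1 >> 16) & 0xFF, (c2 >> 16) & 0xFF, t));
        int g = Math.round(lerpF((c1 >> 8) & 0xFF, (c2 >> 8) & 0xFF, t));
        int b = Math.round(lerpF(c1 & 0xFF, c2 & 0xFF, t));
        return (r << 16) | (g << 8) | b;
    }

    // Convert both colors to HSB, interpolate each component, and convert back.
    static int lerpHsb(int c1, int c2, float t) {
        float[] a = Color.RGBtoHSB((c1 >> 16) & 0xFF, (c1 >> 8) & 0xFF, c1 & 0xFF, null);
        float[] b = Color.RGBtoHSB((c2 >> 16) & 0xFF, (c2 >> 8) & 0xFF, c2 & 0xFF, null);
        float h = lerpF(a[0], b[0], t);
        float s = lerpF(a[1], b[1], t);
        float v = lerpF(a[2], b[2], t);
        return Color.HSBtoRGB(h, s, v) & 0xFFFFFF; // drop the alpha byte
    }

    public static void main(String[] args) {
        int red = 0xFF0000, blue = 0x0000FF;
        System.out.printf("RGB midpoint: %06X%n", lerpRgb(red, blue, 0.5f));
        System.out.printf("HSB midpoint: %06X%n", lerpHsb(red, blue, 0.5f));
    }
}
```

Halfway between pure red and pure blue, the RGB version gives a dark purple (800080), while the HSB version gives green (00FF00) because the hue travels around the color wheel. Two end colors with similar hues, as in a sequential ColorBrewer chart, will behave much less dramatically.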

Again, open the "Data_on_a_Map_color" sketch that we provided, and this time save it as "Ex_2_1_2_b". Replace the data circles by squares. The center of the squares should coincide with the centers of the circles, and the width of the square should be the same as the diameter of the circle (hint: look for rect()).

Include the applet for Ex_2_1_2_b in your write-up. Do you prefer the circle or square data representation? Why? What are some examples of data that might be good for each of these shapes?

2.1.3) Refine

Open the "Data_on_a_Map_interpolation" sketch that we provided, and save it as "Ex_2_1_3". Change the data range to [-10000, 10000] by modifying the dataMin and dataMax variables and adding a call to updateTable() at the end of setup(). When you run the modified sketch, you'll notice that the state name + value label is quite long. First, modify the sketch so that the state abbreviation is displayed instead of the entire name (i.e., MA instead of Massachusetts). Second, for such a large data range, the digits after the decimal point are less meaningful in the context of what we can visually discern. So, convert the data values to integers (hint: use an (int) cast), and use the nfc() function to make the displayed numbers more readable.
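
The label formatting described above can be sketched in plain Java as follows. Inside a Processing sketch you would call nfc() directly; here String.format with the "%,d" pattern stands in for it so the example runs without Processing. The class name, method name, and sample values are hypothetical.

```java
import java.util.Locale;

// Plain-Java stand-in for the (int) cast plus nfc() formatting; names are illustrative.
class LabelFormatSketch {
    // Build a rollover label such as "MA 8,675" from an abbreviation and a raw value.
    static String label(String abbreviation, float value) {
        int whole = (int) value; // the cast truncates toward zero, it does not round
        return abbreviation + " " + String.format(Locale.US, "%,d", whole);
    }

    public static void main(String[] args) {
        System.out.println(label("MA", 8675.309f));
        System.out.println(label("MA", -9999.99f));
    }
}
```

Note that the cast truncates rather than rounds, so -9999.99 is shown as -9,999; whether that matters depends on how precisely the labels need to reflect the data.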

Include the sketch in your write-up. Can you think of other techniques you could use to modify how the data values are displayed? When is it ok, and not ok, to change the data representation as we did in this sketch?

2.2) Acquiring and Visualizing Population Data (45 points)

For this part of the assignment you will be acquiring a new map and population data, and then exploring several different techniques for visualizing the data. You can choose any country (other than the US) or US state (perhaps your home country/state). You will also be using some of the functionality you included in the previous homework assignment.

2.2.1) Defining Locations on the Map

Open the "Using_Your_Own_Data" sketch and save it as "Ex_2_2_1". Next, find a blank map of your country/state and replace the file "map.png" in the data folder of the sketch directory. A good source of blank maps is Wikipedia. You can also do a Google image search. Open the "names.tsv" file in the data folder of the sketch, and replace the state abbreviations and names with those of the states/counties in your country/state (use a tab between column elements). Then, change the width and height arguments of the size() call to match the width and height of your map.

When you run the sketch you will be prompted to select the locations of all of the states/counties on your map --- see the "Taking Data from the User" section of Chapter 3 for a description of this sketch. The result will be a file "locations.tsv" in your sketch folder that correlates pixel locations on your map with the state/county names and data.

2.2.2) Showing Population

Open the "Data_on_a_Map_interact" sketch and save it as "Ex_2_2_2". Add your country/state map to the sketch by dragging and dropping, and also include your "names.tsv" and "locations.tsv" files. Next, find information about the population in your country/state and store it in a file "population.tsv" (see "random.tsv" in the data folder for an example).

Population data is only positive, so you will need to modify the drawData() function to reflect this. Play with the range of the radii so that the sizes make sense for your data. Choose a color that you like. Also, apply the techniques from 2.1.3 to make the big numbers look better. When you roll over a state, it should display the text in the following format: "Massachusetts 1,345,122".

Include your applet in the write-up. Why did you pick this country? Did anything surprise you about the population data?

Below is a screenshot of an example of this part of this exercise.


2.2.3) Population Density and Tabs

Take your sketch from the last part "Ex_2_2_2" and save it as "Ex_2_2_3". Find the area of every state/county in your country/state and store it in the same format as "population.tsv" in a file "area.tsv" --- add this file to your new sketch. In the previous homework assignment you learned how to use tabs to include multiple datasets in a visualization. You will apply that technique to look at the population and population density of your country/state.

Add the following functions to your sketch:

      1. setTab(int tab): sets the current tab (i.e., setCurrent(int col) in homework 1).
      2. mousePressed(): to change between tabs
      3. drawTitleTabs(): to draw title tabs
You may also need to change (add) some variables, such as PFont font = load("") into PFont font = load("", ) so that the font scales correctly.

Now, display the population map in a tab with the title "Population".

Next, you will add a second tab to display the population density. First, load the area data using the Table class in the setup function (hint: look at the code to load the population.tsv file). Second, compute the min and max population as before but now store the results in dataMin[0] and dataMax[0], and then compute the min and max population density and store them in dataMin[1] and dataMax[1]. (hint: population density is defined as population/area). Third, change drawData() so that it shows Population in the first tab and Population Density in the second tab. The format of the rollover text should be: "Massachusetts 0.15" (hint: use nf()). Finally, add another tab with the title "Population Density (per )".
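
A minimal plain-Java sketch of the min/max bookkeeping described above, assuming the population and area values have already been read into two arrays in the same row order. The class name, helper names, and sample numbers are hypothetical; in your sketch the values would come from the Table class.

```java
// Plain-Java sketch of per-tab min/max computation; names and data are illustrative.
class DensitySketch {
    // Element-wise population / area, one entry per state or county.
    static float[] density(float[] population, float[] area) {
        float[] result = new float[population.length];
        for (int i = 0; i < population.length; i++) {
            result[i] = population[i] / area[i];
        }
        return result;
    }

    static float min(float[] values) {
        float m = values[0];
        for (float v : values) m = Math.min(m, v);
        return m;
    }

    static float max(float[] values) {
        float m = values[0];
        for (float v : values) m = Math.max(m, v);
        return m;
    }

    public static void main(String[] args) {
        float[] population = {100f, 300f, 50f}; // made-up values
        float[] area = {50f, 100f, 10f};        // made-up values, same row order
        float[] dens = density(population, area);

        // Index 0 holds the population tab's range, index 1 the density tab's range.
        float[] dataMin = {min(population), min(dens)};
        float[] dataMax = {max(population), max(dens)};
        System.out.println(dataMin[1] + " to " + dataMax[1]);
    }
}
```

Keeping one min/max pair per tab lets drawData() pick the right range using the current tab index.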

Include the applet in your write-up. Did anything surprise you when comparing the population with the population density?

Below is a screenshot of an example of this part of this exercise.


2.2.4) Tabs of Past or Future Population

Take your sketch from the previous part "Ex_2_2_3" and save it as "Ex_2_2_4". Find several years' worth of future or past population data for your country/state and overwrite the "population.tsv" file. You can find an example file here. To read in the multiyear population data, use the FloatTable.pde that was used in the previous homework assignment, and load the population data.

Now, to compare the population data over the years, first compute a single dataMin and dataMax over the entire table. Next, add a tab for every year in the dataset and display the corresponding population data. Use the same range [dataMin, dataMax] in every tab so that changes from year to year can be compared directly. Finally, incorporate the Integrator class into your sketch, and modify the damping and attraction parameters to fit your visualization best.
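
To get a feel for what damping and attraction do, here is a simplified damped-spring sketch in plain Java, written in the spirit of the Integrator class. It is not the class from the book's code: the field and method names are assumptions, and the starting parameter values are only examples.

```java
// Simplified damped-spring animator; NOT the book's Integrator, names are illustrative.
class SpringSketch {
    float value;     // current displayed value (e.g., a circle radius)
    float velocity;  // change applied to value on each step
    float target;    // value we are animating toward
    final float damping;     // fraction of velocity kept per step (0..1)
    final float attraction;  // strength of the pull toward the target

    SpringSketch(float start, float damping, float attraction) {
        this.value = start;
        this.target = start;
        this.damping = damping;
        this.attraction = attraction;
    }

    void setTarget(float t) {
        target = t;
    }

    // One animation step: pull toward the target, then damp the velocity.
    void update() {
        float force = attraction * (target - value);
        velocity = (velocity + force) * damping;
        value += velocity;
    }

    public static void main(String[] args) {
        SpringSketch radius = new SpringSketch(0f, 0.5f, 0.2f);
        radius.setTarget(100f);
        for (int frame = 0; frame < 100; frame++) {
            radius.update(); // in a Processing sketch this would run once per draw()
        }
        System.out.println(radius.value);
    }
}
```

In this model, a larger attraction pulls the value toward its target faster, while a damping value closer to 1 keeps more velocity each step, so the value overshoots more and takes longer to settle.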

Include the applet in your write-up.

Below is a screenshot of an example of this part of this exercise.


2.2.5) Size, Color and Transparency

So far you have only looked at changing the size of the circles to represent the data values. Before making a final design decision, you will also try color and transparency encoding. Take your sketch from the previous part "Ex_2_2_4" and save it as "Ex_2_2_5". Modify the sketch so that a color range reflects the data values instead of the size of the circles. To do this, use the lerpColor() function and choose two similar colors to interpolate in the HSB color space. Next, add code so that the value is mapped to a transparency value: the lower the value, the more transparent the circle. To compare the different encodings, add a keyPressed() function so that you can switch between the different encodings with the "[" and "]" keys.
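
The two pieces of logic in this step, mapping a value to an alpha and cycling through encodings with the bracket keys, can be sketched in plain Java as below. The class name, constants, and method names are hypothetical, and an alpha range of 0 (fully transparent) to 255 (fully opaque) is assumed.

```java
// Plain-Java sketch of alpha mapping and encoding switching; names are illustrative.
class EncodingSketch {
    static final int SIZE = 0, COLOR = 1, TRANSPARENCY = 2;
    static final int ENCODING_COUNT = 3;

    // Map a value in [dataMin, dataMax] to an alpha in [0, 255]; low values are more transparent.
    static int alphaFor(float value, float dataMin, float dataMax) {
        float t = (value - dataMin) / (dataMax - dataMin);
        t = Math.max(0f, Math.min(1f, t)); // clamp values outside the range
        return Math.round(t * 255f);
    }

    // ']' moves to the next encoding, '[' to the previous one, wrapping around.
    static int nextEncoding(int current, char key) {
        if (key == ']') return (current + 1) % ENCODING_COUNT;
        if (key == '[') return (current + ENCODING_COUNT - 1) % ENCODING_COUNT;
        return current; // any other key leaves the encoding unchanged
    }

    public static void main(String[] args) {
        System.out.println(alphaFor(50f, 0f, 100f));
        System.out.println(nextEncoding(TRANSPARENCY, ']'));
    }
}
```

With this mapping the smallest value becomes completely invisible; if that hides data you care about, consider mapping to a range that starts above 0.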

Which encoding method (size, color, transparency) do you think is best for visualizing the population data? Why? Give an example of where you think each type of encoding would be useful.


 

Part 3: The Good, the Bad, and the Ugly (5 points)

In 1-2 paragraphs, describe what you found most useful about the assignment, and how you might apply these ideas to a question in your field. What are some ways you might extend your map sketches to generate more effective visualizations? Also, let us know what you found most frustrating, and how you think the assignment could be improved. Lastly, indicate roughly how many hours it took for you to complete this assignment. This is so that we can gauge whether we're making the assignments too long, too short, or just right.  

Extra Credit: How much money do you have to raise to become a US President? (20 points)

Go to the following website. It shows three tables with the presidential candidates and information about how much money they have raised so far.

For this part you will first parse this website using the techniques in the Fry book (Chapter 5), and then display the information on a map (i.e., the USA map from the book). The following fields are of interest: the candidate's name, his/her home state, party (D for Democrats, R for Republicans), and the total money raised. Include a link to your parsing code.

You can also copy and paste the data into a table by hand without writing the parsing code. You will get only half of the extra credit points for this approach.

There are several challenges to solve. For example, NY is the home state of two candidates, so your visualization should show both data points for NY, one for Clinton and one for Giuliani. Also, not every state is the home state of a candidate.

We would like to visualize two dimensions --- the total amount of money raised and the candidates' parties. When you roll over a home state with a candidate, it should display the candidate's name and the total amount of money he/she has raised so far (e.g., Obama Barack $113,291,435). To display both dimensions, use circles with varying radii (money) for Democrats and squares with varying side lengths for Republican candidates. Encode the amount of money raised not only with size but also with a nice sequential color chart from ColorBrewer (choose the number of classes to be the maximum of 9). Divide the interval [dataMin, dataMax] into 9 equally sized intervals, and assign each interval the corresponding color from the chart. Color each shape accordingly.
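
One way to assign a value to one of the 9 equally sized intervals is sketched below in plain Java. The class and method names are hypothetical; the returned index (0 through 8) would be used to look up a color in an array holding the 9 hex values you copy from ColorBrewer.

```java
// Plain-Java sketch of equal-interval classification; names are illustrative.
class ClassBreakSketch {
    // Return the index (0 .. classes-1) of the equal-width interval containing value.
    static int classIndex(double value, double dataMin, double dataMax, int classes) {
        if (dataMax <= dataMin) {
            return 0; // degenerate range: put everything in the first class
        }
        int index = (int) Math.floor((value - dataMin) * classes / (dataMax - dataMin));
        // value == dataMax would compute index == classes, so clamp into range.
        return Math.max(0, Math.min(classes - 1, index));
    }

    public static void main(String[] args) {
        // Made-up range of 0 to 90 so each of the 9 classes is 10 units wide.
        System.out.println(classIndex(45, 0, 90, 9));
        System.out.println(classIndex(90, 0, 90, 9));
    }
}
```

The clamp matters for the largest amount in the data, which would otherwise fall just past the last interval.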
 

 


Submission Instructions

To submit your homework, create a folder named lastname_firstinitial_hw2, and place your write-up, Many-Eyes links, applets, and Processing sketches (which should also be linked from the write-up) into the directory - please make sure that all of the links (except for the Many-Eyes links) in your write-up are relative to this folder! Compress the folder and send it as an email attachment to miriah@seas.harvard.edu.