Homework 1

Due Date: Wednesday February 20 at noon EST


You can find the master solution on the Google Groups page under the Files listing.


Overview

The goals of this homework are for you to:
  • critically look at everyday visualizations with good design principles in mind,
  • become familiar with the Processing environment and program structure,
  • explore techniques for visualizing and interacting with 1D time series data, and
  • filter noisy data using convolution.
 

Part 0: Joining the class mailing list (5 points)

Go to the class website and follow the directions for signing up to the mailing list. Please subscribe before class on Wednesday, February 6th. Next, send an email to the list introducing yourself sometime after class on February 6th and before the homework deadline. In your introduction you should include your academic and professional background, why you are taking this class and what you are hoping to get out of it, and some interesting factoid about yourself (e.g., you can fly a plane, you volunteer at a shelter in your spare time, you play drums in a band, etc.).


 


Part 1: Vis Critique (25 points)

For this part of the assignment you will find and analyze an example of a good visualization of quantitative data, where the definition of good comes from the design principles covered in class. The example should include several variables, but does not need to be overly complex. You may use print media (textbooks, magazines, newspapers, etc.) or images from the web. Ideally the visualization will address a question that you find personally interesting.

The write-up will include an image of the visualization and an analysis of its design - make sure to include where you found it. Print media may be digitized using a scanner or a digital camera; for web images, include the image as well as a link to the original source. Your analysis should address these questions with one short paragraph each:

    1. Who is the intended audience?
    2. What information does this visualization represent?
    3. How many data dimensions does it encode?
    4. What tasks, comparisons, or evaluations does it enable? List several.
    5. What principles of excellence best describe why it is good / bad?
    6. Can you suggest any improvements?
    7. Why do you like / dislike this visualization?

 

Part 2: Visualizing 1D Time Series Data with Processing (65 points total)

For this part of the assignment you are going to explore visualizing multiple 1D time series data sets. In the first part you will look at different visualization and interaction techniques, and in the second part you are going to generate several noisy data sets and two different 1D convolution kernels for filtering the data.

Exploring the Time Series Example in Chapter 4

Download a sketch of the milk-tea-coffee example from Chapter 4. You will find the example sketch (timeseries_sketch.pde), two additional classes (FloatTable.pde and Integrator.pde), and the data (data/milk-tea-coffee.tsv). Each of the visualization and interaction techniques discussed in Chapter 4 is included somewhere in the sketch. Read carefully through this chapter, and follow along in the example code - make sure you explore the different methods, as this sketch will be the framework you use to visualize data in the next few steps of the assignment.

2a) Modify the Sketch (20 points)

You will modify the sketch based on the description and code in the book to do the following (it should look like the plot shown below):

      1. Change the x-axis interval to a 10 year increment.
      2. Make the smooth transition with interpolation between the tabbed plots four times faster. HINT: look at the instantiation of the Integrator class in timeseries_sketch.pde.
      3. Draw the x-axis grid lines like those in Figure 4-13.
      4. Add a mouse rollover for highlighting data values, like that in Figure 4-10.

For the write-up, export your sketch, and include the applet and link to the source code.


 

2b) Generating Time Series Data (20 points)

Download the SeriesGenerator.pde class for creating your own 1D time series data. This class is a modified version of FloatTable.pde and generates the data array instead of reading in from a file. You will modify the function float[] generateSeries(int i) such that each case in the switch statement generates one of the following series with 100 values each:
      1. A series of random values in the range [0, 1000].
      2. A sine series evaluated over the domain [0, 2*PI], with added random noise in the range [0, 0.2].
      3. A step function of values 0 and 1, where the step occurs at the 50th value, with added random noise in the range [0, 0.2].
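
The three series listed above could be generated along the following lines. This is only a minimal sketch in plain Java, not the structure of the provided SeriesGenerator.pde: the class name SeriesSketch and its method names are hypothetical, java.util.Random stands in for Processing's random(), and "the 50th value" is assumed to mean zero-based index 49.

```java
import java.util.Random;

// Hypothetical helper class; the provided SeriesGenerator.pde may be organized differently.
class SeriesSketch {
  static final int N = 100;  // values per series, as the assignment specifies

  // Series 1: uniform random values between 0 and 1000.
  static float[] randomSeries(Random rng) {
    float[] s = new float[N];
    for (int j = 0; j < N; j++) {
      s[j] = rng.nextFloat() * 1000f;
    }
    return s;
  }

  // Series 2: one sine period over [0, 2*PI] plus uniform noise between 0 and 0.2.
  static float[] noisySine(Random rng) {
    float[] s = new float[N];
    for (int j = 0; j < N; j++) {
      double x = 2.0 * Math.PI * j / (N - 1);  // maps j = 0..N-1 onto [0, 2*PI]
      s[j] = (float) Math.sin(x) + rng.nextFloat() * 0.2f;
    }
    return s;
  }

  // Series 3: step from 0 to 1 plus uniform noise between 0 and 0.2.
  // Assumption: the step at "the 50th value" is placed at zero-based index 49.
  static float[] noisyStep(Random rng) {
    float[] s = new float[N];
    for (int j = 0; j < N; j++) {
      float base = (j >= 49) ? 1f : 0f;
      s[j] = base + rng.nextFloat() * 0.2f;
    }
    return s;
  }
}
```

In a Processing sketch, each method body would become one case of the switch statement in generateSeries(int i), with random(0, 0.2) in place of the Random calls.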

You can copy the timeseries_sketch.pde into a new sketch named my_timeseries_sketch.pde, and incorporate the SeriesGenerator.pde by replacing the FloatTable variable types with SeriesGenerator.

To generate the visualizations, you will need to modify two things in the sketch. First, the y-axis should scale automatically as you tab through the different series. Second, the y-axis labeling in drawVolumeLabels() (and the relevant variables) must also change to reflect the different scales.
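
One possible way to approach the automatic y-axis scaling is to recompute the data minimum and maximum whenever the active series changes, then map values linearly onto the plot's pixel range. The sketch below is an illustration under assumptions: the class AxisScale and its method names are hypothetical, and remap() simply reproduces the linear formula that Processing's built-in map() uses.

```java
// Hypothetical helper; names are illustrative and not part of the provided sketch.
class AxisScale {
  // Smallest value in the series.
  static float min(float[] s) {
    float m = s[0];
    for (float v : s) {
      if (v < m) m = v;
    }
    return m;
  }

  // Largest value in the series.
  static float max(float[] s) {
    float m = s[0];
    for (float v : s) {
      if (v > m) m = v;
    }
    return m;
  }

  // Linear remapping of v from [inLo, inHi] to [outLo, outHi].
  static float remap(float v, float inLo, float inHi, float outLo, float outHi) {
    return outLo + (outHi - outLo) * ((v - inLo) / (inHi - inLo));
  }

  // Spacing between y-axis labels when the data range is split into `count` steps.
  static float labelInterval(float lo, float hi, int count) {
    return (hi - lo) / count;
  }
}
```

For example, with a series spanning -1 to 7 and a plot whose bottom and top edges sit at pixel rows 400 and 50, remap(3, -1, 7, 400, 50) places the value 3 at row 225, halfway up the plot.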

Generate a tabbed visualization similar to 2a), but with the three new series. You do not have to use an area plot. Summarize the visualization and interaction techniques you use for the final result, and why you made these design decisions.

2c) Filtering Time Series Data (25 points)

Obviously, the data is noisy. To filter these time series, you are going to incorporate two different 1D convolution kernels. The first is a linear filter with a support of one data point on either side of the evaluation point, with weights of [0.25, 0.5, 0.25]. The second kernel is a Gaussian filter with support of two data points on either side of the evaluation point, with weights of [0.0103, 0.2076, 0.5642, 0.2076, 0.0103].

To incorporate these kernels into your sketch, you will fill in the functions linearFilter(float[] data) and gaussianFilter(float[] data) in the SeriesGenerator class. These functions take a data array, perform convolution over the array, and return the filtered data. Apply the filters inside generateSeries(int i), after the series array has been created. Handle boundary conditions by bleeding values at the boundary: when part of a kernel falls outside of the data range, use the value of the last data point covered by the kernel in place of the absent data values.
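
The convolution with bled boundaries described above can be sketched as follows. This is a minimal illustration, not the reference solution: the class name Convolution and the method convolve are hypothetical, and the kernel is applied without flipping, which makes no difference for the symmetric kernels used in this assignment.

```java
// Hypothetical helper; in the sketch this logic would live inside
// linearFilter(float[] data) and gaussianFilter(float[] data).
class Convolution {
  static final float[] LINEAR = {0.25f, 0.5f, 0.25f};
  static final float[] GAUSSIAN = {0.0103f, 0.2076f, 0.5642f, 0.2076f, 0.0103f};

  // Applies an odd-length kernel centered on each data point.
  // Indices that fall outside the array are clamped to the nearest
  // valid index, so the edge value "bleeds" past the boundary.
  static float[] convolve(float[] data, float[] kernel) {
    int half = kernel.length / 2;  // support on either side of the center
    float[] out = new float[data.length];
    for (int i = 0; i < data.length; i++) {
      float sum = 0f;
      for (int k = 0; k < kernel.length; k++) {
        int idx = i + k - half;
        if (idx < 0) idx = 0;                            // bleed first value
        if (idx > data.length - 1) idx = data.length - 1;  // bleed last value
        sum += kernel[k] * data[idx];
      }
      out[i] = sum;
    }
    return out;
  }
}
```

As a quick check, filtering {0, 0, 4, 0, 0} with the linear kernel spreads the spike into {0, 1, 2, 1, 0}, and filtering {4, 0, 0} gives 3 at the first position because the missing left neighbor is replaced by the edge value 4.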

To explore the use of filters, you will generate plots of the unfiltered time series data as well as the data filtered with both kernels. That means you will have a total of nine tabs in your sketch (series 1, filtered series 1 [linear], filtered series 1 [Gaussian], series 2, etc.). To do this, change the columnCount variable to 9, and add the appropriate case statements to generateSeries(int i). You will also want to try out the different plotting techniques described in the book to determine the best way to visualize the data.

We realize that having nine tabs for the three series, original and filtered, and the two kernels, linear and Gaussian, is not elegant. Feel free to explore how to change the code so you can display more than one series in a window. You may use keystrokes to switch between kernels. Note that this will require a fundamental change to the sketch design, so we do not require it.

Export your sketch and incorporate the applet and linked code into your write-up. Summarize the visualization and interaction techniques you use for the final result, and why you made these design decisions. Also in your write-up, address the following questions:

      1. What are the differences between the results of the two filters?
      2. When would you want to use a filter? When wouldn't you?
      3. Did anything surprise you about the results?
       

Part 3: The Good, the Bad, and the Ugly (5 points)

In 1-2 paragraphs, describe what you found most useful about the assignment, and how you might apply these ideas to a question in your field. What are some ways you might extend your time series sketches to generate more effective visualizations? Also, let us know what you found most frustrating, and how you think the assignment could be improved. Lastly, indicate roughly how many hours it took for you to complete this assignment. This is so that we can gauge whether we're making the assignments too long, too short, or just right.

Extra Credit: Visualizing and Filtering Sound Files (20 points)

A common 1D time series data set is a sound file, such as any one of the hundreds of songs sitting on your computer or on your iPod. We can use the tools developed in this assignment to explore the contents of sound files in a visual way.

You can read in a sound file using the Ess library for Processing. Download the Ess_r2 library, and drag the Ess directory into the Processing library directory. To check that the library installed correctly, quit and restart Processing, and run an example from Examples/Libraries/Ess/.

You should create a copy of your my_timeseries_sketch.pde sketch, and call it my_sound_sketch.pde. At the very top of my_sound_sketch.pde, include the line import krister.Ess.*;. In the setup() function, right after setting the size of the window, include the line Ess.start(this);. These two lines load the library into your sketch and start it.

We have posted three sound files for you to use for this assignment; however, feel free to incorporate others. Unzip this file, and then drag-and-drop each .wav file into your Processing sketch window, which will automatically generate a data folder in your sketch directory. To read in a .wav file, the following code should be incorporated into the generateSeries(int i) function of the SeriesGenerator class:

AudioChannel myChannel = new AudioChannel("flute-clean.wav");
// Copy the first 100 samples; loop up to myChannel.size instead to read the whole file.
for (int j = 0; j < 100; j++) {
  series[j] = myChannel.samples[j];
}

You should plot all three .wav files, and modify my_sound_sketch.pde to correctly scale the x-axis for files of different lengths. You should also play the sound file that is currently displayed in the viewer - you can find the commands for playing sound files in the Ess example sketches.

Next, you will filter flute-noisy.wav using the filter.wav file. Usually sound filters are not Gaussian, but rather sinc functions, as they are better at minimizing ringing effects. Thus, the filter you use to get rid of the noise is f(t) = sinc(t/3), where t = [-32 -31 ... 31 32]. This kernel is encoded into filter.wav.
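
To see what the kernel f(t) = sinc(t/3) looks like for t = -32 ... 32, it can be tabulated directly. The sketch below is an illustration only and rests on two assumptions: it uses the unnormalized definition sinc(x) = sin(x)/x (the normalized form sin(PI*x)/(PI*x) is also common, and the handout does not say which one filter.wav uses), and it does not rescale the weights. The class name SincKernel and its methods are hypothetical.

```java
// Hypothetical helper for inspecting the kernel; the assignment itself
// reads the weights from filter.wav rather than computing them.
class SincKernel {
  // Assumption: unnormalized sinc, sin(x)/x, with sinc(0) defined as 1.
  static double sinc(double x) {
    return (x == 0.0) ? 1.0 : Math.sin(x) / x;
  }

  // Builds weights f(t) = sinc(t / scale) for t = -halfWidth .. halfWidth.
  static float[] build(int halfWidth, double scale) {
    float[] kernel = new float[2 * halfWidth + 1];
    for (int t = -halfWidth; t <= halfWidth; t++) {
      kernel[t + halfWidth] = (float) sinc(t / scale);
    }
    return kernel;
  }
}
```

Calling SincKernel.build(32, 3.0) yields 65 weights that are symmetric about the center, where the weight is 1. Comparing these values against the samples read from filter.wav is one way to check which sinc definition and scaling the posted file actually uses.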

Generate a new convolution function in SeriesGenerator where the number and weights of neighboring points used in the convolution are determined by the length and content of the filter.wav file. Again, bleed boundary values for kernel evaluations that occur outside of the data range. Apply this filter to flute-noisy.wav, and plot and play the filtered series as another tab in your sketch. Again, we encourage you to think about ways to visualize the original and filtered sound in the same tab (see 2c).

In your write-up, include a brief overview of the design of your sound file visualization along with an applet of your sketch and all relevant code and files. Discuss any insights your visualization provided, and the effects the filtering had on the flute-noisy.wav file.
 

 


Submission Instructions

To submit your homework, create a folder named lastname_firstinitial_hw1, and place your write-up, visualization image, applets, and Processing sketches (which should also be linked from the write-up) into the directory - please make sure that all of the links in your write-up are relative to this folder! Compress the folder and send it as an email attachment to miriah@seas.harvard.edu.