Misc. Information

Part 4a: Data Types

a) 2 = nominal, 4 = quantitative

b) 9 = ordinal, 3 = quantitative

c) 4 = quantitative, 6 = nominal, 7 = nominal

d) 4 = quantitative, 8 = nominal

e) 5 = quantitative, 9 = ordinal, 1 = nominal

Part 4b: Visual Encodings and Visualization Types

a) Visualization choice: Block histogram (one for each operating system). The quantitative data on years of programming experience would be partitioned into bins representing experience levels (e.g. less than 1 year, between 1 and 2 years, etc.), and the number of people at a given experience level who use a given operating system would be encoded as the height of the bar in the corresponding bin of that operating system's histogram. The nominal operating-system data would thus be encoded as separate histograms. We could then compare the histograms to infer trends. For example, if the bars of the Linux histogram are tallest toward the right (higher-experience) side, with a tail trailing off to the left (a left-skewed, or negatively skewed, distribution), then we could infer that the people in this study who use Linux are primarily more experienced programmers.
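The binning and counting behind these per-OS histograms can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the bin edges (1, 2 and 5 years), the `histogram_by_os` name and the (operating system, years) record shape are all hypothetical illustrations, not part of the survey itself. Each returned list holds the bar heights for one operating system's histogram.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical bin edges (years of experience); they define the bins
# [0, 1), [1, 2), [2, 5) and [5, infinity).
BIN_EDGES = [1, 2, 5]
BIN_LABELS = ["< 1 yr", "1-2 yrs", "2-5 yrs", "5+ yrs"]


def histogram_by_os(records, bin_edges=BIN_EDGES):
    """Count respondents per experience bin, separately for each operating system.

    records: iterable of (operating_system, years_of_experience) pairs
             (a hypothetical shape for the survey data).
    Returns a dict mapping each OS to a list of bar heights, one per bin.
    """
    counts = defaultdict(lambda: [0] * (len(bin_edges) + 1))
    for os_name, years in records:
        # bisect_right gives the index of the bin whose range contains `years`
        counts[os_name][bisect_right(bin_edges, years)] += 1
    return dict(counts)


# Usage with made-up records, for illustration only.
survey = [("Linux", 6), ("Linux", 3), ("Linux", 7), ("Mac OS", 0.5), ("Windows", 0.2)]
for os_name, heights in histogram_by_os(survey).items():
    print(os_name, dict(zip(BIN_LABELS, heights)))
```

Drawing each list as a bar chart (for example with a plotting library) would then give one histogram per operating system.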

b) Visualization choice: Scatter plot. The x-axis would encode the ordinal "programming comfort level" scale, while the y-axis would encode the quantitative monitor resolution (as a total pixel count, i.e. width × height in pixels). For each subject in the study, we would plot the point (x, y), where x is that subject's programming comfort level and y is the resolution of that subject's monitor. To find out whether there is a trend, we could estimate a line of best fit by eye (or actually run a linear regression and report the r- and p-values). If there is a clear trend, it should be apparent from the scatter plot.
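The regression mentioned above can be sketched with ordinary least squares and Pearson's r. This sketch assumes the comfort levels are coded as integers (e.g. 1 to 5) and that resolution is the total pixel count; `fit_line` is a hypothetical helper name. It does not compute a p-value; `scipy.stats.linregress` reports one alongside the slope and r. Because comfort level is ordinal, a rank correlation such as `scipy.stats.spearmanr` may be a more defensible choice than Pearson's r.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, plus Pearson's r.

    xs: comfort levels coded as numbers (an assumed 1-5 coding).
    ys: monitor resolutions as total pixel counts (width * height).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Usage with made-up values, for illustration only.
comfort = [1, 2, 3, 4, 5]
pixels = [1366 * 768, 1440 * 900, 1920 * 1080, 1920 * 1200, 2560 * 1440]
slope, intercept, r = fit_line(comfort, pixels)
print(f"slope={slope:.0f} pixels per comfort level, r={r:.2f}")
```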

c) Visualization choice: Bar chart. We could define "experienced" as having more than two years of experience. Then we could count how many times each programming language appears in the lists of primary and other languages used by "experienced" programmers in our data set. The x-axis of the bar chart would encode the nominal programming language names, while the y-axis would encode the quantitative number of times each language is mentioned. The tallest bars would then identify the languages most commonly used by experienced programmers. We could recreate this visualization with different cut-offs for "experienced" to see whether the results change.
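The counting step that produces the bar heights can be sketched with `collections.Counter`. The respondent dictionaries and their keys (`"years"`, `"primary"`, `"others"`) are a hypothetical data layout, and `min_years` stands in for the adjustable "experienced" cut-off. One design choice here: the primary and other languages are merged into a set, so a respondent who lists the same language twice is counted once.

```python
from collections import Counter


def language_counts(respondents, min_years=2):
    """Count language mentions among respondents with more than `min_years` of experience.

    Each respondent is a dict with hypothetical keys:
      "years"   - years of programming experience
      "primary" - primary language (a string)
      "others"  - list of other languages used
    """
    counts = Counter()
    for person in respondents:
        if person["years"] > min_years:
            # A set avoids double-counting a language listed as both primary and other
            counts.update({person["primary"], *person["others"]})
    return counts


# Usage with made-up records; most_common() orders the bars tallest first.
data = [
    {"years": 5, "primary": "Python", "others": ["C", "Java"]},
    {"years": 3, "primary": "Java", "others": ["Python"]},
    {"years": 1, "primary": "Scratch", "others": []},
]
print(language_counts(data).most_common())
```

Re-running with a different `min_years` corresponds to recreating the chart with a different cut-off.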

d) Visualization choice: Bar chart. The encoding would be very similar to part c). We would count how many times each class appears in the lists of classes taken by programmers in our data set with less than a year of experience. The x-axis of the bar chart would encode the nominal class names, while the y-axis would encode the quantitative number of times each class is mentioned. To answer the posed question, we can simply read the height of the "CS 50" bar to find out how many students with less than a year of programming experience took CS 50.
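A similar sketch applies here, with the filter reversed to students with less than one year of experience. The `"years"` and `"classes"` keys and the `class_counts` name are hypothetical. Reading the "CS 50" bar then amounts to a single lookup; a `Counter` returns 0 for a class nobody in the group took.

```python
from collections import Counter


def class_counts(respondents, max_years=1):
    """Count class mentions among respondents with fewer than `max_years` of experience.

    Each respondent is a dict with hypothetical keys:
      "years"   - years of programming experience
      "classes" - list of classes taken
    """
    counts = Counter()
    for person in respondents:
        if person["years"] < max_years:
            # Count each class at most once per student
            counts.update(set(person["classes"]))
    return counts


# Usage with made-up records, for illustration only.
data = [
    {"years": 0.5, "classes": ["CS 50"]},
    {"years": 0.2, "classes": ["CS 50", "CS 51"]},
    {"years": 3, "classes": ["CS 50", "CS 124"]},
]
print(class_counts(data)["CS 50"])
```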

e) Visualization choice: Scatter plot. The ordinal comfort level would be encoded on the x-axis, while the quantitative measure of how often students program would be encoded on the y-axis. For each student with comfort level x who programs at a rate of y, we would plot the point (x, y) if and only if that student owns a laptop. To answer the posed question, we just need to determine whether many points fall in the upper-right region of the scatter plot, which corresponds to the students who are most comfortable with programming and who program most often.
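To turn "many points in the upper-right region" into a number, one option is to filter to laptop owners and compute the share of points that clear both a comfort and a frequency threshold. This is only a sketch: the keys (`"comfort"`, `"hours_per_week"`, `"owns_laptop"`), the hours-per-week unit and the threshold values are all assumptions chosen for illustration.

```python
def upper_right_share(students, min_comfort=4, min_hours=10):
    """Return the plotted points and the share of them in the upper-right region.

    Only laptop owners are plotted. Hypothetical keys per student:
      "comfort"        - ordinal comfort level (assumed coded 1-5)
      "hours_per_week" - how often the student programs (assumed unit)
      "owns_laptop"    - bool
    """
    points = [(s["comfort"], s["hours_per_week"]) for s in students if s["owns_laptop"]]
    if not points:
        return points, 0.0
    # A point is "upper right" if it clears both thresholds
    upper_right = sum(1 for x, y in points if x >= min_comfort and y >= min_hours)
    return points, upper_right / len(points)


# Usage with made-up records, for illustration only.
students = [
    {"comfort": 5, "hours_per_week": 20, "owns_laptop": True},
    {"comfort": 2, "hours_per_week": 3, "owns_laptop": True},
    {"comfort": 5, "hours_per_week": 40, "owns_laptop": False},
]
points, share = upper_right_share(students)
print(points, share)
```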