Homework 5: Multidimensional Data
Due date: Thursday, March 11th at 11:59pm EST


Collecting data is easier today than ever, but making sense of the resulting mounds of data, whether economic, military, or political, remains a challenge. Finding a relationship (if any) between two parameters is fairly straightforward, but the task becomes much harder as the number of parameters and the volume of data grow. Uncovering direct and indirect relationships in multidimensional data is especially important today because a plethora of useful data is available to anyone who explores it with the right tools.

In this assignment you will learn to use Tableau, a tool for analyzing and visualizing multidimensional data sets, as well as create your own exploratory tool using Processing. The data you will be looking at is from the class survey from HW 0. Your goal is to find any patterns that exist among your fellow classmates.

If you're interested, visit here for a recent relevant article about an overload of data.



Grading

The homework will be graded according to the guidelines from the syllabus:


5 = Exceptional / above and beyond (we will only give out maybe 5-10 of these for each homework)
4 = Very Solid / no mistakes (or really minor)
3 = Good / some mistakes
2 = Fair / some major conceptual errors
1 = Poor / did not finish?
0 = Did not participate / did not hand in

A 4 constitutes a perfect grade and is equivalent to an A. A combination of 4s and 3s ends up being an A- to B, and so on. TFs will evaluate your work holistically, looking beyond mechanical correctness to the overall quality of the work. In addition to the scores, the TFs will give detailed written feedback.



Part 1: Exploration with Tableau (40%)

You will first need to install Tableau, and you can find installation instructions here.

Next, download the CS 171 Class Survey Excel spreadsheet. You can see an updated version of the data in this forum post, which eliminates some inconsistencies in the data. Start up Tableau and open this spreadsheet. Spend some time with the data and the features of Tableau. If you need some help, you can find a wealth of information on the Tableau website.

Once you feel comfortable with the Tableau interface, create visualizations to answer the following questions:

  1. What kind of computers do younger programmers use?
  2. What operating systems do more comfortable programmers use?
  3. What is the relationship between how long one has been programming and the frequency of coding?
  4. What is the relationship between concentration and primary programming language, and how does it change over time (2009 / 2010)?
  5. What is the relationship between age, programming language, and programming experience, and how does it change over time (2009 / 2010)?

In your write-up, include the answers to these questions along with screenshots of the visualizations you created to answer them.


Part 2: Designing an Interaction Using Classmate Data (60%)

Based on your exploration of the class data, choose a task that you would like to address with an interactive visualization. This task should require three dimensions and/or measures that you think have an interesting correlation -- we will call these parameters. At least one of these parameters must be nominal, and at least one must be ordinal. You will design and implement an interactive visualization to explore these three parameters to (hopefully!) reveal an insight about your fellow classmates. In your write-up, describe the task you are going to address, and create a list of the three chosen parameters along with their respective data types.

Your interaction will use three types of encodings: spatial, color, and size. Map one parameter to spatial location (i.e., the x or y axis), one to a color palette, and the third to size. In your list of parameters, include the type of encoding you will use for each. Remember that certain types of encodings are better suited to certain types of data! For the color and size encodings, include a legend in your design to specify what these values mean. Also, every data point should be visible in your design.
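As a rough illustration of the three encodings, here is a minimal sketch in plain Java (the language Processing builds on). The class name, the category labels, the hex colors, and the choice to scale diameter linearly with rank are all assumptions made for the example; they are not taken from the survey data or required by the assignment.

```java
import java.util.Arrays;
import java.util.List;

class EncodingSketch {
    // Hypothetical ordinal scale, listed from lowest to highest.
    static final List<String> COMFORT = Arrays.asList("Low", "Medium", "High");

    // Hypothetical nominal categories, each paired with a placeholder hex color.
    static final List<String> OS = Arrays.asList("Mac OS", "Windows XP", "Linux");
    static final int[] PALETTE = {0x1B9E77, 0xD95F02, 0x7570B3};

    // Spatial encoding: linearly map a value in [lo, hi] to a pixel in [left, right].
    static float xFor(float value, float lo, float hi, float left, float right) {
        return left + (value - lo) / (hi - lo) * (right - left);
    }

    // Color encoding: a nominal category picks one color from the palette.
    static int colorFor(String os) {
        int i = OS.indexOf(os);
        return i < 0 ? 0x999999 : PALETTE[i]; // gray for unknown categories
    }

    // Size encoding: the ordinal rank picks a diameter between minD and maxD.
    static float diameterFor(String level, float minD, float maxD) {
        int rank = COMFORT.indexOf(level);
        if (rank < 0) return minD; // unknown level falls back to the smallest size
        return minD + (maxD - minD) * rank / (COMFORT.size() - 1);
    }
}
```

The same lookups can drive the legends: drawing one swatch per entry in the category list and one circle per ordinal level keeps the legend and the data points in agreement.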

Next, add interaction in the form of mouse-over and linked highlighting. Mousing over a data point should reveal some additional information about the student, such as their interesting factoid and home town. Mousing over a data point should also link it to the color/size legends, as well as to other related data points. Linking these views should let the user quickly determine which values the data point's color and size encode, and help the user locate other data points that are similar.
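One possible way to structure the mouse-over and linking logic is sketched below in plain Java. It assumes circular marks and treats "related" as "shares the same color category"; the Point class and both method names are hypothetical, and your own design may define relatedness differently.

```java
import java.util.ArrayList;
import java.util.List;

class HoverSketch {
    // Hypothetical data point: screen position, diameter, and color-category index.
    static class Point {
        final float x, y, diameter;
        final int category;

        Point(float x, float y, float diameter, int category) {
            this.x = x;
            this.y = y;
            this.diameter = diameter;
            this.category = category;
        }
    }

    // Returns the index of the first point whose circle contains the mouse, or -1.
    static int hovered(List<Point> pts, float mx, float my) {
        for (int i = 0; i < pts.size(); i++) {
            Point p = pts.get(i);
            float dx = mx - p.x;
            float dy = my - p.y;
            float r = p.diameter / 2;
            if (dx * dx + dy * dy <= r * r) {
                return i;
            }
        }
        return -1;
    }

    // Linked highlighting: indices of every point in the hovered point's category.
    static List<Integer> linked(List<Point> pts, int hoveredIndex) {
        List<Integer> out = new ArrayList<>();
        if (hoveredIndex < 0) {
            return out; // nothing is hovered, so nothing is highlighted
        }
        int cat = pts.get(hoveredIndex).category;
        for (int i = 0; i < pts.size(); i++) {
            if (pts.get(i).category == cat) {
                out.add(i);
            }
        }
        return out;
    }
}
```

In a Processing sketch, the mouse coordinates would come from mouseX and mouseY inside draw(), and the returned indices would decide which points and legend entries to draw highlighted.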

Design your visualization with principles of excellence in mind. When choosing colors you may refer to the Color Brewer scales. In your write-up, include a concise explanation of your design process and visual encoding decisions, and the perceptual and design principles you followed.

Helpful Information and Recommended Reading for Processing (though you may use any language you like)
  • Shiffman book: 17.1 - 17.4 on Strings and fonts in Processing
  • 18.1 - 18.3 on inputting text files
  • 5.5 and 5.6 on mouse rollovers and buttons
  • Chapters 3, 7, 8, 9 for more on programming basics.

Download the class survey in CSV format. Each element in a row is separated by a comma, and each row is separated by a carriage return.

To read in this data, you’ll need just a few basic commands. First, you can read in a file in Processing using this command:
String[] rows = loadStrings(file);
This command takes a file and loads its rows into an array of Strings called rows, where a row is the list of characters up to a carriage return (i.e., a row in the class survey data file).

You’ll need to parse the elements in each row as the list of characters up to each comma (i.e., an entry in the class survey data file). Use this command:
String[] cols = split(rows[i], ',');
which puts each set of characters up to a comma into an array called cols. You can then ask for different elements in the cols array for the parameters you choose to visualize.

It will also be useful to use:
rows.length
to determine the number of rows in the file. Remember that the first row contains the names of the parameters, so the data itself starts at rows[1].
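Putting the reading steps together, the following plain-Java sketch skips the header row and splits every remaining row on commas. The class and method names are hypothetical, and the rows array is passed in directly; in a Processing sketch it would come from loadStrings. Note that this simple split assumes no entry contains a comma of its own (for example inside a quoted factoid), which may not hold for the real survey file.

```java
import java.util.ArrayList;
import java.util.List;

class SurveyParseSketch {
    // Splits every row after the header into its comma-separated entries.
    static List<String[]> parseRows(String[] rows) {
        List<String[]> records = new ArrayList<>();
        for (int i = 1; i < rows.length; i++) { // rows[0] holds the parameter names
            if (rows[i].trim().isEmpty()) {
                continue; // ignore blank lines, e.g. at the end of the file
            }
            // The -1 limit keeps trailing empty entries, so every row keeps all its columns.
            records.add(rows[i].split(",", -1));
        }
        return records;
    }
}
```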

At this point all of your data will be stored as Strings. You could (if you choose) categorize these strings like:
if (cols[2].equals("Mac OS")) {
  data_point[i].operating_system = 0;
}
else if (cols[2].equals("Windows XP")) {
  data_point[i].operating_system = 1;
}
...

Some of the data is sequential by nature, and you’ll need to assign an order to the categories in your code.
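One way to assign such an order, sketched here in plain Java, is to list the categories from lowest to highest and use each answer's position in that list as its rank. The labels below are made up for illustration and are not the actual survey answers.

```java
import java.util.Arrays;
import java.util.List;

class OrdinalSketch {
    // Hypothetical answer labels, listed from lowest to highest.
    static final List<String> FREQUENCY =
        Arrays.asList("Never", "Monthly", "Weekly", "Daily");

    // Returns the position of the answer in the ordered list, or -1 if it is not listed.
    static int rankOf(String answer) {
        return FREQUENCY.indexOf(answer.trim());
    }
}
```

Keeping the order in a single list means the same list can also be used to draw the legend in the right sequence.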

As long as you implement the design requirements discussed, you are welcome to build on this design in any way you would like. In your write-up include an applet of your visualization, and discuss what insight(s) you were able to determine about your classmates.



Submission Instructions

Your write-up will be a webpage -- you can use any webpage layout you wish, or cut and paste the HTML source from this page and plug in your work. To submit your homework, create a folder named lastname_firstinitial_hw5 and place your webpage write-up and your Processing sketch in this folder -- please make sure that all of the links in your write-up are relative to this folder! Compress the folder (please use .zip compression) and submit it on the course iSite page in the HW 5 dropbox.