Top Web sites represented by a Tree Map


Luigi Cardarelli  lcardarelli1@alice.it


Project Overview

My project is a visualization of the top web sites in the world, sorted into categories and subcategories and represented as a tree map. My aim was to show the top web sites in a minimum amount of space and to let the user get information quickly through interaction, so the visualization also permits rapid comparisons among web sites. A few web sites already report the list of top web sites, but always in a static form such as a plain list.
For instance, one of the main pieces of information reported by my visualization is a web site's rank. It is encoded by the size of the box representing a web site, a subcategory or a category, so it quickly pops out to the user.
The target audience includes both professionals working in the Internet field and ordinary people interested in this kind of information. Moreover, since the list of top web sites is a social index of the preferences, trends and tastes of people around the world, it is a useful tool for social research. For instance, I discovered that Facebook is the second most popular web site in my country (Italy); this astonished me, and it is an indicator of how important social contacts are in Italy.


Visualization Approach

Data Acquisition:   I collected all my data from the Internet. My source is http://www.alexa.com/topsites, which reports the top web sites in a static way, split into categories and subcategories.
The data must be acquired automatically for two reasons: there is too much data to collect manually, and the data is live and could change daily.
To scrape the data I used Python and the Beautiful Soup library. The aim of the scraping is to obtain a data file in a format suitable to serve as the database for the code implementing the visualization.
The acquisition takes time (about 2-3 hours) because many web pages must be opened to collect all the necessary data.
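
The scraping step described above can be sketched roughly as follows. This is only an illustration, not the project's Webscraper.py: it uses Python's standard html.parser module instead of Beautiful Soup so that it runs without extra packages, and the HTML snippet, the "site" class name, the column order of the TSV rows and the second web site are all assumptions made for the example.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup: the real page layout is not described in this
# document, so this snippet and its "site" class name are assumptions.
SAMPLE_HTML = """
<ol>
  <li class="site"><a href="http://www.cern.ch">www.cern.ch</a></li>
  <li class="site"><a href="http://www.example.org">www.example.org</a></li>
</ol>
"""


class SiteListParser(HTMLParser):
    """Collects the link text of every <a> inside <li class="site">."""

    def __init__(self):
        super().__init__()
        self.in_site = False
        self.in_link = False
        self.sites = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "li" and ("class", "site") in attrs:
            self.in_site = True
        elif tag == "a" and self.in_site:
            self.in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False
        elif tag == "li":
            self.in_site = False

    def handle_data(self, data):
        if self.in_link and data.strip():
            self.sites.append(data.strip())


def to_tsv(category, subcategory, sites):
    """Return one TSV row per site: category, subcategory, rank, site.

    The rank is assumed to be the position of the site in the list.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for rank, site in enumerate(sites, start=1):
        writer.writerow([category, subcategory, rank, site])
    return out.getvalue()


parser = SiteListParser()
parser.feed(SAMPLE_HTML)
tsv_text = to_tsv("Science", "Physics", parser.sites)
print(tsv_text, end="")
```

A real scraper would download one page per category and subcategory and append the rows to a single file, which is why the full acquisition takes so long.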

Data Representation: As stated above, the visual model is a tree map, implemented in Processing.
The top web sites are sorted into 12 main categories; each category contains up to 50 subcategories, and each subcategory contains its first 20 web sites by rank.

Features of the Visualization: The tree map shows categories, subcategories and web sites as rectangles separated by white lines that highlight the boundaries of the boxes. Colors are used to show both differences and affinities among boxes. When the mouse is over a category or a subcategory, that box is highlighted as if it were under a spotlight.
Because the data is hierarchical, the rectangles representing categories contain subcategories, and these contain web sites. Through interaction the user can move through the different levels, opening or closing contents and zooming in and out.
In the WebSitesTreeMap the rank is encoded by the rectangle size at each level. The rank is a measure of web popularity; for instance, the Google web site has rank 1 and so it has the largest rectangle. The sum of the web site ranks determines the size of the subcategory rectangle that contains them. The same principle applies to the subcategories inside a category.
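
The way sizes add up through the hierarchy can be illustrated with a small Python sketch. It is not taken from the Processing source: the exact formula that turns a rank into a box size is not stated here, so the 1/rank weight below is an assumption chosen only so that rank 1 gets the largest box, and all names and ranks except www.cern.ch are invented.

```python
# Hypothetical data: each leaf maps a web site name to its rank
# (1 = most popular). Only www.cern.ch comes from the text.
TREE = {
    "Science": {
        "Physics": {"www.cern.ch": 1, "site-b.example": 2, "site-c.example": 4},
        "Math": {"site-d.example": 2, "site-e.example": 4},
    }
}


def site_weight(rank):
    """Assumed encoding: weight 1/rank, so rank 1 gets the largest box."""
    return 1.0 / rank


def node_weight(node):
    """A site's weight comes from its rank; a group's weight is the
    sum of the weights of everything it contains."""
    if isinstance(node, dict):
        return sum(node_weight(child) for child in node.values())
    return site_weight(node)


def split_length(node, total_length):
    """Divide a strip of total_length among the children of node,
    in proportion to their weights."""
    total = node_weight(node)
    return {name: total_length * node_weight(child) / total
            for name, child in node.items()}


physics = TREE["Science"]["Physics"]
print(node_weight(physics))
print(split_length(physics, 700))
print(split_length(TREE["Science"], 500))
```

A treemap layout applies this kind of proportional split recursively, alternating or otherwise choosing the direction of the split at each level; the layout itself is left to the treemap library.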

For each web site you can get this information:
Moreover, a world map linked to the tree map shows this data:
A colored circle is placed on each of the first (generally five) countries with the most users of the web site. The color and the size of the circle encode the percentage of users. In addition, a bar chart reports the percentage of users for each of the countries shown on the world map.
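
One possible way to turn a percentage of users into a circle is sketched below in Python. The mapping actually used in the applet is not described here, so everything in this example is an assumption: the circle area is made proportional to the percentage (so the radius grows with the square root), the color is a linear intensity, and MAX_RADIUS is a made-up constant.

```python
import math

MAX_RADIUS = 40.0  # hypothetical radius in pixels for a 100% share


def circle_radius(percent, max_radius=MAX_RADIUS):
    """Radius of a circle whose area is proportional to percent (0..100)."""
    return max_radius * math.sqrt(percent / 100.0)


def circle_intensity(percent):
    """Linear map from 0..100 percent to a 0..255 color intensity."""
    return round(255 * percent / 100.0)


for share in (100, 25, 0):
    print(share, circle_radius(share), circle_intensity(share))
```

Scaling the area rather than the radius keeps a country with four times the users from looking sixteen times bigger.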

Interaction:  The user can interact with the visualization in easy ways: mouse rollover, mouse buttons and keys.
Below is a detailed description of the commands for interacting with the visualization:

Below are four screen shots (small size) showing these options:


The first screen shot, taken just after starting the program, shows the twelve main categories. The user begins the trip by choosing one of them with the mouse. In this case the user is rolling over the Science category, which is highlighted. The next screen shot shows the Science category just opened by a press of the left mouse button.



The second screen shot shows all of the Science subcategories. The user has just opened the contents of the Physics subcategory. The next screen shot shows the Physics subcategory just zoomed in by a press of the left mouse button on the upper left box.


The third screen shot shows all web sites belonging to the Physics subcategory. Moreover, the user has just pressed the keys 'i' and 'm' to turn on the additional options: web site information and the world map with the first five countries having the most users of the web site chosen by the user. In this case the user is rolling over the www.cern.ch web site, so the world map and the information refer to that web site.

Applet


Source code (by Luigi Cardarelli): WebSitesTreeMap BoundsIntegrator CategoryItem Node Table WebItem

Built with Processing




Downloads

To start the data acquisition, from the websiteScraper folder type: python Webscraper.py > webdata.tsv
The file webdata.tsv is the database from which Processing builds the visualization; therefore, to update the data you must put the updated webdata.tsv file inside the data folder of WebSitesTreeMap.
Needed library: Ben Fry's treemap Library



Questions? Comments? Feedback? Write me