My project is a visualization of the top web sites in the world, sorted into
categories and subcategories and represented as a tree map. My aim was to show
the top web sites in a minimum amount of space and to let the user get
information quickly through interaction. The visualization also permits rapid
comparisons among the web sites.
A few web sites already report the list of top web sites, but always in a
static form such as a list.
For instance, one of the main pieces of information reported by my
visualization is the website's rank. This value is encoded by the size of the
box representing a website, a subcategory or a category, so it quickly pops
out to the user.
The target audience is both professionals working in the Internet field and
ordinary people interested in this kind of information. Moreover, since the
list of top web sites is a social index showing the preferences, trends and
tastes of people around the world, it is a useful tool for social research.
For instance, I discovered that Facebook is the second most popular website in
my country (Italy). This fact astonished me, and it is an indicator of how
important social contacts are in Italy.
Data Acquisition:
I collected all my data from the Internet. My source is
http://www.alexa.com/topsites
which reports the top web sites in a static way, simply split into
categories and subcategories.
The data must be acquired from the Internet automatically, for two reasons:
there is too much data to collect manually, and the data is live and can
change daily.
To scrape the data I used Python and the Beautiful Soup library. The aim of
the scraping is to obtain a data file in a format suitable as a database for
the code implementing the visualization.
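As a rough illustration of this step, the sketch below extracts site names from a small HTML fragment and turns them into tab-separated rows. It rests on several assumptions: the markup (`<li class="site-listing">` with a link inside) is hypothetical and the real alexa.com pages may be structured differently, the second site name is a placeholder, and the row format is only one possible choice for the data file. It also uses Python's standard `html.parser` module rather than Beautiful Soup, so that it runs without extra dependencies.

```python
from html.parser import HTMLParser

# Hypothetical fragment: the real page markup is an assumption.
SAMPLE_HTML = """
<ul>
  <li class="site-listing"><a href="/siteinfo/cern.ch">cern.ch</a></li>
  <li class="site-listing"><a href="/siteinfo/example.org">example.org</a></li>
</ul>
"""


class SiteListParser(HTMLParser):
    """Collects the text of every <a> found inside <li class="site-listing">."""

    def __init__(self):
        super().__init__()
        self.sites = []
        self._in_listing = False
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "li" and ("class", "site-listing") in attrs:
            self._in_listing = True
        elif tag == "a" and self._in_listing:
            self._in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False
        elif tag == "li":
            self._in_listing = False

    def handle_data(self, data):
        if self._in_link and data.strip():
            self.sites.append(data.strip())


def parse_sites(html, category, subcategory):
    """Return (category, subcategory, position, site) rows in page order."""
    parser = SiteListParser()
    parser.feed(html)
    return [(category, subcategory, position, site)
            for position, site in enumerate(parser.sites, start=1)]


def to_tsv(rows):
    """Serialize rows as tab-separated lines, one site per line."""
    return "\n".join("\t".join(str(field) for field in row) for row in rows)


rows = parse_sites(SAMPLE_HTML, "Science", "Physics")
print(to_tsv(rows))
```

In a full scraper the HTML would come from fetching each category and subcategory page in turn, which is where most of the running time goes.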
Acquiring the data takes time (about 2-3 hours) because many web pages must be
opened to collect all the necessary data.
Data Representation:
As I said above, the visual model is a tree map, implemented in Processing.
The top web sites are sorted into 12 main categories. Each category contains
up to 50 subcategories, and each subcategory contains its first 20 web sites
by rank.
Features of the Visualization:
The tree map shows categories, subcategories and web sites as rectangles
separated by white lines that highlight the boundaries of the boxes. Colors
are used to show both differences and affinities among boxes. When the mouse
is over a category or a subcategory, that box is highlighted as if it were
under a spotlight.
As a result of the hierarchical nature of the data, the rectangles
representing categories contain subcategories, and these contain web sites.
Through interaction the user can move through the different levels, opening or
closing contents and zooming in and out.
In the WebSitesTreeMap the rank is encoded by the rectangle size at each
level. The rank is a measure of web popularity: for instance, the Google web
site has rank 1, so it has the largest rectangle. The sizes derived from the
ranks of the web sites add up to the size of the subcategory rectangle that
contains them. The same principle applies to the subcategories inside a
category.
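The sizing principle can be sketched as follows. The exact mapping from rank to area is not stated here, so the code assumes a hypothetical weight of `1 / rank` (smaller rank number, larger box). The layout is a plain slice-and-dice split, which is not necessarily the layout algorithm used in the project, and the site names and ranks are invented sample data.

```python
def site_weight(rank):
    """Assumed mapping: a smaller rank number gives a larger weight."""
    return 1.0 / rank


def node_weight(node):
    """A leaf is a rank (int); an inner node is a dict of named children."""
    if isinstance(node, dict):
        return sum(node_weight(child) for child in node.values())
    return site_weight(node)


def layout(node, x, y, w, h, horizontal=True, name="root", out=None):
    """Slice-and-dice layout: divide the rectangle among the children in
    proportion to their weights, alternating the split direction per level.
    Returns a list of (name, x, y, width, height) boxes."""
    if out is None:
        out = []
    out.append((name, x, y, w, h))
    if isinstance(node, dict):
        total = node_weight(node)
        offset = 0.0
        for child_name, child in node.items():
            share = node_weight(child) / total
            if horizontal:
                layout(child, x + offset, y, w * share, h, False, child_name, out)
                offset += w * share
            else:
                layout(child, x, y + offset, w, h * share, True, child_name, out)
                offset += h * share
    return out


# Invented sample hierarchy: category -> subcategory -> {site: rank}
tree = {
    "Science": {
        "Physics": {"site-a": 1, "site-b": 2},
        "Math": {"site-c": 4},
    }
}

for box in layout(tree, 0, 0, 800, 600):
    print(box)
```

Because each parent's weight is the sum of its children's weights, the area of a subcategory box always equals the total area of the site boxes inside it, whatever weight function is chosen.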
For each web site you can get the following information:
- Traffic Rank:
It is a measure of web popularity. The rank is calculated using a
combination of average daily visitors and page-views over the past
three months. The site with the highest combination of visitors and
page-views is ranked 1. This parameter is updated daily.
- Average Time on site:
It is a measure of user attention: the average number of minutes a user
spends per day on the site, averaged over the past three months. This
parameter is updated daily.
- Sites linking in:
It is a measure of the website's reputation: the number of sites linking to
the website. Multiple links from the same site are counted only once.
This parameter is updated quarterly.
- On line Since:
It is the date the domain was first registered.
Moreover:
A world map linked to the tree map shows these data:
A colored circle is placed on each of the top (generally five) countries with
the most users of the website. The color and the size of the circle encode the
percentage of users. In addition, a bar chart reports the percentage of users
for each of the countries represented on the world map.
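One possible way to derive the circles from the percentages is sketched below. It is only an illustration under stated assumptions: the circle area (not the radius) is made proportional to the percentage, the color is a linear blend between two arbitrary RGB values, and the country shares are invented sample numbers. The function names and the `max_radius` parameter are hypothetical.

```python
import math


def clamp_percent(percent):
    """Keep a percentage inside the 0-100 range."""
    return max(0.0, min(float(percent), 100.0))


def circle_radius(percent, max_radius=40.0):
    """Radius whose circle area is proportional to the percentage."""
    return max_radius * math.sqrt(clamp_percent(percent) / 100.0)


def circle_color(percent, low=(255, 230, 150), high=(200, 30, 30)):
    """Linear blend between two RGB colors (both chosen arbitrarily)."""
    t = clamp_percent(percent) / 100.0
    return tuple(round(a + (b - a) * t) for a, b in zip(low, high))


def top_countries(shares, n=5):
    """Return the n (country, percent) pairs with the largest share of users."""
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)[:n]


# Invented sample data: percentage of a site's users per country.
shares = {"A": 25.0, "B": 12.5, "C": 9.0, "D": 7.0, "E": 4.0, "F": 1.5}

for country, percent in top_countries(shares):
    print(country, circle_radius(percent), circle_color(percent))
```

Scaling the area rather than the radius keeps a country with twice the share from looking four times as large.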
The second screen shot shows all the subcategories of Science. The user has
just opened the contents of the subcategory Physics. The next screen shot
shows the Physics subcategory after zooming in by pressing the left mouse
button on the upper left box.
The third screen shot shows all the web sites belonging to the subcategory
Physics. Moreover, the user has just pressed the keys 'i' and 'm' to turn on
the additional options: the website's information and the world map with the
first five countries having the most users of the web site chosen by the user.
Specifically, in this case the user is rolling over the www.cern.ch web site,
so the world map and the information refer to this web site.