World Migration Patterns
Abstract: This project visualizes the migration stock (population of migrants in the year 2000) of 32 OECD and 68 non-OECD destination countries and over 200 countries of origin. The emergence of non-traditional migration flows outside the most developed states, also known as “South-South” migration, has necessitated a deeper analysis of these patterns. Using a dataset that combines OECD and World Bank data, this visualization reveals the relationship between immigration and emigration and allows deeper investigation into specific groups of countries.
Unfortunately, the libraries used to create this project are incompatible with Processing's applet export feature. Instead, we include screenshots of our visualization. Our full-length process book is also available here.
We started with a dataset from the OECD. It included the migrant stock (the number of migrants living in each destination country in 2000) for all OECD and 50+ non-OECD countries, along with demographic characteristics, duration of stay, and education level. We also used an SVG file as the basemap for the Processing visualization. We cross-referenced the associated XML file with the dataset described above to match countries selected in Processing to their corresponding data entries.
Since the OECD dataset used country abbreviations rather than country names, we wrote a scraping program to collect the ISO 3166-1 alpha-3 (three-letter) country codes and merged the resulting table with our original dataset in Google Refine. We had to manually correct the entries for some countries of the former Soviet Union and Yugoslavia and account for slight variations in country names (e.g., renaming “Democratic People’s Republic of Korea” to “North Korea”).
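We did this merge in Google Refine, not in code. Purely as an illustration, the same code-to-name lookup could be sketched in Java (the language underlying Processing) roughly as follows. The class name, the override table, and the three sample entries are hypothetical and are not taken from our project files.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: resolve ISO 3166-1 alpha-3 codes to display names,
// with manual overrides for names that vary between sources.
final class CountryNames {
    // Code -> official name, as a scraped table might provide (sample entries only).
    static final Map<String, String> ISO_NAMES = new HashMap<>();
    // Official name -> shorter name shown in the visualization (sample entries only).
    static final Map<String, String> OVERRIDES = new HashMap<>();

    static {
        ISO_NAMES.put("MEX", "Mexico");
        ISO_NAMES.put("USA", "United States of America");
        ISO_NAMES.put("PRK", "Democratic People's Republic of Korea");
        OVERRIDES.put("Democratic People's Republic of Korea", "North Korea");
    }

    // Returns the display name for a code. Unknown codes are returned unchanged
    // so that unmatched rows stay visible for manual cleanup.
    static String displayName(String code) {
        String official = ISO_NAMES.get(code.trim().toUpperCase(Locale.ROOT));
        if (official == null) {
            return code;
        }
        return OVERRIDES.getOrDefault(official, official);
    }

    public static void main(String[] args) {
        System.out.println(displayName("PRK"));
        System.out.println(displayName("mex"));
    }
}
```

Returning the unmatched code instead of failing mirrors the manual step described above: rows that do not match stay visible and can be corrected by hand.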
After importing the data and implementing our code, we realized that scanning all 100,000+ rows of the dataset caused massive lag. The large number of rows was caused by the many combinations of variables. Since we were focusing on only a few variables, it was unnecessary to keep the data disaggregated in this way. We consolidated our data in Excel by summing the number of migrants over consecutive rows until the country names or desired variables changed.
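The consolidation was done in Excel. As a rough sketch of the same idea in Java, the rows can be grouped by origin and destination and their migrant counts summed. The three-column row layout and the class and method names are assumptions made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: collapse disaggregated rows into one total per
// origin/destination pair.
final class Consolidate {
    // Each row is assumed to be {origin, destination, migrants}; the other
    // columns of the full dataset are assumed to have been dropped already.
    static Map<String, Long> sumByPair(String[][] rows) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (String[] row : rows) {
            String key = row[0] + "->" + row[1];
            long migrants = Long.parseLong(row[2]);
            // Add to the running total for this pair, starting a new one if needed.
            totals.merge(key, migrants, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample rows, not real figures from the dataset.
        String[][] rows = {
            {"MEX", "USA", "100"},
            {"MEX", "USA", "50"},
            {"ASM", "CHL", "1"},
        };
        System.out.println(sumByPair(rows));
    }
}
```

Unlike the row-by-row approach we used in Excel, grouping by key does not require the rows to be sorted first.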
We began by drawing lines between all countries without differentiating by immigration vs. emigration, OECD vs. non-OECD, or continent (distinctions that would eventually become our filters). Our original plan called for drawing arcs between data points; however, we soon realized that Processing required us to draw parts of ellipses to form arcs, which demanded a series of complex calculations that seemed far beyond the scope of the project. It would have involved constructing multiple ellipses and rotating them about several points, and our intense trigonometry didn’t even yield results.
We created a variable called “thickness” that scaled logarithmically with the number of migrants. Records with 1 migrant, which would otherwise have a thickness of 0 (since the logarithm of 1 is 0), were set to 0.1 so the lines would still draw. We chose a logarithmic scale because the total migrant value ranged from more than eight million (Mexico to USA) down to one (American Samoa to Chile). The logarithmic scale was not meant to convey actual values but rather to provide some context for the large and small magnitudes of migration.
Throughout our work, one of our main priorities was optimizing code efficiency. Consolidating the dataset to 10% of its original size was one of our first steps, and because we still experienced significant lag afterward, we considered whether each new task would slow our program. The basis of our visualization involved looping through every row of data and checking whether the clicked country was in the immigration or emigration field of the entry. In adding features to our visualization, we carefully considered the placement of our code to minimize how often we entered the loop and how much work we did inside it. As of the night before the project’s due date, there was still a wait of several seconds upon clicking a specific country and further lag once a filter was clicked (we fixed it!).
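The logarithmic thickness scale described above can be sketched as follows. This is a minimal illustration, not our original code: the base of the logarithm (base 10 here), the class and method names, and the handling of non-positive counts are assumptions. In a Processing sketch, the result would typically be passed to strokeWeight() before drawing the line.

```java
// Hypothetical sketch of a logarithmic line-thickness scale with a floor.
final class LineThickness {
    // Floor applied so that a record with a single migrant is still drawn.
    static final float MIN_THICKNESS = 0.1f;

    // Maps a migrant count to a line thickness.
    // Assumption: base-10 logarithm; the write-up does not state the base.
    static float thickness(long migrants) {
        if (migrants < 1) {
            return 0f; // Assumption: nothing to draw for empty records.
        }
        float scaled = (float) Math.log10(migrants);
        // log10(1) is 0, so a single migrant falls back to the floor.
        return Math.max(scaled, MIN_THICKNESS);
    }

    public static void main(String[] args) {
        System.out.println(thickness(1));
        System.out.println(thickness(8_000_000));
    }
}
```

With a base-10 logarithm, a flow of eight million migrants is only about seven times as thick as a flow of ten, which is why the scale gives context but not actual values.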
As expected, things didn’t look picture perfect when we first started. New Zealand’s lines were being drawn just off the West Coast of South America, Portugal was hanging out somewhere near Brazil, and Russia and the United States were so close we can only imagine the level of nuclear preparedness that would have been necessary. After hard-coding new locations for countries and figuring out how to keep our migration lines drawn above the other lines on the map, we were ready to implement our filters.
The main filter operation was to create a distinction between immigration and emigration. We devised a color scheme that was based on whether the selected country was in the recipient or sender column. Next, we created a filter based on continent (another field in the data). We saw that if we were examining the relationship between North African countries and a country in the Western Hemisphere, the lines going to South Asia were interfering. Allowing the user to filter by continent while maintaining the ability to view all lines fixed this problem.
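A rough sketch of this filtering logic is shown below. It decides whether a row is immigration or emigration relative to the selected country, then keeps only the rows whose other end lies on the chosen continent. The Flow class, its field names, and the use of null to mean "all continents" are hypothetical choices made for illustration; our actual data layout may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the direction and continent filters.
final class FlowFilter {
    enum Direction { IMMIGRATION, EMIGRATION }

    // Assumed row layout: one origin/destination pair with a migrant total.
    static final class Flow {
        final String origin, destination, originContinent, destinationContinent;
        final long migrants;

        Flow(String origin, String destination,
             String originContinent, String destinationContinent, long migrants) {
            this.origin = origin;
            this.destination = destination;
            this.originContinent = originContinent;
            this.destinationContinent = destinationContinent;
            this.migrants = migrants;
        }
    }

    // Direction relative to the selected country, or null if the row
    // does not involve that country at all.
    static Direction directionFor(Flow flow, String selected) {
        if (flow.destination.equals(selected)) return Direction.IMMIGRATION;
        if (flow.origin.equals(selected)) return Direction.EMIGRATION;
        return null;
    }

    // Keeps rows that involve the selected country and whose other end is on
    // the given continent. A null continent means "show all continents".
    static List<Flow> visibleFlows(List<Flow> all, String selected, String continent) {
        List<Flow> visible = new ArrayList<>();
        for (Flow flow : all) {
            Direction direction = directionFor(flow, selected);
            if (direction == null) continue;
            // The "other end" is the origin for immigration, the destination for emigration.
            String other = (direction == Direction.IMMIGRATION)
                    ? flow.originContinent : flow.destinationContinent;
            if (continent == null || continent.equals(other)) {
                visible.add(flow);
            }
        }
        return visible;
    }
}
```

The direction returned by directionFor would then select the line color, matching the color scheme based on the recipient or sender column.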
We also created a filter based on OECD vs. non-OECD countries. The unique characteristic of this dataset (and the accompanying analysis that we are conducting with Professor Gest) is its coverage of "South-South" migration patterns: migration among countries outside the developed world. By including 50+ non-OECD countries, this dataset is one of the first credible attempts to account for this new trend in migration. While the migration stock figures of the dataset are less revealing than migration flow (these figures do not account for any migration since 1995), some interesting trends can still be observed. A country like Bolivia has many destination states in the developed world, but is also a recipient of immigrants from the Far East.
We created a bar graph that shows the top five immigration sources and the top five emigration destinations for the selected country. The graph examines only the filtered data. It is laid out vertically and shows immigration stock as positive values and emigration stock as negative values. We dynamically scale the graph based on the maximum stock value and align the tick marks and text to accommodate all values.
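The two calculations behind the bar graph, picking the top five countries and scaling bars to the largest value, might look roughly like this. The class name, method names, and pixel-based scaling are assumptions for the sake of the example rather than a copy of our code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the bar graph's ranking and scaling steps.
final class BarGraph {
    // Returns up to n entries with the largest stock values, largest first.
    static List<Map.Entry<String, Long>> topN(Map<String, Long> stockByCountry, int n) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(stockByCountry.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return entries.subList(0, Math.min(n, entries.size()));
    }

    // Scales a stock value to a signed bar length in pixels: the largest value
    // fills maxPixels, immigration is positive and emigration is negative.
    static float barLength(long stock, long maxStock, float maxPixels, boolean emigration) {
        if (maxStock <= 0) {
            return 0f; // Nothing to scale against.
        }
        float length = maxPixels * stock / (float) maxStock;
        return emigration ? -length : length;
    }
}
```

Calling topN once on the filtered immigration totals and once on the filtered emigration totals, with n set to 5, would give the two halves of the graph.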
Our visualization enables users to explore immigration and emigration stock in a large sample of countries. It provides a visual basis to further investigate trends of migration amongst less-developed states included in the dataset.
Our study was somewhat limited by the nature of the dataset: rather than showing up-to-date migration flows, the most credible dataset simply had an aggregate of all living persons (above the age of 15) who had migrated to the target country. This favors older trends in global migration and ignores the most recent flows, which provide more evidence of South-South migration. In terms of aesthetics, we were also somewhat limited in both the basic outline of our map and the stylistic touches we wanted to add. As mentioned above, we tried implementing arcs in the visualization but found the necessary steps far beyond the scope of the project. The borders and layout of the basemap we used also prevented us from changing the design of the map, which could probably be more visually appealing. We also wanted to implement a hover-over function that would highlight and raise migration lines, as well as retrieve values, upon mousing over an area. However, constantly checking where the mouse was hovering required constantly looping through our data to determine whether or not it fell within the boundaries of that area.