Visualizing Chinese Character Structures

Sophie Hilgard


Project Overview

As a Chinese language student at Harvard, I've become interested in the construction of Chinese characters and how their components contribute to meaning and pronunciation. In particular, I've come to realize that a better understanding of character structure can make many written characters much easier to remember, and memorizing characters is one of the major difficulties in learning Chinese.

I hoped to gauge how useful phonetic and semantic components are in breaking down the pronunciation and definition of Chinese words, perhaps with implications for the importance of radical study in beginning Chinese classes.

The majority of the data came from http://chinese-characters.org and was collected with several Python scrapers.


Visualization Approach

I first categorize the characters by their radicals. This is a very traditional method of organizing Chinese characters and really one of the few ways to present the set of 1000 characters in a manageable form.
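The grouping step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the character and radical pairs below are a handful of well-known examples rather than rows from the project's dataset, and the function name `group_by_radical` is hypothetical.

```python
from collections import defaultdict

# Illustrative (character, radical) pairs; NOT taken from the project's dataset.
SAMPLE = [
    ("好", "女"),
    ("妈", "女"),
    ("河", "水"),
    ("海", "水"),
    ("他", "人"),
]


def group_by_radical(pairs):
    """Map each radical to the list of characters filed under it."""
    groups = defaultdict(list)
    for character, radical in pairs:
        groups[radical].append(character)
    return dict(groups)


groups = group_by_radical(SAMPLE)
print(groups["女"])  # ['好', '妈']
```

Keeping the groups in a dictionary keyed by radical means that a click on a radical reduces to a single lookup.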

Clicking a radical displays all of the characters in the 1000-character dataset that are categorized under that radical. These characters are encoded by size, where size corresponds to frequency, a proxy for importance. I chose size because it is effective at communicating ordinal data and draws the eye to the most important characters. The radical also appears in a box to the right, which displays its pinyin pronunciation and definition. When a character is moused over, its pinyin and definition are displayed as well, and the field we expect to be related (based on whether the relationship was categorized as phonetic, semantic, or other) is highlighted in red to allow for easy comparison. (Some radicals may not relate to any of the characters in the 1000-character dataset. In this case, press 'x' to go back to the home screen.)
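A minimal Python sketch of the two encodings described above follows. Several details are assumptions made for illustration: the linear mapping from frequency rank to font size, the 12 to 48 point range, and the idea that a phonetic relationship highlights the pinyin field while a semantic one highlights the definition. None of the names come from the applet itself.

```python
def font_size(rank, total, min_pt=12.0, max_pt=48.0):
    """Linearly map a frequency rank (1 = most frequent) to a font size."""
    if total <= 1:
        return max_pt
    share = (rank - 1) / (total - 1)  # 0.0 for rank 1, 1.0 for the last rank
    return max_pt - share * (max_pt - min_pt)


# Assumed mapping from relationship type to the field drawn in red.
HIGHLIGHT_FIELD = {
    "phonetic": "pinyin",
    "semantic": "definition",
    "other": None,  # no field is expected to be related
}


def highlighted_field(relation):
    """Return the name of the field to highlight, or None."""
    return HIGHLIGHT_FIELD.get(relation)


print(font_size(1, 1000))             # 48.0
print(font_size(1000, 1000))          # 12.0
print(highlighted_field("phonetic"))  # pinyin
```

A rank-based scale is used here because raw frequencies are heavily skewed; whether the applet scales by rank or by raw count is not stated.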

When one of these characters is clicked, it becomes the selected character in the box on the right side. The main display window then shows a tree in which the leftmost characters are those that make up the selected character. I feel that the tree structure is an ideal way to encode this connection data, as it communicates the inherent parent-child relationships. The type of connection (semantic, phonetic, or other) is encoded by color, a highly effective channel for nominal data. The colors were generated from the ColorBrewer website to ensure aesthetic quality. Again, on mouseover, the pinyin and definition of a character are displayed, with the field we expect to be related highlighted in red. I attempted to design the text display so that the text would never fall off the screen or cross into other sections of the display. However, I admit that this is still one of the portions of the design that could use some work. Pressing 'b' returns to the previous screen, and pressing 'x' returns to the home screen.
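The component tree and its color encoding might be represented as in the Python sketch below. It is an illustration, not the applet's code: the `Node` class and `edges` function are hypothetical, and the palette is assumed to be the first three colors of ColorBrewer's qualitative "Dark2" scheme, since the colors actually used are not listed here. The example character 妈 (mā, "mother") combines the semantic component 女 ("woman") with the phonetic component 马 (mǎ, "horse").

```python
from dataclasses import dataclass, field

# Assumed palette: first three colors of ColorBrewer's qualitative "Dark2" scheme.
RELATION_COLORS = {
    "semantic": "#1b9e77",
    "phonetic": "#d95f02",
    "other": "#7570b3",
}


@dataclass
class Node:
    character: str
    pinyin: str
    relation: str = "other"  # how this node relates to its parent
    components: list = field(default_factory=list)


def edges(node):
    """Yield (parent, component, relation, color) for every link in the tree."""
    for child in node.components:
        yield (node.character, child.character,
               child.relation, RELATION_COLORS[child.relation])
        yield from edges(child)


ma = Node("妈", "mā", components=[
    Node("女", "nǚ", relation="semantic"),
    Node("马", "mǎ", relation="phonetic"),
])

for edge in edges(ma):
    print(edge)
```

Storing the relationship type on the child node keeps each link's color a simple dictionary lookup at draw time.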


Applet

[Interactive Java applet; requires the Java Plug-in.]

Page created May 2009.