In seiner Funktionalität auf die Lehre in gestalterischen Studiengängen zugeschnitten... Schnittstelle für die moderne Lehre
In seiner Funktionalität auf die Lehre in gestalterischen Studiengängen zugeschnitten... Schnittstelle für die moderne Lehre
Exploring the geographical dimension of Wikipedia articles throughout their language versions
We live in a interconnected world defined by a continuously moving interplay of knowledge paradigms which are driven by sciences, laws and religions, amongst others. Across personal and professional spheres, everybody has to deal with distinct changing discourses and truths in order to create a personal perception, which ends up to be the reality that is lived in.
Knowledge results from a “complex process that is social, goal-driven, contextual, and culturally- bound”1. As the book as such is being supplanted by digital networks as the main metaphor of knowledge, the previously separated ecosystems originated from the book era are evolving into a global network, which new intrications imply disrupting changes: the meaning of learning, memorizing, sharing, communicating or even dealing with the very own opinion are shifting.
Weinberger, D. (2010, February 2). The Problem with the Data-Information-Knowledge-Wisdom Hierarchy. Harvard Business Review. Retrieved April 29, 2013, from http://blogs.hbr.org/cs/2010/02/data_is_to_info_as_info_is_not.html ↩︎
Over the last ten years, Wikipedia, a knowledge network, has irrevocably supplanted the traditional encyclopedias and has become a ubiquitous source across cultures. Because of its multilingualism, its huge volume and its highly democratic structure, Wikipedia is not as much curated and reviewed as “classical” scientific publications and opens accordingly ways to display information reflecting very different world views or angles. Every Wikipedia article is entitled to present a truth which rely on factual information. However, because of its composite nature, which rely on a constant editing made by a multitude of users sometimes focusing on very different aspects, it depicts a far more complex reality, which is sometimes hard to apprehend. Wikipedia articles are a great source to get to know a topic, but if we, as (informed) users, want to go further into it, it is also very important to question the scope and look for alternative sources so that the newly-acquired knowledge can be reliably consolidated.
Consolidating Knowledge from Wikipedia One of the possible strategy to consolidate knowledge acquired within Wikipedia is to explore the different language versions of an article—this obviously depending on which language we personally master. This strategy happens to be a great way to discover and question discrepancies as well as commonalities across languages. Reflecting on those really helps on weighting the important facts, discerning differing opinions or cultural backgrounds and tracking back the largest possible range of external references (which back any well reviewed article).
At the beginning of the project, I tried to see how cultural and knowledge diversity out of Wikipedia has been visualized and I went across a row of interesting projects, highlighting that people who edit articles about places don‘t necessarily live nearby ([Who edits Wikipedia? A map of edits to articles about Egypt](http://www.zerogeography.net/2013/03/who-edits-wikipedia-map-of-edits-to.html „http://www.zerogeography.net/2013/03/who-edits-wikipedia-map-of-edits-to.html“)) or that geo-tagged articles related to american places are in average longer that those related to european places ([Article Quality in English Wikipedia](http://www.zerogeography.net/2011/12/article-quality-in-english-wikipedia.html „http://www.zerogeography.net/2011/12/article-quality-in-english-wikipedia.html“)).
At the beginning of the creation process, I explored two different directions:
We have so far reflected on the Wikipedia article as an independent information entity, however this happens to be a narrow perspective, which needs to be extended in order to echo its networked nature. Because of this but also because of the complexity inherent to our world, no article is going to be able to cover a topic on its own without being linked to further articles, which contain complementary contents. Following this point of view, any Wikipedia article should be considered as a network node with an immediate network and could be renamed “networked article”. It echoes David Weinberg who talked about network facts as something which “exist within a web of links that make them useful and understandable“1.
Weinberger, D. (2012). Too Big to Know: Rethinking Knowledge Now That the Facts Aren’t the Facts, Experts Are Everywhere, and the Smartest Person in the Room Is the Room. New York: Basic Books. ↩︎
Network visualizations remain hard to decipher: they are intricate and (mostly) supply very few points of references to the readers. I attempted to address these two points by mapping networked articles on maps. Those follow an universal visualization metaphor, which offer very clear reading rules and a geographical dimension which allows to “untangle” network complexity. As this approach filters out articles without any geographical information, it only displays a partial view of the content of Wikipedia articles, which however enables comparisons.
Comparing language versions within a dedicated interface I decided to create an interface which helps to explore geographical topologies of networked articles and that also enables comparison between languages. I intended to design an explorative tool where visual contrasts tend to highlight differences or knowledge diversity. Within my project I exemplary processed wikipedia articles of towns.
Parsing the data
A Wikipedia article consists of different parts: abstract, table of contents, infobox, body of the article and [navigation boxes](http://en.wikipedia.org/wiki/Wikipedia:Navigation_templates „http://en.wikipedia.org/wiki/Wikipedia:Navigation_templates “) (for cross-navigation). All the parts are readable as one object, except the navigation boxes which are not always fully displayed by default, very much packed with links and sometimes hard to read. For this reason, I decided to exclude them from my definition of networked article, as I wanted to only process the “discernible” content to the user. This decision had heavy consequences as the Wikipedia API call which provides the outgoing links an article includes the links from the navigations boxes. The consequence was that I had to parse the article on my own to look for links.
Parsing an wikipedia article can be done with two different sources: the parsed text in html or the markup text (which reflect the way users write article). I went for the second one, as it looked cleaner to process but it proved to be a poor decision as I ran into consistency problem, amongst others. At the first glance, internal links in Wikipedia seemed to be all delimited by [[double square brackets]]. After a while however I remark that the Town twining mark syntax in the french version doesn‘t not follow this principle. Within the two articles displayed in the prototype, I edited the data I generated manually to reflect this situation.
Linking language versions together I assumed for a while that I could use [DBpedia](http://de.dbpedia.org „http://de.dbpedia.org“) to get a unique identifier to link all the language versions of a single article together. I spend a lot of time experimenting with it but unfortunately I couldn‘t get any tangible results. DBpedia proved very slow and very complex to use (to me) so that I had to look for an alternative which I found into [Wikidata](http://en.wikipedia.org/wiki/Wikipedia:Wikidata „http://en.wikipedia.org/wiki/Wikipedia:Wikidata“) which is directly done by the Wikimedia foundation and went online only recently.
Finding parameters to compare language versions In the next step, I tried to find parameters which go beyond the topological aspect and characterize the quality of the relationships between an article and its linked articles. Reflecting on the data I had and the possible computations of it, I processed the following parameters:
In order to compare language versions, I created a interface which presents two parallel and synchronized maps, accordingly displaying two isolated topologies. I decided against a single map displaying with two overlapping languages as it happens to weaken the readability and the perception of the respective topologies. To compensate the drawbacks of displaying information on two maps (no comparison by overlapping possible), I offered a visual way to highlight the links between a main article and its linked article with the “show network” function. Further this function allows the users to keep track of all connections, even if they are situated far off, outside of the current view.
As the quantity of text available remain an important factor, but is quite challenging to display directly on the map, I implemented two pie charts reflecting the text quantities corresponding to the settings chosen by the user.
I implemented two articles, Berlin and Annecy in France (my hometown), in three languages and tried to find interesting facts and discrepancies.
Annecy
Berlin
Afterward, it is always interesting to try to figure out reasons why contents are developed in such different ways.
I tried to use a much as possible off-the-shelf solutions:
This project has been very intensive. It was a great start up for my master thesis and it also allowed me to pick up a little on programing and experience a comprehensive creation process on my own (the conception, the compilation of a data set and the creation of an interface). I tried a lot, failed a lot, “waste” a lot time but I enjoyed the learning curve and the result overall. I would be happy go back to this topic after my Master thesis.
Many thanks to Sebastian Meier for the support.