Wikipedia busts the language barrier

Jacob Aron
New Scientist

IF YOU have ever pored over the Wikipedia entry “Conspiracy theory”, you may think you know what it is like to go through the looking glass. But have you read all there is to know about UFOs in Spanish? Or Hebrew?

Omnipedia melds entries from 25 languages to give users fresh perspectives (Image: Omnipedia)

To unlock such strange information, Brent Hecht of Northwestern University in Evanston, Illinois, and colleagues have created Omnipedia, a software system that lets users browse topics from up to 25 Wikipedia language editions at once.

“There is so much information out there that isn’t in your native language, some of which reflects cultural viewpoints,” says team member Patti Bao.

The team’s goal is to provide access to all of the online encyclopedia’s wealth of knowledge – not just the paranoia-stoked bits. To do this, Omnipedia exploits the sidebar links between different language editions added by multilingual Wikipedia editors. The system follows these links and analyses which topics are universal within the subject area and which are particular to each language, and displays them all (see picture, based on Wikipedia’s “New Scientist” entries).
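As a rough illustration of the idea, and not the team's actual code, the sketch below groups pages that are joined by interlanguage links into shared "concepts", then records which language editions of an article link to each concept. The function names, the data layout and the toy data are all assumptions made for this example.

```python
from collections import defaultdict


def group_concepts(interlanguage_links):
    """Merge (lang, title) pages joined by interlanguage links (union-find)."""
    parent = {}

    def find(page):
        parent.setdefault(page, page)
        while parent[page] != page:
            parent[page] = parent[parent[page]]  # path compression
            page = parent[page]
        return page

    for a, b in interlanguage_links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
    # Map every known page to the representative page of its concept.
    return {page: find(page) for page in list(parent)}


def topic_coverage(article_links, concept_of):
    """Return {concept: set of languages whose article links to it}."""
    coverage = defaultdict(set)
    for lang, titles in article_links.items():
        for title in titles:
            page = (lang, title)
            # A page with no interlanguage links is its own concept.
            coverage[concept_of.get(page, page)].add(lang)
    return dict(coverage)


def split_topics(coverage, languages):
    """Separate topics linked in every language from single-language ones."""
    everyone = set(languages)
    universal = {c for c, langs in coverage.items() if langs == everyone}
    single = {c: next(iter(langs)) for c, langs in coverage.items()
              if len(langs) == 1}
    return universal, single


# Toy data (hypothetical): sidebar links between language editions.
interlanguage_links = [
    (("en", "Unidentified flying object"),
     ("es", "Objeto volador no identificado")),
    (("en", "Unidentified flying object"), ("de", "UFO")),
    (("en", "Paranoia"), ("de", "Paranoia")),
    (("en", "Paranoia"), ("es", "Paranoia")),
]
# Toy data (hypothetical): topics linked from one article in each edition.
article_links = {
    "en": ["Unidentified flying object", "Paranoia"],
    "de": ["UFO", "Paranoia"],
    "es": ["Paranoia", "Leyenda urbana"],
}

concept_of = group_concepts(interlanguage_links)
coverage = topic_coverage(article_links, concept_of)
universal, single = split_topics(coverage, article_links.keys())
```

In this toy data the "Paranoia" concept is linked from all three editions, "Leyenda urbana" appears only in the Spanish article, and the UFO concept is linked from the English and German articles but not the Spanish one.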

This method of topic analysis is not foolproof: many languages’ “conspiracy theory” entries link to an article about UFOs, for example, but the Spanish Wikipedia does not, even though it has an article about UFOs, entitled objeto volador no identificado. So the team also applies an algorithm that hunts for such missing links.
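The article does not describe how the missing-link algorithm works. Purely as an assumption, one naive heuristic is sketched below: for each topic, find the languages that have an article on it but do not link to it, then check whether that article's title is nevertheless mentioned in the text. All names and data here are hypothetical.

```python
def candidate_missing_links(coverage, titles, article_text):
    """List links that some language editions have but others lack.

    coverage:     {concept: set of languages whose article links to it}
    titles:       {concept: {lang: title of the concept's page in lang}}
    article_text: {lang: plain text of the article being analysed}
    Returns (lang, concept, title, mentioned) tuples, where `mentioned`
    says whether the title occurs verbatim in that language's text.
    """
    candidates = []
    for concept, linking_langs in coverage.items():
        for lang, title in titles.get(concept, {}).items():
            if lang in linking_langs:
                continue  # this edition already links to the topic
            text = article_text.get(lang, "")
            mentioned = title.casefold() in text.casefold()
            candidates.append((lang, concept, title, mentioned))
    return candidates


# Toy data (hypothetical), mirroring the UFO example in the text.
coverage = {"UFO": {"en", "de"}}
titles = {
    "UFO": {
        "en": "Unidentified flying object",
        "de": "UFO",
        "es": "Objeto volador no identificado",
    }
}
article_text = {
    "es": "Algunas teorías mencionan el objeto volador no identificado.",
}

missing = candidate_missing_links(coverage, titles, article_text)
```

A plain substring match like this would miss plurals, inflected forms and synonyms, so a real system would need something more robust.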

Omnipedia displays each topic found within a particular article as a circle split into coloured segments, each representing the language in which the topic is discussed. Clicking on a circle opens the relevant snippet from the article, which is automatically translated using the Bing Translator service.

Bao and colleagues asked 27 volunteers from a range of linguistic backgrounds to try out Omnipedia. Many said they had not realised how much information was absent in the English Wikipedia, while others were surprised at the breadth of coverage in other languages, such as a Japanese entry on reggae.

“We should always be aware of the inevitable biases in the knowledge we produce and reproduce,” says Mark Graham of the Oxford Internet Institute in the UK, who was not involved with the work. “Omnipedia helps us to do that.”

The team presented the system at the Conference on Human Factors in Computing Systems in Austin, Texas, last week.