Digital Extinction?
While Yeeyan and TED demonstrate that it's possible for volunteers to produce high-quality translations of challenging academic lectures, and Meedan suggests that the combination of machine and human translation could enable real-time communication across languages, the really exciting possibility comes from bringing these methods together. For machine translation to work, programmers need large corpora of material translated between a pair of languages. While the amount of text translated by Global Voices or TED is currently a small fraction of the text necessary to build a statistical machine translation system, a partnership between translation communities and machine translation experts might be a way to generate corpora for languages where few other options exist. The 4000 translations produced by Global Voices Malagasy, totaling 300,000 words, amount to only 1.2% of the size of the Europarl corpus (one of the key sources for parallel corpora, derived from European parliamentary proceedingscxxxviii) and are likely too small for an accurate machine translation system. On the other hand, it's probably the largest available corpus that translates between those two languages.
Google's ambitions for translation mean they need to take seriously the existence of any corpora for African languages. According to Denis Gikunda, who leads African language initiatives for the company, Google plans to offer translation services, interfaces and content in over 100 African languages that have at least 1 million speakers, including Meru, his native tongue, which is spoken in the area near Mount Kenya.cxxxix For now, Google is focusing on bigger languages - Swahili, Amharic, Wolof, Hausa, Afrikaans, Zulu, Setswana and Somali, all of which have at least 10 million speakers.
For Google or others to translate Malagasy, they need more than a set of pages translated between English or French and Malagasy - they need a much larger set of data to build a Malagasy "language model". In other words, for Malagasy to be translatable through statistical machine translation, there needs to be lots written in Malagasy available online or easily digitized.
And that presents a problem. Consider the Malagasy Wikipedia, which contains about 25,000 articles. That makes it the 75th largest Wikipedia in the world, and the second largest in an African language (behind Yoruba but ahead of Swahili and Afrikaans). Many of the potential contributors to the project are well-educated Malagasy, who speak French fluently. The French Wikipedia has fifty times as many articles, and a vastly larger audience. A Wikipedian looking to have her contribution read and appreciated is likely to make the rational choice to contribute in French. Lova Rakotomalala, a contributor to the Malagasy Wikipedia, explains the Catch-22: "My hunch is that people are not using the (smaller language versions of) Wikipedia because of a vicious cycle. People don't want to create the content because no one is reading, and no one is reading because there is no content."
This would be more discouraging if Lova weren't deeply engaged in expanding the amount of Malagasy content available online, both through Wikipedia and through Global Voices, where he co-founded our Malagasy language edition. But it helps elucidate just how complicated the issues that surround the polyglot internet are. If Malagasy speakers post more content online, more Malagasy speakers are likely to create content in their native language. With more content - and especially more content in translation - online, Google and others may be able to build machine translation systems, which in turn mean that content available only in Malagasy can be read by speakers of other languages.
If Malagasy speakers decide, instead, to create content in French, looking for a larger audience, they may suffer another problem. Projects like the English and French language Wikipedias are reaching "maturity" - the projects contain so many articles that experienced editors now reject at least as many new articles as they accept. Articles about important aspects of Madagascar's geography, fauna and culture may be enormously significant to people in that country, but might not meet Wikipedia's "notability" threshold for inclusion in the French Wikipedia. In a Malagasy Wikipedia, local knowledge is an obvious candidate for inclusion; in a larger, more global Wikipedia, the same information might not merit an article.
In March of 2010, Kenyan band "Just a Band" released a music video for their song "Ha-He" on YouTube. The video is the imagined trailer for a 1970s blaxploitation movie featuring a badass superhero named "Makmende". The term "Makmende" doesn't have Swahili origins - it turns out to be the Kenyan transliteration of Dirty Harry's famous line "Go ahead, make my day." Kenyan blogger archer explains, "Makmende was a term used way back in the early to mid 1990s to refer to someone who thinks he's a superhero. For example, if a boy who's watched one too many kung-fu movies on TV decides to unleash his newly acquired combat skills, he would be asked 'Unajidai Makmende, eh?'"
Makmende quickly went viral on the Kenyan internet, appearing in tribute videos, on photoshopped bank notes and as the subject of a website where Kenyan users posted testimony of Makmende's greatness: "After platinum, albums go Makmende"; "Makmende once visited the British Virgin Islands. They're now called the British Islands." While Makmende quickly conquered the Kenyan internet, he had a harder time breaking into Wikipedia. The first article stubs posted were quickly deleted as vandalism, copyright infringement and "Patent nonsense, meaningless, or incomprehensible." Makmende was finally enshrined in Wikipedia when an article in the Wall Street Journal and a segment on CNN made it clear to international audiences that Makmende deserved inclusion as Kenya's first major viral internet meme. But the early conflicts over the article point to an interesting tension: can American and British editors of the English language Wikipedia make informed judgments about what aspects of Kenyan culture should be considered relevant and worthy of an encyclopedia entry? Heather Ford calls this problem "local notability" and worries that Wikipedia may exclude developing world participation - in global languages and local ones - if it's not able to address the problem.cxl
Failing to include a Kenyan internet meme in a vast encyclopedia may not represent a crisis. But the extinction of human languages might. Anthropologist Wade Davis notes that half of the world's 6000 languages are no longer being taught to school children. Without another generation of native speakers, most will die out.cxli Those concerned about language extinction worry about languages being forced out by culturally dominant neighbors - the 5 million speakers of Mayan often also speak Spanish, a language that's spoken globally. It's not hard to imagine those speakers deciding it's to their economic advantage to primarily speak Spanish and let Mayan slowly disappear.
The cases we're considering here outline another way languages can disappear: digitally. If speakers don't have an incentive to create content in a language, we won't have enough content online to build language models for translation. The bits of Malagasy or Mayan content online may remain linguistically "locked up", available only to native speakers and invisible to everyone else. We may be facing a wave of digital language extinction, where some languages have a large enough online presence to maintain a community and develop a machine translation system, while others fall beneath that threshold and never have a significant online footprint.
Making Translation Transparent
The ability to translate a language - via automated systems or volunteer translators - doesn't guarantee that we'll ever encounter those translations. Roland Soong's translation of the Qilu Evening News story about Liaxiang Tech was available online, but the journalists writing about Chinese hackers didn't find it. For many of us, information not easily accessible via a search engine doesn't exist. Crossing the barrier of language requires more than making translation possible - it means making language transparent.
At moments of crisis, we're often reminded how powerful language barriers can be. As Tunisia, Egypt and then much of North Africa and the Middle East exploded into popular protest in early 2011, many fascinated readers turned to Twitter for real-time reporting and commentary. Much of what was most interesting on Twitter was written in Arabic, not in English. Some extraordinary reporters, like Dima Khatib, Al Jazeera's Latin America bureau chief, acted as real-time translators, posting in Arabic, English and Spanish to reach a broad swath of users.
Andy Carvin, NPR's strategist for social media, put other matters to the side for the first months of 2011 and dedicated himself to covering these struggles via online media. His Twitter feed, followed by over 25,000 readers around the world, frequently included pleas for help translating a slogan shouted in Tahrir Square or a tweet offered from a Libyan dissident. With such a wide audience following his aggregation, translations often appeared seconds later, and Carvin immediately reposted and shared the information. Danny O'Brien, an advocate for online free speech with the Committee to Protect Journalists, took a step to automate the process and wrote a simple tool - a web browser extension - that adds a "translate" button to Twitter beside each individual tweet, allowing an interested reader to quickly read a machine translation of an otherwise unreadable post. (I'm thoroughly addicted to O'Brien's tool and find that it's helped me keep up with my friends on Twitter who write in Chinese and Japanese, as well as Arabic.)
Carvin and O'Brien's methods work well when we're motivated to seek out content in another language. But we're still more likely to decide to follow a Twitter friend who speaks in a language we understand. Until language becomes entirely transparent, it's likely to shape who we choose to listen to and who we ignore.
Google's Chrome browser offers a subtle and powerful linguistic feature. When you load a page, the browser tries to detect what language it's written in and, if the language isn't your default, offers a machine translation of the content. You can disable this feature, accept translations as they're offered, or most powerfully, tell Chrome to always translate content in a particular language into your native tongue. My installation of Chrome now renders pages in Chinese, Japanese and Arabic into English for me by default, and I've discovered that I no longer instinctively reach for the back button in my browser when I stumble off the comfortable path of English-language pages. The translations offered are often hard to read, but at minimum, I have a sense for what topics they're covering and whether I might beg a multilingual friend for a more readable translation.
Before language begins to recede as a major barrier online, translation needs to move out from the browser and into the search engine. When we look for information through most search engines, the language we use to build a query limits the results we get. Search Google or Bing for "apple" and you won't get the same results as you'd get searching for the Spanish equivalent, "manzana". This makes sense, of course - most people searching in English would prefer English-language results. But this limitation can constrain what information is available.
Ivan Sigal, Global Voices's executive director, is a serious amateur cyclist. When he bought a secondhand, handmade bicycle frame made by an obscure, defunct German manufacturer called Technobull, he immediately wanted to learn more about his new ride and other folks who rode the same brand. Searching for information on Google.com, he found virtually nothing: a few dozen pages in English referencing the brand as elite and expensive, and one page of Flickr images. So he turned to google.de and discovered thousands of pages, including an active online forum of riders who venerate Technobull cycles. Ivan speaks a little German, and some of the riders were willing to answer his questions and help him out. The information Ivan needed was in German, not English, and Google.com wasn't able to help him find what he needed.
Yet. Google's head of product management, Anjali Joshi, is passionate about ensuring that language isn't an insurmountable barrier to sharing knowledge. A native speaker of Kannada (check), she wants to ensure that, moving forward, "A person in Korea, or any part of the world, should have access to all information on the web in their language, rendered perfectly, in a way that's readable, understandable, findable." This goes beyond ensuring that a search for apples looks for results in English, Spanish and Korean: "Eventually we want people to be able to converse with each other on chat, to move seamlessly between languages in spoken and written language."
Even with Google's dramatic progress in translation, introducing translation services between 60 language pairs in 6 years, this is a long path. (Google can translate between English and 60 other languages. Translations from Icelandic to Yiddish, for instance, go through English as a "bridge language.") "There are three keys to getting there," she tells me. "We need excellent machine translation first. Then we need perfect search results across all languages." Her colleagues, sitting with us in a Mountain View conference room, look slightly nervous about the challenges of achieving Joshi's vision as she leans back in her chair and offers the third step. "Once you can search in every language, then we need perfect translation from there. That would be Nirvana."
Chapter 6 - Taken in Context
The early 1980s weren't especially kind to Paul Simon. He ushered in the second decade of his post-Simon & Garfunkel life with "One Trick Pony", a forgettable companion album to a forgettable film in which Simon himself starred. A 1981 reunion concert with his former musical partner Art Garfunkel brought 500,000 people to New York's Central Park, and sold over 2 million albums in the US, and the two began touring together. But the tour ended prematurely due to "creative differences" between the two, and a planned Simon & Garfunkel album became a Paul Simon solo release, "Hearts and Bones", an ambitious experimental album that was the lowest-charting of his career. With the breakup of his marriage to actress Carrie Fisher, "I had a personal blow, a career setback and the combination of the two put me into a tailspin," Simon told biographer Marc Eliot.cxlii
During this dark period, Simon was mentoring a young Norwegian songwriter, Heidi Berg, who was working with Saturday Night Live's house band. Berg gave Simon a cassette copy of "Gumboots: Accordion Jive Hits, Volume II", a collection of South African mbaqanga music from Sowetan musicians, featuring the Boyoyo Boys, who'd recorded the title track. Listening to the cassette in his car, Simon began writing new melody lines and lyrics on top of the sax, guitar, bass and drums of the existing track.
"What I was consciously frustrated with was the system of sitting and writing a song and then going into the studio and trying to make a record of that song. And if I couldn't find the right musicians or I couldn't find the right way of making those tracks, then I had a good song and a kind of mediocre record," Simon told Billboard Magazine's Timothy White. "I set out to make really good tracks, and then I thought, 'I have enough songwriting technique that I can reverse this process and write the song after the tracks are made.' And if I have a really good song, well then, my chances of making a good record are vastly improved over the other way of working."cxliii
In the hopes of working this new way, Simon turned to his record company, Warner Brothers, to set up a recording session with the Boyoyo Boys. In 1985, this was far from an easy task. Since 1961, the British Musicians Union had maintained a cultural boycott of South Africa, managed by the UN Center Against Apartheid. The boycott was designed to prevent musicians from performing at South African venues like Sun City, a hotel and casino located in the nominally-independent bantustan of Bophuthatswana, an easy drive from Johannesburg and Pretoria. But the boycott covered all aspects of collaborations with South African musicians, and Simon was warned that he might face censure for working in South Africa. He consulted with Quincy Jones and Harry Belafonte, both of whom were close to South African musicians, and with their blessing, headed to Johannesburg for two and a half weeks to record.
When Simon called Warner Brothers for help, they called Hilton Rosenthal. Then managing an independent record label in South Africa, Rosenthal is a fascinating and important figure in South African music history. A middle class white South African, he found himself in charge of the "black music" division for the Gramophone Record Company in the mid-1970s. He wondered whether there was space for music in South Africa that wasn’t purely white or black, and began working with Johnny Clegg and Sipho Mchunu, the two musicians who became the heart of Juluka, a racially-integrated band that electrified traditional Zulu music and brought it to a global audience. He recorded collections of township music and attempted to distribute them internationally, and is one of the key figures in developing "world music" as a genre, helping launch the career of Raï superstar Cheb Mami. Rosenthal’s label had partnered with Warner Brothers to distribute Juluka’s records in the US, and so Warner executives had worked with Rosenthal previously and knew he could help Simon navigate a relationship with South African musicians.
As someone who’d recorded a highly political, integrated band in apartheid South Africa, Rosenthal was aware of some of the difficulties Simon might face in recording with Sowetan musicians. He assured Simon that they’d find a way to work together and sent him a pile of twenty South African records, both mbaqanga acts and choral "mabazo" groups, including Ladysmith Black Mambazo. And then he set up a meeting with the black musicians' union, to discuss whether members should record with Simon. (Need to clarify this, talking to Rosenthal’s friends about sequence of events)
The musicians had reason to be skeptical of such a collaboration. Paul Simon wasn’t the first musician to have the clever idea of building pop music around the driving rhythms of mbaqanga. That title goes to one of popular music’s great appropriators, Malcolm McLaren.
McLaren is best known as the Svengali behind the Sex Pistols, assembling the seminal punk band at his London clothing boutique. McLaren had "discovered" punk a few years earlier, briefly managing The New York Dolls at the close of their career, and brought the fashion of New York's emerging punk scene to Britain. The controversial, explosive, brief and ultimately tragic career of the Sex Pistols launched McLaren as a musical innovator, and he went on to court additional controversy with his next band, Bow Wow Wow, which featured the 15-year-old singer, Annabella Lwin, posing nude on the cover of their 1981 debut album.
For McLaren's next act, he didn't bother building a band. "Duck Rock" is a complex and compelling pastiche of influences from around the globe: American folk, early hip-hop, Afro-Caribbean... and lots and lots of mbaqanga music. "Double Dutch", an ode to African-American jump rope culture, is built around an instrumental track, "Puleng", by the Boyoyo Boys. McLaren didn’t credit the Boyoyo Boys for the track, claiming he’d authored it with Yes bass player Trevor Horn. The album borrowed heavily from other South African acts, including Mahlathini and the Mahotella Queens, who also worked unpaid and uncredited.
When Simon approached Rosenthal about recording with the Boyoyo Boys, Rosenthal and the Boys were in the early stages of a lawsuit attempting to get royalties from McLaren. (They eventually succeeded in an out of court settlement.) But Rosenthal supported the idea, and a majority of the black musicians' union agreed to invite Simon to South Africa to record. They worried that the UN cultural boycott was preventing mbaqanga music from taking its place on the global stage, reaching the prominence of reggae, for instance. Realizing that Simon's stature could bring a great deal of attention to the local musical scene, they voted to work with him.
The sessions that Rosenthal organized led to Graceland, one of the most celebrated albums of the 1980s. It won Grammy awards in 1986 and 1987, topped many critics' charts and regularly features on "top 100 albums of all time" lists. It also made a great deal of money for Simon and the musicians he worked with, selling over 16 million copies. South African songwriters share credits and royalties with Simon on half the album's tracks, and Simon paid session musicians $196.41 per hour, three times the US pay scale at that point for studio musicians. Graceland also raised the profile of the musicians featured on the album. Ladysmith Black Mambazo became a major international act – Simon produced Ladysmith’s subsequent three albums, which sold well in the US and Europe. Simon's session players didn't suffer either: bassist Bakithi Kumalo went on to record with Gloria Estefan and a variety of other US and African acts; guitarist and arranger Ray Phiri recorded with everyone from Laurie Anderson to Willie Nelson; drummer Isaac Mtshali went on to support reggae great Lucky Dube.
At its best, "Graceland" sounds like Simon is encountering forces too large for him to understand or control. He's riding on top of them, offering free-form reflections on a world that's vastly more complicated and colorful than the narrow places he and Art Garfunkel explored in their close harmonies. The chorus of "The Boy in the Bubble" - "These are the days of miracle and wonder, this is the long distance call" - could serve as a tagline for anyone confronting our strange, connected world. Simon's not cutting and pasting from a global palette of sounds the way McLaren is – he's being swept along by the brilliant musicians he's playing with, trying frantically to tell us what he sees through the window as the train rushes forwards.
Collaborations like Graceland don't happen without the participation of two important types of people: bridge figures and xenophiles. Xenophiles are people who find inspiration and creative energy in the vast diversity of the world. They move beyond an initial fascination with a cultural artifact - a cassette of mbaqanga music - to make lasting and meaningful connections with the people who produced the artifact. Xenophiles aren't just samplers or bricoleurs; they take seriously both forks of Kwame Appiah's definition of cosmopolitans: they recognize the value of other cultures, and they honor obligations to people outside their own tribe, particularly the people they are influenced and shaped by. Simon distinguishes himself from McLaren by engaging with South African musicians as people and by becoming an advocate and promoter of their music.cxliv
Bridge figures straddle the borders between cultures, figuratively keeping one foot in each world. Rosenthal was able to broker a working relationship between a sometimes prickly white American songwriter and dozens of black South African musicians during some of the most violent and tense moments of the struggle against apartheid. As a bridge, Rosenthal was an interpreter between cultures and an individual both groups could trust and identify with, an internationally-recognized record producer who was also a relentless promoter of South Africa's cultural richness.
Bridge Figures
The term "bridge figure" has a murky genesis. Chinese activist and journalist Xiao Qiang and I started using the term to describe the work bloggers were doing translating and contextualizing ideas from one culture into another. Shortly after, Iranian blogger Hossein Derakhshan gave a memorable talk at the Berkman Center as part of the Global Voices inaugural meeting. Hossein explained that, in 2004, blogs in Iran acted as windows, bridges and cafés, offering opportunities to catch a glimpse of another life, to make a connection to another person, or to convene and converse in a public space. I've been using the term "bridgeblogger" ever since to refer to people building connections between people from different cultures using online media, and "bridge figures" to describe people engaged in the larger process of building understanding between cultures.
To understand what's going on in another part of the world often requires a guide. The best guides have a deep understanding both of the culture you're encountering and the culture you're rooted in. This understanding usually comes from living long periods in close contact with different cultures. Sometimes this is a function of physical relocation – an African student who pursues higher education in Europe, an American Peace Corps volunteer who settles into life in Niger semi-permanently. It can also be a function of the job you do – a professional tour guide who spends her days leading travelers through Dogon country may end up knowing more about the peculiarities of American and Australian culture than a Malian who lives in New York City but interacts primarily with fellow immigrants.
My friend Erik Hersman is an American, a former Marine, who lives and works in Nairobi, Kenya. The child of American Bible translators, Hersman grew up in southern Sudan and in the Rift Valley of Kenya. After school and military service, Erik ran a technology consultancy in Orlando, Florida, making regular trips to East Africa to document technological innovation on the blog Afrigadget. He moved to Nairobi to lead the iHub, a technology incubator in central Nairobi designed to nurture internet-based startups.
Erik is able to do things most Americans aren’t able to do. He can wander around Gikomba, Nairobi and talk to local metalworkers in Swahili, and video the process of turning the drive shaft of a Land Rover into a cold chisel for his blog, because he’s a Kenyan. And he can help Kenyan geeks develop a business plan to pitch a novel software venture to international investors, because he's an American geek. Lots of people have one of these skill sets – bridge figures are lucky enough to have both.
Sociologist Dr. Ruth Hill Useem uses the term "third culture kid" to describe individuals like Erik, who were raised both in the home culture of their parents and in the culture of the places where they grew up. Useem argues that kids raised in this way end up developing a third culture by combining elements of their "birth" culture and the local culture they encounter. She argues that children who go through this process – the kids of military personnel, missionaries, diplomats and corporate executives – often have more in common with each other than with other kids from their birth culture. Researchers who've followed in Useem's footsteps have found evidence that third culture kids are often well-adapted to live and thrive in a globalized world. They're often multilingual as well as multicultural, and are often very good at living and working with people from different backgrounds. As a downside, some third culture kids report feeling like they’re not really at home anywhere, whether in their birth culture, the culture they were raised in or any new culture.
While Useem's research focuses primarily on North Americans and Europeans growing up in other parts of the world, international patterns of education and migration are giving people from many nations the opportunity to become bridge figures. Hundreds of the individuals who write or translate for Global Voices are citizens of developing nations who've lived or worked in wealthier nations, learning new languages and cultures as students, migrants or guest workers.
Merely being bicultural isn't sufficient to qualify you as a bridge figure. Motivation matters as well. Bridge figures care passionately about one of their cultures and want to celebrate it to as wide an audience as possible. One of the profound surprises for me in working on Global Voices has been discovering that many of our community members aren't motivated by a sense of post-nationalist, hand-holding "Kumbaya"-singing, small world globalism, but by a form of nationalism. Their work on Global Voices is often motivated by a passion for explaining their home cultures to the people they’re now living and working with. As with Erik's celebration of Kenyan engineering creativity, and Rosenthal's passion for the complexity and beauty of South African music, the best bridge figures aren't just interpreters, but passionate advocates for the creative richness of other cultures.
Not everyone wants to build bridges between the cultures they’ve encountered. Immigrants – and especially the children of immigrants – often reject their birth culture and embrace the culture they’ve emigrated to in a way that makes them poorly positioned to act as cultural bridges. Others reject the culture they’ve moved to, staying rooted in their birth culture and consciously remaining outside the culture they’re living within. In Mohsin Hamid’s insightful novel “The Reluctant Fundamentalist”, the protagonist and narrator, a Pakistani Muslim working in the New York financial sector, rejects his role as a "janissary" for American capitalism and becomes a leader of fundamentalist students in his native country. We expect the narrator to act as a bridge, explaining his home country to a Western reader – when he fails to act as a bridge and turns on his audience, it’s a deeply surprising narrative moment.