My friends at TED have launched an exciting new project today, the TED Open Translation Project. It’s a powerful system to allow the “social translation” of their video content. This tool demonstrates the state of the art in social translation on the web today, and I think there are a lot of lessons in the tool and thinking behind it for anyone who hopes to make the polyglot internet more comprehensible, and for anyone thinking about online cooperation.
I’m aware that most people think of translation as roughly as interesting as developing Linux device drivers – necessary, but far from sexy. My hope is to convince you that translation is one of the keys to helping the internet reach its potential, and to get you at least a tenth as excited about this new tool and approach as I am.
For the past couple of years, TED has shared an amazing set of videos, talks delivered at the TED conferences in California, the UK, and Tanzania. These talks are some of the most fascinating and thought-provoking video content available on the web – many smart people have discovered TED talks and promptly lost a week or more gorging themselves on intellectual candy.
(A personal top five, for those who’ve not taken a deep dive into the videos that are available. I’m not going to argue that these are the “best” talks given at TED, but they are the ones that have had the most influence on me and my work:
– Ngozi Okonjo-Iweala, former Nigerian minister of finance, on the debate on trade and aid in Africa, framed in deeply personal terms, as she talks about her family’s struggles during the Biafran war.
– Swedish doctor and scientist Hans Rosling uses statistics and visualization to rethink international development over the course of decades and centuries.
– Majora Carter on the importance of environmental issues to urban communities, and the connection between community development and the green movement.
– Nigerian author Chris Abani on humanity, cruelty, compassion and storytelling. I’m not sure I’ve ever seen a talk swing between humor and brutality as rapidly and powerfully as Chris does in this talk. When he finished giving it live, I left the theatre because I didn’t want to hear anything else that day.)
For the past couple of years, these talks have been available to anyone with a good internet connection and the time to download them… but they’re only helpful to people who speak English, the language the talks were delivered in. TED, and specifically June Cohen, the director of TED Media, recognized that there’s huge international demand for TED’s content – take a look at TedToChina, a fan site that offers summaries of TED talks in Chinese.
Translation is supposed to be difficult, time-consuming and expensive. Professional translators routinely charge between $0.20 and $0.40 per word – translating this blog post into one other language would cost over $500 at market rates. The cost of machine translation has fallen from cheap to free, with powerful systems incorporated into Google and other search engines… but the results are far from perfect, and tend to miss the nuance of complex texts. Very few of us choose to read blogs – even on topics we enjoy and follow – via machine translation because the experience is so awkward.
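For anyone who wants to check that arithmetic, here is a minimal sketch of the per-word calculation. The rates are the ones quoted above; the 2,600-word length is an illustrative assumption rather than a measured count of this post, and the function name is made up for the example.

```python
# Per-word rates quoted above, held in US cents so the arithmetic stays exact.
LOW_RATE_CENTS = 20
HIGH_RATE_CENTS = 40


def translation_cost_range(word_count: int) -> tuple[float, float]:
    """Return (low, high) cost in dollars to translate word_count words into one language."""
    low = word_count * LOW_RATE_CENTS / 100
    high = word_count * HIGH_RATE_CENTS / 100
    return low, high


# ASSUMPTION: 2,600 words is an illustrative length, not a measured count of this post.
low, high = translation_cost_range(2600)
print(f"${low:,.0f} to ${high:,.0f} for one target language")
```

At the low end of the range, any post of 2,500 words or more crosses the $500 mark, and that is for a single target language.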
But maybe translation doesn’t need to be so difficult and expensive. Maybe it’s something that interested, talented people will do for free, if given the right opportunities and incentives. That idea inspired the Global Voices community to launch Lingua, our project to translate Global Voices content into over twenty languages. In 2006, we discovered that Portnoy Zheng, an amazing Taiwanese blogger, was translating Global Voices stories into Chinese, and inviting other translators to help with his efforts.
We were thrilled, and started pointing Chinese-speaking readers to Portnoy’s efforts. Other groups, starting with the Francophones, proposed that volunteer translation of Global Voices content into other languages become an official feature of our community, and beginning in 2007, we’ve integrated volunteer translations into our site – under many of the headlines on the main site, you’ll see “zh”, “fr”, “mg” or another two-letter language code. Click on that code, and you’ll find yourself on a translation of that post.
There’s a growing movement to make “social translation” – translation of online information by users around the world, motivated more by community recognition and appreciation than by money – a mainstream approach to making the web more accessible to all readers. The movement has been led by the open source software community, and by projects like Dwayne Bailey’s Pootle toolkit, a set of tools that make it easier to localize open source software. (Dwayne launched translate.org.za, a project that makes key software available in South Africa’s eleven official languages.) Inspiring projects in the space include WorldWide Lexicon, an open platform to allow cooperative translation of any website; Meedan, an online community that uses social translation as well as machine translation to build dialog between Arabic and English speakers; and dotsub, a powerful video subtitling and translation tool that invites anyone to become a subtitler or translator.
Cohen and her team looked closely at the tools and teams building the social translation movement and built a new community that learned from the successes and failures of other projects in the space. TED’s tool is based on dotsub, with some very powerful new features added, and their model for recruiting, recognizing and rewarding translators is inspired in part by some of the work we’ve done at Global Voices. For visitors to the site, this means that you can browse videos by language, selecting one of the 32 talks available with Spanish subtitles, or the sole talk available in Kyrgyz.
Select a talk in one of its translated forms, and you’ll get a subtitled video, a translated title and description of the talk. Featured in this description are the two people responsible for translating the talk, the lead translator and the reviewer – like Global Voices, TED is inviting translators to join the community, pairing new translators with trusted reviewers to evaluate the work and to offer any changes or suggestions. Another link on the page leads to an “interactive transcript” – this allows a viewer to select a point in the talk and fast-forward to see the slides and images that accompany the speaker’s words.
Not only is this a fantastically cool way to navigate these talks, it leads to my favorite undocumented feature of the system, which Cohen calls “the Rosetta Stone”. Pick a transcript of a talk in a language you speak. Then select subtitles in a language you don’t speak. You can watch the talk in three languages – the English of the speaker’s words, the Spanish of the transcript and the Turkish of the subtitles. (I suspect my wife, who speaks English and Hebrew well, and is learning Arabic, will be addicted to this feature in the near future.)
(This ability to view the same text in many languages may turn out to be one of the most important aspects of the project in the long run. As TED translates hundreds of talks, they’re creating “parallel corpora”, the raw material for machine translation systems. This might be too small to build really strong Turkish to Vietnamese translation technology, but the idea of pulling corpora from tools like dotsub is something that machine translation folks should be taking a close look at.)
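To make the “parallel corpora” idea concrete, here is a minimal sketch of how subtitle tracks in two languages could be turned into sentence pairs. It assumes that translated subtitles keep the same cue start times as the original track, which I haven’t verified for TED’s or dotsub’s files; the `Cue` type, the `align_cues` function and the sample lines are all invented for illustration.

```python
from typing import Dict, List, Tuple

# ASSUMPTION: a subtitle cue is (start time in seconds, text), and a translated
# track reuses the start times of the track it was translated from.
Cue = Tuple[float, str]


def align_cues(source: List[Cue], target: List[Cue]) -> List[Tuple[str, str]]:
    """Pair subtitle lines from two languages that start at the same moment."""
    target_by_start: Dict[float, str] = {start: text for start, text in target}
    pairs = []
    for start, text in source:
        # Skip source cues that have no counterpart in the target track.
        if start in target_by_start:
            pairs.append((text, target_by_start[start]))
    return pairs


# Made-up sample cues, not taken from any real talk.
spanish = [(0.0, "Gracias."), (2.5, "Hoy quiero hablar de África."), (6.0, "Empecemos.")]
turkish = [(0.0, "Teşekkürler."), (2.5, "Bugün Afrika hakkında konuşmak istiyorum.")]

for es, tr in align_cues(spanish, turkish):
    print(f"{es}  |  {tr}")
```

Each pair that comes out is one line of a Spanish–Turkish parallel corpus, produced without anyone ever translating directly between those two languages. Real subtitle files would need fuzzier matching than exact start times, but the principle is the same.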
The system is launching with 375 translations, representing 42 languages. Some extremely popular talks, like Al Gore’s talk on climate change, are available in over twenty languages – others are available just in English and one other language. What’s remarkable to me is how many of the talks were translated by volunteers – 200 of the first 300 translations posted, and June tells me that 450 volunteer translations are in the queue and will launch soon. She calculates that if TED had to pay for those translations, the 650 underway would have cost roughly $500,000. While sponsors like Nokia, the lead sponsor for the translation project, might have been able to cover that sum, June estimates the cost of translating all TED talks into 40 languages at over $13 million. Achieving what TED really wants – all talks in 300 languages – would cost over $100 million. It’s simply not possible to take on a task of that size without trying a social translation approach.
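A back-of-the-envelope check of those numbers, using only the figures June quoted. The one assumption I’m adding is that cost scales linearly with the number of languages, which is a simplification.

```python
# Figures quoted in the paragraph above.
volunteer_posted = 200
volunteer_queued = 450
estimated_value_usd = 500_000

total_volunteer = volunteer_posted + volunteer_queued
cost_per_translation = estimated_value_usd / total_volunteer

# ASSUMPTION: cost grows linearly with the number of target languages.
cost_40_languages_usd = 13_000_000
cost_300_languages_usd = cost_40_languages_usd * 300 / 40

print(f"{total_volunteer} volunteer translations")
print(f"about ${cost_per_translation:,.0f} per translation")
print(f"about ${cost_300_languages_usd:,.0f} for 300 languages")
```

The linear projection lands a little under the $100 million quoted, which suggests the quoted figures are rounded or allow for a growing catalogue of talks. Either way, the order of magnitude is the point.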
Why are people queueing up to translate TED talks for free? The system June and TED have launched leverages some of the lessons we’ve learned about social translation:
– Translation can be fun, if the content’s enjoyable. There aren’t a lot of people lining up to translate UN internal memos for free (according to some estimates, transcripts of UN meetings can cost as much as $8000 an hour to produce, leading to an organization-wide translation budget of $100 million per year). But TED talks are fascinating to a wide audience, and some people are excited about investing the time to translate them.
– Choice matters. On Global Voices, we don’t attempt to translate every story into every language – we let translators choose what stories they’re interested in. We don’t get a complete edition of our content, but we wouldn’t have such great participation if we assigned specific stories to translators. My guess is that TED is seeing a similar phenomenon, and that translators will initially gravitate to a small set of highly popular talks, then start translating talks that meet their personal interests over time.
– Translators need recognition. On the TED site, translators are some of the most prominently featured people on the page – click through on the translator or reviewer’s name, and you get a page featuring her photo, her work and recognizing her contributions. On Global Voices, we try to feature authors and translators equally – that model doesn’t make as much sense for TED, where the speakers are often celebrities, but it’s clear that TED is taking the translator’s role very seriously and honoring the contributions.
– Community matters. Our translators have the same sort of internal communications systems that our authors do – they divide up tasks, consult each other for assistance and support, and generally function as a tight community. My guess is that language communities are going to emerge on TED in much the same way, and that the translator/reviewer mechanism is going to be critically important for building support, friendships and communities.
– Not all rewards are (directly) financial. GV rewards its most productive translators with travel funding to help them attend our annual meetings. I wouldn’t be surprised to see TED try something similar if they’re able to secure the funding. And we’ve found that translators use their GV experience as evidence that they are competent professional translators and gain more professional translation work from their association with us – again, I’d expect to see something similar with TED. My guess is that prominent translators in the TED community will also become “go-to” guys and gals for TEDsters who are looking for contacts in Turkey or Poland.
I’m really excited about TED’s project for two reasons. One is that it’s great to see an organization I respect and admire adopting and improving on a strategy we’ve embraced at Global Voices. June and I had coffee in NYC a couple of weeks ago, and when she told me that the translations produced by volunteers were frequently better than those produced by professional translation agencies, I was so happy I gave her a high-five. It makes perfect sense to me – translators motivated by pride, community support and interest might well do a better job than those just collecting a paycheck.
I’m also thrilled because TED operates on a very large stage, and their embrace of social translation sends a message to organizations and projects around the world who are considering whether and how they tackle issues of language. Because translation is historically difficult and expensive, most organizations have simply avoided it, except when absolutely necessary.
The internet is huge, growing, and being built by people who speak hundreds of different languages. There are editions of Wikipedia in over 200 languages, and some scholars estimate that there’s as much user content created in Chinese as there is in English. Unless we find scalable, inexpensive ways to translate, we’re each going to face an internet that grows every day, where we find less of the content understandable. Until we figure out better solutions to translation, we’re fooling ourselves into believing we’re more cosmopolitan and connected than we actually are.
Social translation isn’t the only solution, and it won’t solve the problem by itself. But it’s a great first step, and TED deserves real congratulations for building this great tool and bringing this strategy to global prominence… and for its commitment to the values of connection and bridging that underlie the effort to make this information available around the world.