Lost (and found) in Translation

One of the major focuses of Global Voices over the past year has been translating content from other languages into English, and translating the resulting English content into as many languages as possible. We were a little slow in realizing the importance of translation when we started GV – a large number of the blogs we were following were in English, and it seemed like added context to English-language blog posts might represent the majority of the work we did. The vast majority of early blogging tools were in English, which meant that unless someone wrote a guide to blogging in a local language, non-English speakers were at a distinct disadvantage. And many multilingual bloggers concluded that they’d have a better audience if they wrote in English, as few of their countrymen were reading content online.

But blogging has spread rapidly, and a site that promises to give you a picture of conversations in the global blogosphere has to consider a universe that’s much larger than English-language blogs. According to Technorati, by early 2006 Japanese was challenging English as the dominant language of the blogosphere, and Chinese was a close third. I strongly suspect that Technorati undercounts Chinese blogs, many of which don’t send updates to pingservers. One way or another, English-language content represents less than 40% of total blog content, possibly less than a third of all content.

Global Voices responded to this situation a year ago by hiring “language editors”, who are responsible for monitoring content in other languages and translating interesting excerpts into English. We began translating from Spanish, French, Portuguese, Chinese, Arabic, Farsi and Russian (with fiscal support from Reuters) and added Korean and Japanese with support from Outblaze and Sanriotown. And intrepid contributors and editors have translated posts from over twenty additional languages, including Armenian, Khmer, Swahili, Urdu and Uzbek. To find these translations, historically you had to sort through our country categories and assume that any posts translating from Uzbek would be filed under Uzbekistan. But Boris and Jer just added a set of “language categories”, which let you find all posts that a contributor or editor has tagged as containing translations from a given language. The links to French and Spanish above are links to those new language category pages – as authors go back and tag all the Bangla translations, those pages will become available as well.

But bringing content into English is only one part of starting a conversation. If we’re really trying to serve a global audience with Global Voices, we need our content to be available in other languages as well. Portnoy Zheng, a brilliant Taiwanese graduate student (and now soldier), realized this problem almost two years ago, and began organizing Chinese speakers to translate Global Voices content into traditional and simplified Chinese. His system was the inspiration for the “Lingua” project, now managed by our former Francophone editor Alice Backer, who’s helped bring to life versions of Global Voices in Spanish, French, Portuguese and Bangla, with editions in Farsi, German and Russian under development. (German is especially interesting to me, as we don’t generally translate from German on Global Voices, since our focus is primarily on the developing world…)

What this means is that we’ve now got stories from Haiti, translated from French-language blogs, appearing in Japanese and Bangla. That’s a partial solution to the problems we’re interested in addressing – it lets people in Japan hear voices from Haiti. But we’re a long way from enabling conversations between Japanese speakers and Francophones in comment threads… and I got a sense of just how difficult that process could be the other day.

Global Voices has been a happy recipient of the largess of the Google Grants program. We’ve got $200 per day of Google Ads, which we use to promote the site. To try to recruit more volunteer translators to help with the project, I offered $100 per day of ads to Alice to promote the various new sites. She, in turn, went to her translators and came up with a series of Google Ads in French, Spanish, Portuguese, Chinese and Bangla.

So far, so good. But as anyone who has ever worked with Google Ads knows, it’s very easy to exceed the 25 character limit for the ad headline. So I started editing the headlines of the ads in French, Spanish and Portuguese with the help of Google’s language tools. I managed to produce ads for each language that met Google’s criteria, and felt pretty good about my work… until the translators reviewed them.

I got immediate responses from our Spanish and Portuguese coordinators, politely suggesting corrections to my ads, adding that the taglines I’d created were incomprehensible, meaningless or both. Before I could hear from our French coordinator, I called Rachel, who’s near fluent in French. She took one look at my French and said, “We’ll help you to translate? Is that what you want to be advertising?” Uh, no – I’d been hoping for “Help us translate.” Good thing we fixed that one before bringing it live.
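
The mechanical half of that job, checking headlines against the 25 character limit, is the easy part, and a minimal sketch of it is below. The limit comes from the paragraph above; the function name and the sample headlines are hypothetical, made up for illustration. Python's `len` counts Unicode code points, which may not match how an ad platform counts characters in every script.

```python
HEADLINE_LIMIT = 25  # ad headline character limit mentioned in the post


def over_limit(headlines, limit=HEADLINE_LIMIT):
    """Return (headline, excess) pairs for headlines longer than the limit."""
    return [(h, len(h) - limit) for h in headlines if len(h) > limit]


# Hypothetical sample headlines, for illustration only.
samples = ["Help us translate", "Aidez-nous à traduire les blogues du monde"]

for headline, excess in over_limit(samples):
    print(f"{headline!r} is {excess} characters too long")
```

A check like this only says whether a headline fits. It says nothing about whether the shortened phrase still means what was intended, which is the part that needed a fluent speaker.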

I should point out that Google Language tools are no longer in Beta for the Romance languages – these tools are quite mature at this point. And they’re really, really good if you’re encountering content in other languages and want to get the gist of what’s being said. But you don’t want to use these tools to author an ad… or to engage in conversation across language and cultural boundaries.

(By the way, I’ve found a couple of plugins that will allow me to select text in a webpage and use Google’s tools to translate it. What I really want is a plugin that knows what languages I don’t speak and proactively goes out and gets translations for those texts and inserts them in the page. I especially want this within GMail, where I’m on a couple of mailing lists that are in English and Chinese. Or I’d like to be able to tell GMail to feed messages in Chinese through a translator before handing them to me, as I’m not likely to pick up the language any time soon…)

Computer-based translation is vastly better than it was a few years back, but it’s still completely inadequate for what we’re trying to do with Global Voices – present information from around the world to people in their native languages. That helps explain another phenomenon I found with Google Ads. Every one of our translators wanted to purchase the keyword “translator” or “translation” in their native language, so that our ads could help recruit more volunteer translators. While keywords like “blogues du monde” are available for a bid of $0.30 per click, “traducteur” requires a minimum of $5 per click. Since our Grant ads can only go for $1 per click, we’re out of luck on high value words like “translation”. Guess there’s a little bit of demand for human translators these days…

Writing about language and the Internet always reminds me of the fact that I’m a monolingual idiot and that I’m missing out because I don’t speak French, Arabic, Swahili and Chinese. Given the explosion of content in languages other than English, I’m deeply worried at the shortage of people like Roland Soong, the legendary editor of EastSouthWestNorth, who translates huge volumes of Chinese content into English every day. I’m even more worried that other net users don’t seem to be clamoring to see what people are writing in French, Arabic, Swahili and Chinese. Who knows what we’re all missing?

This entry was posted in Blogs and bloggers, Developing world, Global Voices.

6 Responses to Lost (and found) in Translation

  1. quixote says:

    (Interesting post. Just a small tip re translations from someone who’s been a polyglot since forever. Use the machine to translate your phrase to, say, French. *Then plug the French phrase into the same mechanical translator* and see what comes out at the English end. It gives you at least a rough check on whether you have total drivel. ;) )

  2. Solana says:

    Thanks for the plugin!

  3. hans says:

    ethan, why would you think you could write ads in a language you don’t speak fluently? please don’t take offense, but i’m surprised that the founder of global voices would have such an ‘american’ (in the non-positive sense of the word) attitude about such things.

  4. Ethan says:

    In fairness, Hans, I was not trying to write ads from scratch. I was trying to take a phrase that was 30 characters long and trying to create a similar phrase 25 characters long. I understand Spanish well enough to hold basic conversations, and can make sense of written French and Portuguese. So I figured with digital help, removing five characters might be possible. Not so…

  5. ed says:

    There were two themes at last year’s AMTA (machine translation geekfest)–the long stagnant world of MT is going to rapidly improve with the advent of: lots more data from the web and lots more compute power. Those of us who are relatively optimistic about possibilities of MT and human hybrid approaches see wonderful things for global information sharing in the near future. Inshallah!

  6. Ashley Rose says:

    I think you have raised a very valid point, that generally we tend to ignore the wealth of content and ideas in the world if it’s not written in our own language. Although clearly for someone like myself who has been ‘attempting’ Italian and Japanese for several years – the process of interpreting such content is slow and often with error. Then again with the assistance of tech-translators the possibilities are endless.

    Personally I would like to see translation ramped up for academic journals.
