A translator tool with a human touch

December 1, 2009, 2:09pm

How hard can it be, as the joke goes, to speak Chinese? (Six-year-olds do it all the time.)

Yes, it turns out that learning languages is one of those skills that humans, even relatively young ones, master seemingly by magic. It is enough to make a mainframe computer jealous.

At I.B.M., a team of nearly 100, including mathematicians and software developers, is working on a project to create an automatic translation tool, so-called machine translation, that has the speed and accuracy to be used in instant-messaging between speakers of two different languages.

The project, called n.Fluent, is intended to teach the computer terminology that is specific to I.B.M.’s businesses, and, more significantly, allow the computer to learn what it has been doing wrong. To that end, the company is extracting and organizing contributions from I.B.M.’s 400,000-member work force spread across more than 170 countries, adding a human touch to the project.

Over a two-week period last month, the company issued a “worldwide translation challenge” to its employees, using a points-based system to award the biggest contributors prizes that were converted to charitable donations. About 6,000 I.B.M. employees made improvements in 11 languages to more than two million words of text translated by n.Fluent.

So, when a machine translation from French produces, “MTTP is the time of 30 minutes and it is steadily declining since January 2006,” a human correction comes up with this improved English version: “The MTTP delay is 30 minutes and it has been steadily declining since January 2006.”

“From this parallel data, we update the models,” said Salim Roukos, an I.B.M. researcher in language-related technology at its T.J. Watson Laboratory in Yorktown Heights, N.Y., home of the n.Fluent project. “You want to learn the idiomatic expressions — when you say someone has kicked the bucket, you don’t want that translated word for word.”

So far, n.Fluent is used only by I.B.M. employees, but the intention is to create a product that can be sold to other businesses.

Efforts like this at I.B.M., as well as social networking tools behind the company’s firewalls, amount to a new twist on “crowdsourcing,” the term I.B.M. officials use to describe them. In addition to the n.Fluent project, I.B.M. has its own companywide version of Wikipedia (Bluepedia), with contributions from 1,300 employees.

Perhaps the most innovative social networking experiment at I.B.M., according to Irene Greif of the I.B.M. Center for Social Software in Cambridge, Mass., is Dogear, a tool similar to Delicious that allows employees to share and tag links on the Internet as well as on the I.B.M.-only intranet. The project itself was a bit of an experiment, and I.B.M. developers tweaked it further, she said.

The result is a system of tags and descriptions contributed by 10 percent of users. It has become more popular than I.B.M.’s own internal search engine.

“A small crowd, a self-selected crowd can often be useful,” Ms. Greif said.

This highlights the differences between what is occurring at I.B.M. and other large companies and what traditionally constitutes crowdsourcing.

I.B.M. employees are not just any “crowd”; they have expertise and a loyalty to their employer that any old posse rounded up on the Internet may not have. In fact, crowdsourcing may be the wrong way of thinking of such internal corporate projects. Employee-sourcing?

Maybe that catch-all term “collaboration” is the best way to think of what social networking technology can bring to the workplace.

After all, collaboration is an old goal for employees and employers.

In the case of the n.Fluent project, programmers are not trying to have a computer master the “rules” of a language, but rather are looking for statistical patterns between two sets of translated texts and among the words themselves. For example, Roukos said, the text of a Canadian parliamentary debate in French and English can help programmers to “build statistical models based on the parallel corpus.” (NYT)
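To make the idea of finding statistical patterns in a parallel corpus concrete, here is a minimal sketch in Python. It is not n.Fluent's method, which the article does not describe. It assumes a classic word-level approach (IBM Model 1, trained with expectation-maximization), and the three-sentence French-English corpus and the names `train_model1` and `best_translation` are invented for illustration.

```python
from collections import defaultdict

# Toy parallel corpus (hypothetical): each pair is (French words, English words).
corpus = [
    ("la maison".split(), "the house".split()),
    ("la fleur".split(), "the flower".split()),
    ("maison bleue".split(), "blue house".split()),
]

def train_model1(pairs, iterations=20):
    """Estimate word-translation probabilities t[(e, f)] = P(e | f) with EM."""
    src_vocab = {f for fs, _ in pairs for f in fs}
    tgt_vocab = {e for _, es in pairs for e in es}
    # Start with no knowledge: every English word is equally likely.
    t = {(e, f): 1.0 / len(tgt_vocab) for e in tgt_vocab for f in src_vocab}
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f being translated as e
        total = defaultdict(float)  # expected count of f being translated at all
        for fs, es in pairs:
            for e in es:
                # Share the credit for e among the French words in this sentence,
                # in proportion to the current probability estimates.
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # Re-estimate the probabilities from the expected counts.
        for (e, f) in t:
            t[(e, f)] = count[(e, f)] / total[f]
    return t

def best_translation(t, f):
    """Return the English word with the highest probability for French word f."""
    candidates = {e: p for (e, src), p in t.items() if src == f}
    return max(candidates, key=candidates.get)

t = train_model1(corpus)
for word in ["la", "maison", "fleur", "bleue"]:
    print(word, "->", best_translation(t, word))
```

No grammar rules or dictionary entries are supplied. The pairings emerge only because certain words keep appearing together across sentence pairs, which is the kind of pattern a much larger corpus, such as parliamentary debates in two languages, can supply at scale.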
