By Sue Vorenberg Santa Fe New Mexican
05/11/08 4:00 AM PT
It takes lots of patience and millions of examples to teach a computer a new language. It's actually easier to program such tasks as landing on the moon than to help a computer figure out the meanings of words and how they translate into other languages, said Peter Norvig, Google's director of research.
Teaching a computer to understand languages isn't rocket science -- it's not nearly that easy, said Peter Norvig, director of research at Google (Nasdaq: GOOG).
It takes a limited number of calculations to send a spacecraft to the moon, Mars or other planets. And while the calculations aren't so simple, they are fairly easily managed by a computer, he said.
But learning what words mean, how they fit together and how they translate into other languages is much more challenging, he said.
Rules and Exceptions
"In physics, we've been able to use computers very well for a long time. We can get our spacecraft to the moon or Mars very accurately," Norvig said. "But part of the problem with language is there's lots and lots of rules, and there are lots and lots of exceptions to those rules."
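To see how rules and exceptions pile up, consider a hypothetical rule-based English pluralizer. This sketch is not from Google; the three rules and the exception table are invented for illustration of Norvig's point:

```python
# Hypothetical illustration: a few English pluralization rules,
# plus the exception table every rule-based system ends up needing.
IRREGULAR = {"child": "children", "mouse": "mice", "sheep": "sheep", "foot": "feet"}

def pluralize(noun: str) -> str:
    """Pluralize an English noun with a handful of rules and an exception table."""
    if noun in IRREGULAR:                 # exceptions are checked before any rule
        return IRREGULAR[noun]
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"                # box -> boxes, church -> churches
    if noun.endswith("y") and noun[-2:-1] not in "aeiou":
        return noun[:-1] + "ies"          # city -> cities, but day -> days
    return noun + "s"                     # default rule

for word in ["dog", "box", "city", "day", "child", "sheep", "potato"]:
    print(word, "->", pluralize(word))
```

Even this tiny rule set needs an exception table, and it still gets "potato" wrong (it returns "potatos"), so every fix means another rule or another exception.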
Rather than relying on grammar rules, Google began taking a different approach about two years ago, one that teaches a computer to understand languages more the way humans learn them, he said.
Every Word Counts
What the strategy comes down to is programming the computer to learn through examples. Exposed to an abundance of text in a specific language, the computer can learn to pick out patterns, Norvig said.
And if you teach it to compare two different languages side by side, it can figure out which words or characters generally correspond to one another.
"Most of the answer to how you do this is counting -- it's just the fancy phrase for counting is 'probability theory,'" Norvig said.
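To make "counting" concrete, here is a minimal sketch, assuming sentence pairs that are already matched up, of how co-occurrence counts alone can suggest which words correspond. The four-pair corpus is invented toy data, and the Dice coefficient is one standard association score, not necessarily what Google uses:

```python
from collections import Counter
from itertools import product

# Invented toy parallel corpus: English/Spanish sentence pairs.
pairs = [
    ("the dog runs", "el perro corre"),
    ("the dog sleeps", "el perro duerme"),
    ("the cat sleeps", "el gato duerme"),
    ("a cat runs", "un gato corre"),
]

en_count, es_count, joint = Counter(), Counter(), Counter()
for en, es in pairs:
    en_words, es_words = set(en.split()), set(es.split())
    en_count.update(en_words)                    # sentences containing each English word
    es_count.update(es_words)                    # sentences containing each Spanish word
    joint.update(product(en_words, es_words))    # sentences containing both words

def dice(e: str, s: str) -> float:
    """Dice coefficient: 2 * count(e, s) / (count(e) + count(s))."""
    return 2 * joint[(e, s)] / (en_count[e] + es_count[s])

def best_match(e: str) -> str:
    """Spanish word most strongly associated with English word e."""
    return max(es_count, key=lambda s: dice(e, s))

for word in ["dog", "cat", "runs", "sleeps"]:
    print(word, "->", best_match(word))
```

Nothing here knows any Spanish: "perro" wins for "dog" only because the two words keep turning up in the same sentence pairs.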
Google's language tools, for example, let you search for a word or phrase in English, find results for that search among Web sites written in Spanish, and then translate those results so an English-language user can sort through the links in English.
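The three-step search flow (translate the English query, look for matches on Spanish pages, translate the hits back into English) can be sketched as a small pipeline. Everything here is hypothetical: the dictionaries, page URLs, and function names are invented, and word-for-word lookup stands in for a real statistical translation model:

```python
# Hypothetical toy dictionaries and pages; a real system would use a
# statistical translation model, not word-for-word lookup.
EN_TO_ES = {"dog": "perro", "food": "comida", "cheap": "barata"}
ES_TO_EN = {**{es: en for en, es in EN_TO_ES.items()},
            "para": "for", "la": "the", "tienda": "store", "de": "of"}

SPANISH_PAGES = {
    "tienda.example/1": "comida barata para perro",
    "tienda.example/2": "la tienda de gato",
}

def translate(text: str, table: dict) -> str:
    """Word-for-word lookup; words missing from the table pass through unchanged."""
    return " ".join(table.get(word, word) for word in text.split())

def cross_language_search(query_en: str) -> list:
    """Translate the query, match Spanish pages, and translate each hit back to English."""
    query_es = set(translate(query_en, EN_TO_ES).split())
    hits = []
    for url, page in SPANISH_PAGES.items():
        if query_es & set(page.split()):      # page shares at least one translated term
            hits.append((url, translate(page, ES_TO_EN)))
    return hits

print(cross_language_search("cheap dog food"))
```

The only hit comes back as "food cheap for dog": the words are right but the order is Spanish, a small reminder of why lookup alone is not translation.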
Building a Collection
So far, the tools work with about 15 languages, but the hope is to add more soon, he said.
The tools also let you translate Web pages and text, among other things.
The key to building the language tools program was to feed it lots and lots of texts, gathering them from groups that already have documents translated into several languages, such as international news sites and United Nations archives, Norvig said.
"Then we build a model that says, 'Here's all these translations, and we know this page is a translation of that page, but we don't know exactly which corresponds to which,'" Norvig said. "What we have, though, is probabilities. Like the first sentence in English is similar to the first sentence in Chinese, but it could be the first two sentences, the first three, or it could be one to one."
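Not knowing whether one English sentence matches one, two, or three Chinese sentences is the sentence-alignment problem. A minimal sketch, assuming (as the classic Gale and Church length-based method does) that translated sentences have roughly proportional lengths, uses dynamic programming over a few allowed groupings. The penalty values and cost formula are illustrative assumptions standing in for the probabilities Norvig mentions, not Google's actual model:

```python
# Simplified length-based sentence alignment in the spirit of Gale & Church (1993).
# Assumed: lengths of mutual translations are roughly proportional (scaled by `ratio`).
BEADS = {(1, 1): 0.0, (1, 2): 1.0, (2, 1): 1.0, (1, 3): 2.0, (3, 1): 2.0}  # illustrative penalties

def align(src_lens, tgt_lens, ratio=1.0):
    """Return (src_indices, tgt_indices) groups that minimize the total mismatch cost."""
    n, m = len(src_lens), len(tgt_lens)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s = sum(src_lens[i:ni]) * ratio
                t = sum(tgt_lens[j:nj])
                # Relative length mismatch plus a penalty for non 1:1 groupings.
                c = cost[i][j] + abs(s - t) / max(s, t, 1) + penalty
                if c < cost[ni][nj]:
                    cost[ni][nj], back[ni][nj] = c, (i, j)
    if cost[n][m] == INF:
        raise ValueError("no alignment possible with the allowed groupings")
    beads, i, j = [], n, m
    while (i, j) != (0, 0):                   # walk the back-pointers to recover the path
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]

# Sentence lengths (in characters) for three English and four Chinese sentences.
print(align([20, 30, 25], [20, 14, 16, 25]))
```

Here the second English sentence is paired with the second and third Chinese sentences, because splitting its length across two shorter sentences fits far better than any one-to-one match.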
After one example, the computer is still confused. But after a million examples, it starts to make associations that make sense, he said.
For instance, a Chinese character may come up often in relation to the English word "dog" or "terrier." And from that the computer learns to make a connection, he said.
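The "dog" example is the intuition behind IBM Model 1, a classic statistical word-alignment model trained with expectation-maximization (EM). The article does not say Google used this exact model; the sketch below, on four invented sentence pairs, only shows how repeated counting turns an even split into a confident association:

```python
from collections import defaultdict

# Invented English/Chinese sentence pairs, already split into words/characters.
corpus = [
    ("big dog".split(), "大 狗".split()),
    ("small dog".split(), "小 狗".split()),
    ("big cat".split(), "大 猫".split()),
    ("small cat".split(), "小 猫".split()),
]

def train_model1(pairs, iterations=20):
    """IBM Model 1 via EM: estimate t[(f, e)], the probability English word e translates as f."""
    f_vocab = {f for _, fs in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))   # uniform start: every pairing equally likely
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for es, fs in pairs:
            for f in fs:
                # E-step: split one occurrence of f among the English words in its sentence.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: turn the fractional counts back into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

def best_translation(t, e):
    """Return the foreign word with the highest t[(f, e)] for English word e."""
    candidates = {f: p for (f, e2), p in t.items() if e2 == e}
    return max(candidates, key=candidates.get)

t = train_model1(corpus)
for word in ["big", "small", "dog", "cat"]:
    f = best_translation(t, word)
    print(word, "->", f, round(t[(f, word)], 3))
```

Trained on only the first pair, `train_model1(corpus[:1])` leaves 狗 split 50/50 between "big" and "dog" no matter how many iterations run: with one example, the model really is still confused.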
"We've been able to do this, and our translation software is usually right at the top of a search," Norvig said. "And we've even been able to do this in some languages where nobody on the team speaks the language."
Not Perfect
The resulting translations aren't perfect, but they do get the general point across, he added.
"They come out understandable, but you don't go more than three or four sentences before you realize this was not written by a native speaker," Norvig said.
Still, the more examples it gets, the better it translates, although Norvig said he suspects there's a ceiling to how well it will work. But that ceiling is still pretty far away, he said.
Google is also working on a similar method to sort through images.
Vision of the Future
Right now, image search programs generally just look at words around images.
The Google program, by contrast, will collect features of the images themselves, such as horizontal or vertical lines that might be shared by the million pictures that come up when somebody searches for images of dolphins, Norvig said.
"It collects a range, then looks at which pictures are nearest to the center of that range," Norvig said. "And in a search we try to bring up the center of that range first."
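"Bring up the center of that range first" can be read as ranking results by distance to the average feature vector. A minimal sketch, assuming each image has already been reduced to two invented numbers (the share of horizontal and of vertical edges); real systems use far richer features, and the file names and values are hypothetical:

```python
import math

# Hypothetical features for images returned by a "dolphins" query:
# (fraction of horizontal edges, fraction of vertical edges). Values are invented.
results = {
    "dolphin_1.jpg": (0.70, 0.20),
    "dolphin_2.jpg": (0.65, 0.25),
    "dolphin_3.jpg": (0.72, 0.18),
    "logo_team.png": (0.20, 0.75),   # off-topic result, e.g. a sports logo
}

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return tuple(sum(v[k] for v in vectors) / n for k in range(len(vectors[0])))

def rank_by_centroid(features):
    """Order image names by Euclidean distance to the mean feature vector, nearest first."""
    center = centroid(list(features.values()))
    return sorted(features, key=lambda name: math.dist(features[name], center))

print(rank_by_centroid(results))
```

The off-topic logo lands last, although it also drags the mean toward itself; with many more images, or a more robust center such as a median, that pull would shrink.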
The image programming is still a work in progress. That part is even harder than language, Norvig said.
"The vision stuff is not quite there yet," he said, adding that with more examples, accurate image search may not lag too far behind his company's language translation program.