Friday, 2 September 2011

A Spaniard teaches Google how to listen

NEW YORK (El Pais / by Rosa Jimenez Cano) How does Google know what we are asking it to search for? Why does it have more difficulty in recognizing Vietnamese than it does Zulu? Before asking Google, a better place to start might be Pedro Moreno. The 47-year-old Spanish-born engineer works for the company at its New York offices, where he spends his time as part of a team of 40 people improving voice-recognition systems that will allow users to access Google verbally from their cellphones. The voice-recognition technology being developed by Moreno and his colleagues is also used by YouTube to transcribe the spoken words in videos into text.

Google began working on voice-recognition technology six years ago, says Moreno. "We still make mistakes, and we will continue to - languages are incredibly complex and subject to the tiniest of changes," he says, adding that internet users have made a huge contribution to the task.
"We hire people to test the system, but it is the comments and contributions that users make that have really made a difference." He believes that in the near future search engines like Google "will adapt to an individual's accent; it is an intuitive program that learns as it goes along."
The process began with English, and was then adapted for the world's second language, Chinese. "Demographically, it was the logical way forward," says Moreno, "and at that time, Google was very interested in moving into the Chinese market."
The task proved immensely complicated. Up until then, Google had used two combined systems: one based on words, and another on grammar. This allowed for the development of a system that recognized the sound of words. But in the case of Chinese, the task was made more difficult by the subtle variations in tone that give the same word different meanings; in other words, a system was needed that could distinguish intonation rather than a succession of sounds. Because of this difficulty, Google invented a system that recognizes more than 30 languages. It now has a program that can be installed on Android phones that allows users to search by voice.
Moreno says that it now takes his team around two weeks to develop a voice recognition program for any language, down from the two months it initially took.
He says the next major linguistic area that Google will be developing is the Arabic-speaking world. "Following the uprisings there, we believe it is simply a matter of common sense to provide people with better search tools. The first trials are being finished now," he says.
Google's aim is to develop voice-recognition programs for the world's 300 main languages. "Any language that is spoken by more than a million people," says Moreno, "but we would like to continue developing the program beyond that. We want people to be able to talk to their cellphones as simply as possible." The team has even developed a program for Latin, which, rather than facilitating access to the internet for the Vatican, made it easier to work on German, Russian, and the Scandinavian languages.
Moreno says that each new linguistic group has its own peculiarities. In the case of Xhosa and other Southern African languages, the team had to come up with a way to recognize their distinctive clicks. "To our western ears, those clicks are very difficult to distinguish," he says, "but working on the program allowed us to improve our recognition of acoustic modulations, and apply it to Afrikaans, for example."
The most difficult task so far has been developing a system that recognizes Hong Kong Chinese, where around 20 percent of words are in English, along with different Chinese languages and other local influences. Moreno says that if Google can crack that problem, the world is its oyster.
