USC Researchers Build Machine Translation System -- and More -- For Hindi in Less Than a Month

02.07.2003


In less than a month, researchers at USC’s Information Sciences Institute and collaborators nationwide have built one of the world’s best systems to translate Hindi text into English and query Hindi databases using English questions.

This effort was part of the "Surprise Language" project, sponsored by the Defense Advanced Research Projects Agency (DARPA) as a test of the computer science community’s ability to quickly create translation tools for previously unresearched languages. The exercise ended July 1.

"A month ago, we didn’t even know what language we would be working on," explained Ulrich Germann, a computational linguist at ISI, which is part of the USC School of Engineering.



Then, at 10:55 p.m. PDT on June 1, the manager for DARPA’s TIDES (Translingual Information Detection, Extraction, and Summarization) program fired the starting gun with an email: "Surprise Language is Hindi.... Good luck!"

Teams at 11 different sites across the US and one in the UK jumped into action, and twenty-nine days later could present an impressive array of information processing tools for Hindi.

"We succeeded in all aspects of the exercise," said Douglas W. Oard, an associate professor at the University of Maryland who is currently spending a sabbatical year at ISI. "A month ago, we had no information retrieval for Hindi, no machine translation, no named entity identification, no question answering. Now we have all of that."

ISI’s researchers focused on four aspects of cross-lingual information processing: resource building, machine translation, summarization, and providing an efficient interface for the human to navigate the information space. Of these, "clearly, machine translation is the pivotal technology in this scenario," said Germann.

ISI research scientist Franz Josef Och, a leading specialist in machine translation, did much of this key task for ISI.

"Our approach uses statistical models to find the most likely translation for a given input," Och explained. "Instead of telling the computer how to translate, we let it figure it out by itself. First, we feed the system a collection of parallel texts, material in the foreign language and their translations into English. The system tries to find the English sentence that is the most likely translation of the foreign input sentence, based on these statistical models."
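The idea of letting the computer learn translations from parallel text can be illustrated with a toy sketch. The code below is not Och's system; it is a minimal version of IBM Model 1, a textbook word-alignment model, run on a made-up three-sentence corpus of romanized Hindi and English. The corpus, the function names, and the omission of the usual NULL word are all assumptions made for illustration.

```python
def train_model1(pairs, iterations=10):
    """Estimate word translation scores t[(e, f)] ~ P(e | f) with EM.

    pairs: list of (foreign_words, english_words) sentence pairs.
    Simplified IBM Model 1 without the NULL word.
    """
    # Start with equal scores for every co-occurring word pair.
    t = {}
    for fs, es in pairs:
        for f in fs:
            for e in es:
                t[(e, f)] = 1.0

    for _ in range(iterations):
        count = {}  # expected count of e being aligned to f
        total = {}  # expected count of f being aligned to anything
        # E-step: spread each English word over the foreign words
        # in the same sentence pair, in proportion to current scores.
        for fs, es in pairs:
            for e in es:
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    frac = t[(e, f)] / norm
                    count[(e, f)] = count.get((e, f), 0.0) + frac
                    total[f] = total.get(f, 0.0) + frac
        # M-step: renormalize expected counts into probabilities.
        for (e, f), c in count.items():
            t[(e, f)] = c / total[f]
    return t


def best_translation(t, f):
    """Return the English word with the highest score for foreign word f."""
    candidates = {e: p for (e, f2), p in t.items() if f2 == f}
    return max(candidates, key=candidates.get)


# Made-up toy parallel corpus (romanized Hindi, English).
corpus = [
    ("yah ghar".split(), "this house".split()),
    ("yah kitab".split(), "this book".split()),
    ("ek kitab".split(), "a book".split()),
]
table = train_model1(corpus)
for word in ["ghar", "kitab", "yah", "ek"]:
    print(word, "->", best_translation(table, word))
```

No rule tells the program that "ghar" means "house"; the pairing emerges because "this" is better explained by "yah", which appears in two sentence pairs. A full system would also need a model of fluent English and a search over whole sentences, which this sketch leaves out.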

Och’s Hindi system was one of four developed independently during the exercise. Trials scheduled for coming weeks will rate his against those developed at other sites.

Finding and creating parallel texts for Och and his colleagues to analyze was a major effort during the exercise, said Germann. While most European languages have one or two predominant standardized encodings, e.g., "Latin-1" or Unicode, Hindi has a wildly mixed potpourri of encodings.

"It’s ridiculous," said Germann, "almost every single Hindi language web site has its own encoding." Tools had to be made to convert all of these various systems to a single common one to present parallel texts to Och and other machine translation experts.

"Most of the conversion work was done by our partners at other participating sites, and it was absolutely critical to the success of the exercise," Germann said.
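A converter of the kind described above can be pictured as a per-site lookup table plus a small reordering step. The sketch below is only an illustration: the table `SITE_A_TO_UNICODE` is a hypothetical mapping for one imaginary site's font encoding, not any real encoding used in the exercise, and the function name is likewise invented. The reordering reflects the fact that Unicode stores the Devanagari vowel sign I after its consonant, whereas this hypothetical legacy font is assumed to store it before the consonant, matching how it is drawn.

```python
# Hypothetical mapping for one imaginary site's font encoding.
# Real legacy encodings differ from site to site and are far larger.
SITE_A_TO_UNICODE = {
    "d": "\u0915",  # DEVANAGARI LETTER KA
    "e": "\u092E",  # DEVANAGARI LETTER MA
    "y": "\u0932",  # DEVANAGARI LETTER LA
    "k": "\u093E",  # DEVANAGARI VOWEL SIGN AA
    "f": "\u093F",  # DEVANAGARI VOWEL SIGN I
}

VOWEL_SIGN_I = "\u093F"


def to_unicode(text, table):
    """Convert legacy-encoded text to Unicode using a lookup table.

    Characters missing from the table are passed through unchanged.
    """
    mapped = [table.get(ch, ch) for ch in text]
    out = []
    i = 0
    while i < len(mapped):
        # Assumed legacy order: vowel sign I comes before its consonant.
        # Unicode order: consonant first, then the vowel sign.
        if mapped[i] == VOWEL_SIGN_I and i + 1 < len(mapped):
            out.append(mapped[i + 1])
            out.append(VOWEL_SIGN_I)
            i += 2
        else:
            out.append(mapped[i])
            i += 1
    return "".join(out)


print(to_unicode("dey", SITE_A_TO_UNICODE))  # three consonants: ka, ma, la
print(to_unicode("fd", SITE_A_TO_UNICODE))   # ka followed by vowel sign I
```

With one such table per site, every source can be normalized to a single encoding before the parallel texts are handed to the translation systems.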

In addition to Och’s translation work, researchers applied search, summarization, and visualization tools developed at ISI to make Hindi texts more accessible to English language speakers. ISI researchers Anton Leuski and Chin-Yew Lin collaborated on a super-Google-like multi-document search, summarization, and translation system that allows users to enter search terms in English and generate results grouped by similarities found in the text, using refinements on a multi-document summarization technique developed by Lin.

Graduate student Liang Zhou developed a way to generate a headline for each group of similar stories found. Leuski’s unique Lighthouse visualization system displayed these results as spheres floating in groupings on the screen, with the most similar closest together.
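Grouping search results by textual similarity can be sketched in a few lines. The code below is not the ISI system or Lin's summarization technique; it is a simple stand-in that compares documents by cosine similarity over word counts and greedily assigns each document to the first group whose first member is similar enough. The sample documents, the function names, and the 0.3 threshold are assumptions chosen for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def group_by_similarity(docs, threshold=0.3):
    """Greedy single-pass grouping; returns lists of document indices."""
    vectors = [Counter(d.lower().split()) for d in docs]
    groups = []
    for i, vec in enumerate(vectors):
        for group in groups:
            # Compare against the first document of each existing group.
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])  # nothing similar enough: start a new group
    return groups


# Made-up English stand-ins for machine-translated search results.
results = [
    "monsoon rains flood city streets",
    "city streets flood after monsoon rains",
    "cricket team wins final match",
    "cricket final match draws record crowd",
]
print(group_by_similarity(results))
```

The same pairwise similarity scores could also drive a display in which more similar documents are drawn closer together.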

The bottom line: a user can then view individual documents, or automatically generated summaries for whole groups of documents. Even though all documents were originally in Hindi, all the added value is available in English, thanks to the machine translation engine. In addition, references to locations in the documents are spotted (using a third-party tool, the BBN IdentiFinder) in the text and plotted on a map.

"It’s just wonderful to see so many of the technologies that we have developed at ISI come together and interact in such a useful way," said Eduard Hovy, head of ISI’s Natural Language Group.

Eric Mankin | USC
Further information:
http://www.usc.edu/isinews/stories/98.html
