USC Researchers Build Machine Translation System -- and More -- For Hindi in Less Than a Month

02.07.2003


In less than a month, researchers at USC’s Information Sciences Institute and collaborators nationwide have built one of the world’s best systems to translate Hindi text into English and query Hindi databases using English questions.

This effort was part of the "Surprise Language" project, sponsored by the Defense Advanced Research Projects Agency (DARPA) as a test of the computer science community’s ability to quickly create translation tools for previously unresearched languages. The exercise ended July 1.

"A month ago, we didn’t even know what language we would be working on," explained Ulrich Germann, a computational linguist at ISI, which is part of the USC School of Engineering.



Then, at 10:55 p.m. PDT on June 1, the manager for DARPA’s TIDES (Translingual Information Detection, Extraction, and Summarization) program fired the starting gun with an email: "Surprise Language is Hindi.... Good luck!"

Teams at 11 different sites across the US and one in the UK jumped into action, and twenty-nine days later they could present an impressive array of information processing tools for Hindi.

"We succeeded in all aspects of the exercise," said Douglas W. Oard, an associate professor at the University of Maryland who is currently spending a sabbatical year at ISI. "A month ago, we had no information retrieval for Hindi, no machine translation, no named entity identification, no question answering. Now we have all of that."

ISI’s researchers focused on four aspects of cross-lingual information processing: resource building, machine translation, summarization, and providing an efficient interface for human users to navigate the information space. Of these, "clearly, machine translation is the pivotal technology in this scenario," said Germann.

ISI research scientist Franz Josef Och, a leading specialist in machine translation, did much of this key task for ISI.

"Our approach uses statistical models to find the most likely translation for a given input," Och explained. "Instead of telling the computer how to translate, we let it figure it out by itself. First, we feed the system a collection of parallel texts, material in the foreign language and their translations into English. The system tries to find the English sentence that is the most likely translation of the foreign input sentence, based on these statistical models."
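To illustrate the general idea of learning translations from parallel texts, here is a minimal sketch in Python. It is not Och's system: it uses IBM Model 1, a classic textbook word-alignment model, and only covers the step of estimating word translation probabilities, not the search for the best full English sentence. The three-sentence corpus of romanized Hindi and the function names are invented for illustration.

```python
# Toy parallel corpus (invented for illustration): romanized Hindi / English.
TOY_CORPUS = [
    (["yah", "ghar"], ["this", "house"]),
    (["yah", "kitab"], ["this", "book"]),
    (["ek", "kitab"], ["a", "book"]),
]


def train_model1(pairs, iterations=20):
    """Estimate t(e | f) with IBM Model 1 EM (NULL word omitted for brevity)."""
    e_vocab = {e for _, es in pairs for e in es}
    # Start with a uniform probability for every co-occurring word pair.
    t = {(e, f): 1.0 / len(e_vocab) for fs, es in pairs for e in es for f in fs}
    for _ in range(iterations):
        count = {}   # expected count of f being translated as e
        total = {}   # expected count of f being translated at all
        # E-step: spread each English word over the foreign words in its sentence.
        for fs, es in pairs:
            for e in es:
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    frac = t[(e, f)] / norm
                    count[(e, f)] = count.get((e, f), 0.0) + frac
                    total[f] = total.get(f, 0.0) + frac
        # M-step: renormalize the expected counts into probabilities.
        for (e, f), c in count.items():
            t[(e, f)] = c / total[f]
    return t


def best_translation(t, f):
    """Return the English word with the highest t(e | f), or None if f is unseen."""
    candidates = [e for (e, ff) in t if ff == f]
    return max(candidates, key=lambda e: t[(e, f)], default=None)


if __name__ == "__main__":
    table = train_model1(TOY_CORPUS)
    for word in ["yah", "ghar", "kitab", "ek"]:
        print(word, "->", best_translation(table, word))
```

Nobody tells the program that "kitab" means "book". The pairing emerges because the two words keep appearing in the same sentence pairs, which is the sense in which the computer is left to "figure it out by itself".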

Och’s Hindi system was one of four developed independently during the exercise. Trials scheduled for coming weeks will rate his system against those developed at other sites.

Finding and creating parallel texts for Och and his colleagues to analyze was a major effort during the exercise, said Germann. While most European languages have one or two predominant, standardized encodings, e.g. "Latin-1" or Unicode, Hindi has a wildly mixed potpourri of encodings.

"It’s ridiculous," said Germann, "almost every single Hindi language web site has its own encoding." Tools had to be made to convert all of these various systems to a single common one to present parallel texts to Och and other machine translation experts.
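A rough sketch of what such a conversion tool involves is shown below in Python. The glyph table is hypothetical and does not correspond to any real Hindi font or to the converters built during the exercise. It assumes a font-based encoding in which Latin characters stand for Devanagari glyphs and in which the short "i" vowel sign is stored before its consonant (visual order), whereas Unicode stores it after.

```python
import unicodedata

# Hypothetical glyph table for an invented legacy font encoding.
HYPOTHETICAL_FONT_MAP = {
    "k": "\u0915",  # DEVANAGARI LETTER KA
    "m": "\u092E",  # DEVANAGARI LETTER MA
    "l": "\u0932",  # DEVANAGARI LETTER LA
    "a": "\u093E",  # DEVANAGARI VOWEL SIGN AA
    "f": "\u093F",  # DEVANAGARI VOWEL SIGN I (stored before the consonant here)
}
I_MATRA = "\u093F"


def legacy_to_unicode(text, table=HYPOTHETICAL_FONT_MAP):
    """Map legacy font codes to Unicode and move the short-i sign after its consonant."""
    chars = [table.get(ch, ch) for ch in text]  # unmapped characters pass through
    out = []
    i = 0
    while i < len(chars):
        if chars[i] == I_MATRA and i + 1 < len(chars):
            # Visual order: sign, consonant. Unicode logical order: consonant, sign.
            out.append(chars[i + 1])
            out.append(I_MATRA)
            i += 2
        else:
            out.append(chars[i])
            i += 1
    # Normalize so text from different sources compares consistently.
    return unicodedata.normalize("NFC", "".join(out))


if __name__ == "__main__":
    print(legacy_to_unicode("kam"))
    print(legacy_to_unicode("fkl"))
```

A real converter would need one such table, plus its own reordering rules, for each site-specific encoding, which gives a sense of why this work consumed so much of the month.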

"Most of the conversion work was done by our partners at other participating sites, and it was absolutely critical to the success of the exercise," Germann said.

In addition to Och’s translation work, researchers applied search, summarization, and visualization tools developed at ISI to make Hindi texts more accessible to English language speakers. ISI researchers Anton Leuski and Chin-Yew Lin collaborated on a super-Google-like multi-document search, summarization, and translation system that allows users to enter search terms in English and generate results grouped by similarities found in the text, using refinements on a multi-document summarization technique developed by Lin.

Graduate student Liang Zhou developed a way to generate a headline for each group of similar stories found. Leuski’s unique Lighthouse visualization system displayed these results as spheres floating in groupings on the screen, with the most similar closest together.
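The idea of grouping results by textual similarity can be sketched in a few lines of Python. This is not the ISI system or Lin's summarization technique: it assumes plain bag-of-words vectors, cosine similarity, and a greedy grouping rule with an arbitrary threshold. The sample headlines are invented.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_by_similarity(docs, threshold=0.3):
    """Greedily put each document into the first group whose seed document is similar enough."""
    vectors = [Counter(d.lower().split()) for d in docs]
    groups = []  # each group is a list of document indices; the first index is the seed
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])  # no similar group found, so start a new one
    return groups


if __name__ == "__main__":
    # Invented sample results, standing in for machine-translated stories.
    results = [
        "monsoon rains flood city streets",
        "election results announced in capital",
        "heavy monsoon rains flood villages",
        "capital awaits election results",
    ]
    print(group_by_similarity(results))
```

The same pairwise similarity scores could also drive a display that places similar documents close together on screen.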

The bottom line: a user can view individual documents or automatically generated summaries for whole groups of documents. Even though all documents were originally in Hindi, all the added value is available in English, thanks to the machine translation engine. In addition, references to locations are spotted in the text (using a third-party tool, the BBN IdentiFinder) and plotted on a map.

"It’s just wonderful to see so many of the technologies that we have developed at ISI come together and interact in such a useful way," said Eduard Hovy, head of ISI’s Natural Language Group.

Eric Mankin | USC
Further information:
http://www.usc.edu/isinews/stories/98.html
