Forum for Science, Industry and Business

Grammar Lost Translation Machine In Researchers Fix Will

12.09.2005


The makers of a University of Southern California computer translation system consistently rated among the world’s best are teaching their software something new: English grammar.



Most modern "machine translation" systems, including the highly rated one created by USC’s Information Sciences Institute, rely on brute-force correlation of vast bodies of pre-translated text from such sources as newspapers that publish in multiple languages.
Software matches up phrases that consistently show up in parallel fashion, such as the English "my brother’s pants" and the Spanish "los pantalones de mi hermano," and then uses these matches to piece together translations of new material.
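The phrase-matching idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-entry phrase table, the `translate` function, and the greedy longest-match strategy are hypothetical and are not drawn from the ISI system, which learns its phrase pairs statistically from large bodies of parallel text.

```python
# Toy phrase table: hypothetical entries standing in for phrase pairs
# mined from parallel text. Real systems learn and score vast numbers of them.
PHRASE_TABLE = {
    ("los", "pantalones"): "pants",
    ("de", "mi", "hermano"): "of my brother",
    ("los", "pantalones", "de", "mi", "hermano"): "my brother's pants",
}

def translate(words, table=PHRASE_TABLE, max_len=5):
    """Greedy left-to-right translation, preferring the longest known phrase."""
    output = []
    i = 0
    while i < len(words):
        # Try the longest span starting at position i first.
        for n in range(min(max_len, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + n])
            if chunk in table:
                output.append(table[chunk])
                i += n
                break
        else:
            # Unknown word: pass it through untranslated.
            output.append(words[i])
            i += 1
    return " ".join(output)

print(translate("los pantalones de mi hermano".split()))
```

When the whole phrase is in the table, the sketch returns it intact; when only fragments match, it stitches the fragments together in source order, which is roughly where the "clearly not English" output described in this article comes from.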

It works, but only to a point. ISI machine translation expert Daniel Marcu says that when such a system is "trained on enough relevant bilingual text ... it can break a foreign language up into phrasal units, translate each of them fairly well into English, and do some re-ordering. However, even in this good scenario, the output is still clearly not English. It takes too long to read, and it is unsatisfactory for commercial use."



So Marcu and colleague Kevin Knight, both ISI project leaders who also hold appointments in the USC Viterbi School of Engineering department of computer science, have begun an intensive $285,000 effort called the Advanced Language Modeling for Machine Translation project. Its aim is to improve the system they created at ISI by subjecting the texts that come out of their translation engine to a follow-on step: grammatical processing.

The step seems simple, but is actually imposingly difficult. "For example, there is no robust algorithm that returns ’grammatical’ or ’ungrammatical’ or ’sensible’ or ’nonsense’ in response to a user-typed sequence of words," Marcu notes.

The problem grows out of a natural language feature noted by M.I.T. language theorist Noam Chomsky decades ago. Language users have a literally limitless ability to nest and cross-nest phrases and ideas into intricate referential structures: "I was looking for the stirrups from the saddle that my ex-wife’s oldest daughter took with her when she went to Jack’s new place in Colorado three years ago, but all she had were Louise’s second-hand saddle shoes, the ones Ethel’s dog chewed during the fire."

Unraveling these verbal cobwebs (or, in the more common description, tracing branching "trees" of connections) is such a daunting task that programmers long ago went in the brute force direction of matching phrases and hoping that the relation of the phrases would become clear to readers.

With the limits of this approach becoming clear, researchers have now begun applying computing power to trying to assemble grammatical rules. According to Knight, one crucial step has been the creation of a large database of English text whose syntax has been hand-decoded by humans, the "Penn Treebank."

Using this and other sources, computer scientists have begun developing ways to model the observed rules. A preliminary study by Knight and two colleagues in 2003 showed that this approach might be able to improve translations.

Accordingly, the researchers describe their study this way: "We propose to implement a trainable tree-based language model and parser, and to carry out empirical machine-translation experiments with them. USC/ISI’s state-of-the-art machine translation system already has the ability to produce, for any input sentence, a list of 25,000 candidate English outputs. This list can be manipulated in a post-processing step. We will re-rank these lists of candidate string translations with our tree-based language model, and we plan for better translations to rise to the top of the list."
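The re-ranking step in that description can be sketched in a few lines. This is an illustration under stated assumptions, not the researchers' implementation: `rerank`, `toy_grammar_score`, the `BAD_BIGRAMS` list, the `weight` parameter, and all scores are hypothetical. In particular, the toy scorer merely penalizes a few hand-picked word pairs and stands in for the trained tree-based language model, which would instead score a parse of each candidate.

```python
from typing import Callable, List, Tuple

# (English candidate string, score assigned by the translation engine)
Candidate = Tuple[str, float]

def rerank(candidates: List[Candidate],
           grammar_score: Callable[[str], float],
           weight: float = 1.0) -> List[Candidate]:
    """Sort candidates by engine score plus a weighted grammar score."""
    rescored = [(text, base + weight * grammar_score(text))
                for text, base in candidates]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Toy stand-in for a tree-based language model (an assumption for this
# sketch): subtract one point for each adjacent word pair that appears
# on a small hand-written list of bad sequences.
BAD_BIGRAMS = {("pants", "the"), ("brother", "my"), ("of", "of")}

def toy_grammar_score(text: str) -> float:
    words = text.lower().split()
    return -float(sum((a, b) in BAD_BIGRAMS for a, b in zip(words, words[1:])))

# Hypothetical n-best list; the engine alone prefers the scrambled output.
n_best = [
    ("pants the of brother my", -1.0),
    ("the pants of my brother", -1.2),
    ("my brother's pants", -1.5),
]

ranked = rerank(n_best, toy_grammar_score)
best = ranked[0][0]
print(best)
```

With these made-up numbers, the scrambled candidate that the engine scored highest falls to the bottom once the grammar penalty is added. In the project as described, the list would hold 25,000 candidates per sentence and the scorer would be the trained model rather than a lookup of bad word pairs.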

One crucial trick that the system must be able to do is to pick out separate trees from the endless strings of words. But Knight believes this is doable, and in the short term rather than the long term.

Referring to the annual review of translation systems by the National Institute of Standards and Technology, in which ISI consistently gains top scores, Knight said, "We want to have the grammar module installed and working by the next evaluation, in August 2006."

Knight and Marcu are cofounders and, respectively, chief scientist and chief technology and operating officer of a spinoff company, Language Weaver.

Eric Mankin | EurekAlert!
Further information:
http://www.usc.edu