First multilingual dictionary based on universal words

22.11.2007
Researchers at the Universidad Politécnica de Madrid’s School of Computing have developed an original system for building multilingual dictionaries based on multiple term equivalences derived from what are known as universal words. The system has a reliability and accuracy of 88%.

The system is based on Princeton University’s WordNet database. WordNet is a lexical database developed by linguists at Princeton’s Cognitive Science Laboratory. The database was designed to inventory, classify and relate the semantic and lexical content of the English language.

WordNet is packaged as an electronic database that can be downloaded over the Internet. Its underlying building block is the synset (synonym set), a group of interchangeable words that denote a meaning or particular usage. Each synset represents one possible meaning of a word and is described briefly and concisely. WordNet has a lexicon of over 200,000 carefully structured and defined English terms. It is one of the pillars of the system conceived by researchers at the UPM’s School of Computing.
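To make the synset idea concrete, here is a minimal Python sketch of a WordNet-like structure in which one word can belong to several synsets. The `Synset` class, the `senses` and `synonyms` helpers, and the three entries (identifiers and glosses included) are hypothetical illustrations, not WordNet's actual data; a real application would query the database itself, for example through NLTK's `wordnet` corpus reader.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    synset_id: str   # WordNet-style identifier (illustrative only)
    lemmas: tuple    # interchangeable words that share this meaning
    gloss: str       # brief definition of the meaning


# Hypothetical miniature lexicon; real IDs and glosses live in WordNet itself.
SYNSETS = [
    Synset("bank.n.01", ("bank",), "sloping land beside a body of water"),
    Synset("bank.n.02", ("bank", "depository_financial_institution"),
           "a financial institution that accepts deposits"),
    Synset("car.n.01", ("car", "auto", "automobile"),
           "a motor vehicle with four wheels"),
]


def senses(word):
    """Return every synset (possible meaning) that contains the word."""
    return [s for s in SYNSETS if word in s.lemmas]


def synonyms(word):
    """Return the other words that share at least one synset with `word`."""
    return sorted({lemma for s in senses(word) for lemma in s.lemmas} - {word})


for s in senses("bank"):
    print(s.synset_id, "-", s.gloss)
print(synonyms("car"))  # ['auto', 'automobile']
```

The ambiguous word "bank" yields two synsets, which is exactly the ambiguity that universal words are meant to remove.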

The system’s other mainstay is the universal word. The concept of the universal word came out of the UNL (Universal Networking Language) Project. The aim of this project is to eliminate the barriers of linguistic diversity by creating a medium of information exchange through which users can communicate in their own language.

Universal words

As the UNL Project’s Spanish Language Centre explains, one of the key concepts of UNL is the universal word. A universal word is a word, taken from the English language, to which a number of attributes and constraints are added to disambiguate the term.

The English term together with its attributes and constraints is called a universal word because it has an equivalent term in every other language. Because universal words are so precise, one of their uses is the systematic production of multilingual dictionaries.
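A minimal sketch of the idea, assuming a simple representation: an English headword plus constraints yields a distinct, unambiguous key. The rendering `headword(icl>...)` follows the style of UNL notation, where `icl>` roughly reads "a kind of"; the `UniversalWord` class and the two example words are hypothetical, and attributes are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UniversalWord:
    headword: str             # English headword, ambiguous on its own
    constraints: tuple = ()   # restrictions that pin down a single sense

    def __str__(self):
        # Render as headword(constraint1,constraint2,...), UNL-style
        if not self.constraints:
            return self.headword
        return f"{self.headword}({','.join(self.constraints)})"


# Two hypothetical universal words built on the same English headword
BANK_SLOPE = UniversalWord("bank", ("icl>slope",))
BANK_MONEY = UniversalWord("bank", ("icl>financial institution",))

print(BANK_SLOPE)  # bank(icl>slope)
print(BANK_MONEY)  # bank(icl>financial institution)
print(BANK_SLOPE.headword == BANK_MONEY.headword, BANK_SLOPE == BANK_MONEY)  # True False
```

The same English word produces two different keys, so a translation attached to one key can never be confused with the other.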

Researchers at the UPM’s School of Computing have applied an algorithm based on computational models to the WordNet database, expanding its English lexicon to construct universal words. These universal words can then be used to compile multilingual dictionaries.

UPM engineers created a Universal Words Dictionary, which associates the words of each language with the corresponding disambiguated universal word. The researchers have also developed a tool with which users can enter words in their own language and select their equivalents in another language. This is a breakthrough for multilingualism.
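The lookup performed by such a tool can be sketched as follows, assuming the dictionary is stored as, for each language, a mapping from universal words to the words lexicographers entered. The `UW_DICTIONARY` data and the `equivalents` function are hypothetical; the point is that a word is first mapped to its universal words and only then to the target language.

```python
# Hypothetical Universal Words Dictionary: for each language, the words that
# lexicographers associated with each disambiguated universal word.
UW_DICTIONARY = {
    "en": {"bank(icl>financial institution)": ["bank"],
           "bank(icl>slope)": ["bank"]},
    "es": {"bank(icl>financial institution)": ["banco"],
           "bank(icl>slope)": ["orilla", "ribera"]},
    "ru": {"bank(icl>financial institution)": ["банк"],
           "bank(icl>slope)": ["берег"]},
}


def equivalents(word, source, target):
    """Return {universal word: target-language words} for each sense of `word`.

    Ambiguity surfaces as several universal words, so the user can pick
    the intended sense instead of receiving a mixed list of translations.
    """
    result = {}
    for uw, words in UW_DICTIONARY[source].items():
        if word in words:
            result[uw] = UW_DICTIONARY[target].get(uw, [])
    return result


for uw, words in equivalents("bank", "en", "es").items():
    print(uw, "->", words)
```

Entering the English word "bank" returns both senses, each with its own Spanish equivalents, and the user selects the one intended.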

How does it work?

The ultimate aim is to build extremely precise multilingual dictionaries. The system contains universal words in English taken from the WordNet database. These universal words are passed on to lexicographers from different countries. Each lexicographer reads a universal word in English, understands its intended meaning and then adds the translation of the term in his or her mother tongue. To do this, lexicographers do not need to know any of the other languages in the dictionary.

Although universal words are written in English, using them is not the same as using plain English words, because English is just another natural language and is prone to ambiguity. The added attributes and constraints remove that ambiguity, so the equivalence between languages is extremely close.

This is how this multilingual dictionary is being built. The method has already been tried and tested with striking results. The translations from the universal words created using WordNet are 88% accurate and reliable.

Original system

Compared to other lexicographical methods, this system is original in that it can generate bilingual dictionaries without the experts having to speak both languages. All they need, apart from their mother tongue, is a good enough level of English to understand each universal word and enter its exact translation.

Whereas there are plenty of Spanish-English interpreters, for example, it is harder to find Portuguese-Bulgarian translators, a problem that this system developed by researchers at the UPM obviates.
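How a Portuguese-Bulgarian dictionary can emerge without a Portuguese-Bulgarian expert can be sketched as a join on the universal word, assuming each lexicon maps universal words to terms entered by a different lexicographer. The `build_bilingual` function and both lexicons are hypothetical illustrations.

```python
def build_bilingual(lex_a, lex_b):
    """Join two universal-word-keyed lexicons into a bilingual dictionary.

    Each lexicon was filled in by a different lexicographer working only
    from English universal words; nobody needs to know both languages.
    Universal words missing from either lexicon are skipped.
    """
    bilingual = {}
    for uw in sorted(lex_a.keys() & lex_b.keys()):
        for term_a in lex_a[uw]:
            bilingual.setdefault(term_a, []).extend(lex_b[uw])
    return bilingual


# Hypothetical entries from a Portuguese and a Bulgarian lexicographer
PORTUGUESE = {
    "bank(icl>financial institution)": ["banco"],
    "bank(icl>slope)": ["margem"],
    "book(icl>publication)": ["livro"],
}
BULGARIAN = {
    "bank(icl>financial institution)": ["банка"],
    "bank(icl>slope)": ["бряг"],
    "book(icl>publication)": ["книга"],
}

print(build_bilingual(PORTUGUESE, BULGARIAN))
```

Because the universal word is the shared key, adding one more language only requires one more lexicon, and every pairwise dictionary follows from the same join.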

These bilingual dictionaries based on multiple equivalences of terms are useful not only as dictionaries in their own right but also for supporting search systems in different languages.

Spanish cultural heritage multilingual dictionary

The Validation and Business Applications Group, led by Jesús Cardeñosa, a professor at the School of Computing, is using this system to compile a dictionary of multilingual terms on Spain’s cultural heritage, commissioned by the Ministry of Culture under the Patrilex Project. The project is to be completed by the end of 2008.

The goal of this project is to define a methodology and develop tools that support cultural heritage document search based on multilingual lexical resources. To do this, researchers are developing tools to manage lexical resources about Spain’s cultural heritage. The key tool is a multilingual thesaurus (database).

A thesaurus is a list of terms, possibly composed of more than one word, related hierarchically to each other (general terms and subordinate terms) and used to index and retrieve documents. The thesaurus will be the core resource for defining the semantic relations that establish the underlying context of a query.
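The general/subordinate structure of such a thesaurus can be sketched as a simple tree, assuming each term lists its narrower terms. The terms below and the `narrower_closure` and `broader_chain` helpers are hypothetical; they show how a query term can be widened downwards (to retrieve more specific documents) or placed in context upwards.

```python
# Hypothetical thesaurus fragment: each term maps to its narrower terms.
NARROWER = {
    "historical heritage": ["immovable heritage", "movable heritage"],
    "immovable heritage": ["cathedral", "castle"],
    "movable heritage": ["altarpiece", "tapestry"],
}

# Invert the hierarchy once so each term can find its broader term.
BROADER = {child: parent for parent, kids in NARROWER.items() for child in kids}


def narrower_closure(term):
    """Return the term and every term below it, depth-first."""
    result = [term]
    for child in NARROWER.get(term, []):
        result.extend(narrower_closure(child))
    return result


def broader_chain(term):
    """Return the more general terms above `term`, nearest first."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain


print(narrower_closure("immovable heritage"))  # ['immovable heritage', 'cathedral', 'castle']
print(broader_chain("tapestry"))               # ['movable heritage', 'historical heritage']
```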

The final result will be a search system that takes the keywords entered by the user, puts the query into context and maps it to equivalent words in other languages. The system will then be able to return documents in several languages that match search terms entered in Spanish, with a precision unparalleled by current multilingual systems.
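One possible way to combine the thesaurus with multilingual equivalences is sketched below, assuming the thesaurus stores language-independent concepts with a label per language (Spanish, English and Russian, as in the project). The concept IDs, labels and the `query_terms` function are hypothetical, and "putting the query into context" is modelled here simply as adding the narrower concepts; the project's actual method may differ.

```python
# Hypothetical multilingual thesaurus: language-independent concept IDs, each
# with a label per language and a list of narrower concepts.
CONCEPTS = {
    "C1": {"labels": {"es": "arquitectura religiosa", "en": "religious architecture",
                      "ru": "культовая архитектура"},
           "narrower": ["C2", "C3"]},
    "C2": {"labels": {"es": "catedral", "en": "cathedral", "ru": "собор"},
           "narrower": []},
    "C3": {"labels": {"es": "monasterio", "en": "monastery", "ru": "монастырь"},
           "narrower": []},
}


def concept_for(keyword, lang="es"):
    """Find the concept whose label in `lang` matches the keyword."""
    for cid, concept in CONCEPTS.items():
        if concept["labels"][lang] == keyword.lower():
            return cid
    return None


def query_terms(keyword, source="es", targets=("es", "en", "ru")):
    """Expand a keyword to its concept plus narrower concepts and return
    the matching search terms in every target language."""
    cid = concept_for(keyword, source)
    if cid is None:
        return {}
    ids, stack = [], [cid]
    while stack:  # depth-first walk, children visited in listed order
        current = stack.pop()
        ids.append(current)
        stack.extend(reversed(CONCEPTS[current]["narrower"]))
    return {lang: [CONCEPTS[i]["labels"][lang] for i in ids] for lang in targets}


print(query_terms("Arquitectura religiosa")["ru"])  # ['культовая архитектура', 'собор', 'монастырь']
```

A Spanish keyword thus produces search terms in all three languages, which a document index in each language could then be queried with.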

The languages used to build the multilingual thesaurus will be Spanish, English and Russian, and the system’s real test-bed will be the website of the Under-Directorate General of Historical Heritage Conservation, which is currently available only in Spanish. According to the project brief, the methodology will emphasize the method’s extensibility to other languages.

Eduardo Martínez | alfa
Further information:
http://www.fi.upm.es/?pagina=543
