New tool for building thesauruses

Tesaurvai can extract, annotate and organize specialized terms taken from a collection of digitized texts. It complies with the ISO thesaurus building standard and was developed by the VAI in conjunction with the Spanish National Research Council’s Institute of Documentary Studies on Science and Technology (formerly CINDOC).

Euralex is Europe’s most influential lexicographical congress. The InfoLex research group, based at the Universidad Pompeu Fabra’s College of Applied Linguistics, is organizing the 2008 event, which will bring together professional lexicographers, publishers, researchers, specialists and anyone with an interest in dictionaries of any kind.

2 in 1

Tesaurvai’s key innovation is that it combines two functions in a single tool: a terminology extractor that can rank and select terms of 1 to 10 words, and an ISO standard-compliant thesaurus builder. The extractor identifies terms in digital texts and passes them to the thesaurus builder. The resulting thesaurus is a systematized list of terms that are representative of a domain.
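The announcement does not describe how the extractor ranks its candidates. As a rough illustration only, the Java sketch below counts every run of 1 to 10 consecutive words in a text, which is one naive way to produce candidate terms for later selection. The class name `TermCandidates` and the method `countCandidates` are hypothetical and are not part of Tesaurvai.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a naive n-gram counter, not Tesaurvai's extraction method.
public class TermCandidates {

    // Counts every run of minWords..maxWords consecutive words in the text.
    static Map<String, Integer> countCandidates(String text, int minWords, int maxWords) {
        // Lower-case the text and split on anything that is not a letter or digit.
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        // LinkedHashMap keeps candidates in order of first appearance.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int start = 0; start < words.size(); start++) {
            for (int len = minWords; len <= maxWords && start + len <= words.size(); len++) {
                String candidate = String.join(" ", words.subList(start, start + len));
                counts.merge(candidate, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "Thesaurus building tools support thesaurus building standards.";
        // Print only candidates that occur more than once.
        countCandidates(text, 1, 10).entrySet().stream()
            .filter(e -> e.getValue() > 1)
            .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }
}
```

A real extractor would normally add linguistic filtering, such as stop-word removal or part-of-speech patterns, before offering candidates to the thesaurus builder.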

Tesaurvai conforms to international thesaurus building and management standards and can be used in several ways. First, the tool can build thesauruses from scratch, from information extraction through to term creation, editing and annotation. It makes it easy to establish relationships between terms and to run basic and advanced word searches. Second, Tesaurvai can import and export thesauruses as text and XML files. Finally, it can build alphabetical and systematized indices, which can be printed or exported as reports.
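To make the idea of term relationships and an alphabetical index concrete, here is a minimal Java sketch. It assumes the broader term (BT), narrower term (NT) and related term (RT) tags that are conventional in thesaurus standards; the source does not say which relationships Tesaurvai stores or how. The class `MiniThesaurus` and its methods are hypothetical names chosen for this example.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative only: an in-memory thesaurus, not Tesaurvai's data model.
public class MiniThesaurus {

    // term -> relation tag (BT, NT, RT) -> target terms.
    // TreeMap and TreeSet keep terms, tags and targets in alphabetical order.
    private final Map<String, Map<String, Set<String>>> entries = new TreeMap<>();

    private void link(String from, String tag, String to) {
        entries.computeIfAbsent(from, k -> new TreeMap<>())
               .computeIfAbsent(tag, k -> new TreeSet<>())
               .add(to);
    }

    // Hierarchical relationship: recorded in both directions.
    void addBroader(String narrower, String broader) {
        link(narrower, "BT", broader);
        link(broader, "NT", narrower);
    }

    // Associative relationship: symmetric.
    void addRelated(String a, String b) {
        link(a, "RT", b);
        link(b, "RT", a);
    }

    // Renders the alphabetical index as plain text, one relationship per line.
    String alphabeticalIndex() {
        StringBuilder out = new StringBuilder();
        entries.forEach((term, relations) -> {
            out.append(term).append('\n');
            relations.forEach((tag, targets) ->
                targets.forEach(t ->
                    out.append("  ").append(tag).append(' ').append(t).append('\n')));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        MiniThesaurus t = new MiniThesaurus();
        t.addBroader("dictionary", "lexical resource");
        t.addBroader("thesaurus", "lexical resource");
        t.addRelated("thesaurus", "dictionary");
        System.out.print(t.alphabeticalIndex());
    }
}
```

Storing each relationship in both directions keeps the index consistent: a term listed as broader on one entry always appears as narrower on the other.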

Available as of 2008

The tool has been developed in Java and stores its data in a database. Tesaurvai is compatible with any database manager that supports Java Database Connectivity (JDBC).

It was developed as part of the “Cultural heritage document search based on multilingual technical resources” (Patrilex) project, supported by the Ministry of Education with the aim of generating a methodology and tools for building multilingual lexical resources.

Tesaurvai is now undergoing extensive testing. From July 2008 it will be available to any Internet user.

Media Contact

Eduardo Martínez alfa
