Forum for Science, Industry and Business


New method to facilitate extraction

18.12.2008
With today's large flows of digital text, it is important to build systems that make it easier to find the particular information that is required.

One example is information about events in a company, drawn from news texts: who is leaving which post, why, and to which company and position the person is moving. In his thesis, Fredrik Olsson presents a new method that facilitates the marking up of occurrences of names in digital text documents.

Information extraction entails analysing texts with the aim of identifying and picking out information about predefined types of entities, the events in which the entities are engaged, and the relationships between entities and events. In other words, it is about obtaining structured information from an apparently unstructured source.

One reason that information extraction is not available to everyone is that adapting a system to new data in a new text domain takes a great deal of work and time. A system that could handle the example scenario above would probably not function at all if the task were changed to identifying interactions between proteins described in biomedical text.
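To make the idea concrete, here is a minimal Python sketch of extraction with a hand-written pattern. It is an illustration only, not a method from the thesis: the sentences, the names (Anna Berg, Exempla), the `JobMove` record and the pattern itself are all invented for this example. It turns one news-style sentence into a structured record and finds nothing in a biomedical sentence, which is the adaptation problem described above.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobMove:
    """Structured record for one 'person leaves post' event (hypothetical schema)."""
    person: str
    old_post: str
    new_company: str


# Toy pattern, written by hand for a single news-style phrasing (assumption).
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)+"
PATTERN = re.compile(
    rf"(?P<person>{NAME}) is leaving (?:his|her|their) post as "
    rf"(?P<old_post>[\w ]+?) to join (?P<new_company>[A-Z]\w+)"
)


def extract_job_move(sentence: str) -> Optional[JobMove]:
    """Return a JobMove if the sentence matches the toy pattern, else None."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    return JobMove(
        person=match.group("person"),
        old_post=match.group("old_post"),
        new_company=match.group("new_company"),
    )


news = "Anna Berg is leaving her post as finance director to join Exempla."
biomedical = "The protein RAD51 interacts with BRCA2 in the cell nucleus."

print(extract_job_move(news))        # a filled-in JobMove record
print(extract_job_move(biomedical))  # None: the pattern knows nothing about proteins
```

A pattern like this has to be rewritten by hand for every new domain, which is why the effort of adaptation matters.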

An established way of approaching domain adaptation for information extraction systems is to build their components using machine learning, i.e. computer programs that can learn. Machine learning depends in many respects on having examples to learn from. A component in an extraction system needs to see examples of the phenomenon it is going to learn to identify, e.g. entities and the relationships between them. The basis of this type of machine learning is thus access to large quantities of examples. However, producing good examples poses major challenges: it is laborious and time-consuming, and it requires a person who knows the domain well to mark up examples in texts.

Recognising names of, for example, individuals, companies and locations is fundamental to information extraction. Once names have been recognised, we can start to look for relationships, expressed in the text, between the bearers of those names.

In his thesis Fredrik Olsson describes the work of developing and evaluating a method, called BootMark, for marking up the occurrences of names in textual documents. BootMark reduces the number of documents that a human annotator needs to mark up in order to train a name recognizer that performs as well as, or better than, one trained on a random selection of documents from the same corpus.
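How BootMark works internally is not described here, so the following Python sketch shows only the general idea behind active machine learning: the learner chooses which item the annotator labels next, instead of receiving items at random. Everything in it is an invented toy: the pool of integer scores standing in for documents, the hidden boundary at 124, the one-threshold learner and all function names. Both strategies get the same budget of ten labels.

```python
import random

SCORE_MIN, SCORE_MAX = 0, 200   # toy score range (assumption)


def oracle(x):
    """Stands in for the human annotator; the rule x >= 124 is hidden from the learner."""
    return x >= 124


def train(labeled):
    """Fit a threshold halfway between the highest negative and the lowest positive."""
    negatives = [x for x, is_positive in labeled if not is_positive]
    positives = [x for x, is_positive in labeled if is_positive]
    low = max(negatives, default=SCORE_MIN)
    high = min(positives, default=SCORE_MAX)
    return (low + high) / 2


def accuracy(threshold, data):
    """Share of items whose prediction (x > threshold) agrees with the oracle."""
    return sum((x > threshold) == oracle(x) for x in data) / len(data)


def run(pool, budget, choose):
    """Ask for `budget` labels, one at a time, retraining after each label."""
    unlabeled = list(pool)
    labeled = []
    threshold = train(labeled)
    for _ in range(budget):
        x = choose(unlabeled, threshold)
        unlabeled.remove(x)
        labeled.append((x, oracle(x)))
        threshold = train(labeled)
    return threshold


def most_uncertain(unlabeled, threshold):
    """Uncertainty sampling: pick the item closest to the current decision boundary."""
    return min(unlabeled, key=lambda x: abs(x - threshold))


def random_choice(rng):
    """Baseline: pick any unlabeled item at random, ignoring the current model."""
    return lambda unlabeled, threshold: rng.choice(unlabeled)


pool = list(range(SCORE_MIN, SCORE_MAX))
budget = 10

active_acc = accuracy(run(pool, budget, most_uncertain), pool)
baseline_acc = accuracy(run(pool, budget, random_choice(random.Random(0))), pool)

print("active learning: ", active_acc)    # 1.0 in this toy setting
print("random selection:", baseline_acc)
```

In this toy setting the actively chosen labels home in on the boundary much as a binary search would, while random selection spends most of its labels on items far from it. Real name annotation is far less tidy, so the sketch should be read as intuition rather than as evidence.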

Title of the thesis: Bootstrapping Named Entity Annotation by Means of Active Machine Learning. A method for creating corpora.
The thesis will be publicly defended on Friday 19 December at 1.15 pm.
Location: Lilla hörsalen, Humanisten, Renströmsgatan 6
For further information contact Fredrik Olsson, mobile: +46 (0)704-15 54 10, e-mail: fredriko@sics.se
Contact person: Barbro Ryder Liljegren, Faculty of Arts, University of Gothenburg, tel. +46 (0)31-786 48 65, e-mail: barbro.ryder@hum.gu.se

Eva Lundgren | idw
Further information:
http://www.vr.se
