Forum for Science, Industry and Business

New method to facilitate extraction

18.12.2008

With today's large flows of electronic text, it is important to build systems that make it easier to find the specific information a user needs.

 

An example is information about events in a company, drawn from news texts: who is leaving which post, why, and to which company and position that person is moving. In his thesis, Fredrik Olsson presents a new method that makes it easier to mark up occurrences of names in electronic text documents.


Information extraction entails analysing texts with the aim of identifying and picking out information about predefined types of entities, the events in which those entities take part, and the relationships between entities and events. In other words, it is about obtaining structured information from an apparently unstructured source.
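To make the idea of "structured information from an unstructured source" concrete, here is a minimal sketch in Python. It is not the method described in the thesis: it uses a single hand-written pattern rather than machine learning, and the pattern, the function name extract_move and the example sentence are all invented for illustration.

```python
import re

# Hypothetical hand-written pattern for one sentence shape:
# "<Person> leaves <Company> to join <Company> as <position>"
MOVE_PATTERN = re.compile(
    r"(?P<person>[A-Z]\w+(?: [A-Z]\w+)*) leaves (?P<from_org>[A-Z]\w+)"
    r" to join (?P<to_org>[A-Z]\w+) as (?P<position>[\w ]+)"
)

def extract_move(sentence):
    """Return a dict describing a job move, or None if the pattern does not match."""
    match = MOVE_PATTERN.search(sentence)
    return match.groupdict() if match else None

# Made-up news sentence, used only to show the shape of the output.
record = extract_move("Alice Smith leaves Acme to join Globex as chief engineer.")
print(record)
```

The result is a record with the fields person, from_org, to_org and position. A sentence from another domain, such as one about proteins, simply returns None, which hints at why hand-built extractors are hard to move between domains.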

One of the reasons that information extraction is not available to everyone is that adapting a system to new data in a new text domain takes a great deal of work and time. A system that could handle the company-news scenario described above would probably not work at all if the task were changed to identifying interactions between proteins described in biomedical text.

An established way of approaching the problem of adapting information extraction systems to new domains is to build their components using machine learning, i.e. computer programs that can learn. Machine learning depends on having examples to learn from. A component in an extraction system needs to see examples of the phenomenon it is going to learn to identify, e.g. entities and the relationships between them. This type of machine learning thus relies on access to large quantities of examples. However, producing good examples is a major challenge: it is laborious, takes time and requires a person who knows the domain well to mark up examples in texts.

Recognising names of, for example, individuals, companies and locations is fundamental to information extraction. Once names have been recognised, we can start to look for relationships, expressed in the text, between the bearers of those names.

In his thesis, Fredrik Olsson describes the development and evaluation of a method, called BootMark, for marking up occurrences of names in text documents. BootMark reduces the number of documents that a human annotator needs to mark up in order to train a name recognizer that performs as well as, or better than, a name recognizer trained on a random selection of documents from the same corpus.
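The thesis title refers to active machine learning, in which the learner itself helps choose which documents a human should annotate next. The Python sketch below illustrates one common selection strategy, uncertainty sampling. It is an assumption made for illustration, not a description of how BootMark works: the function names, the toy scoring function and the example documents are all invented.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a yes/no prediction with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def rank_by_uncertainty(pool, predict_proba):
    """Order unlabelled documents so the least certain predictions come first."""
    return sorted(pool, key=lambda doc: binary_entropy(predict_proba(doc)), reverse=True)

def toy_name_probability(doc):
    """Hypothetical stand-in for a trained recognizer: share of capitalised tokens."""
    tokens = doc.split()
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t[:1].isupper()) / len(tokens)

# Made-up unlabelled pool; a real pool would be documents from the corpus.
pool = [
    "Alice Smith Joins Acme",        # score 1.0, the toy model is certain
    "the board met on friday",       # score 0.0, the toy model is certain
    "Smith leaves the company",      # score 0.25
    "Acme hires new Chief today",    # score 0.4, closest to 0.5
]

ranked = rank_by_uncertainty(pool, toy_name_probability)
print(ranked[0])
```

The annotator would be shown the top-ranked document first, on the assumption that marking up examples the model is unsure about teaches it more than marking up a randomly chosen document. In a real system the scoring function would be the current name recognizer, retrained after each round of annotation.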

Title of the thesis: Bootstrapping Named Entity Annotation by Means of Active Machine Learning. A method for creating corpora.
The thesis will be publicly defended on Friday 19 December at 1.15 pm.
Location: Lilla hörsalen, Humanisten, Renströmsgatan 6

For further information contact Fredrik Olsson, mobile: +46 (0)704-15 54 10,
e-mail: fredriko@sics.se

Contact person: Barbro Ryder Liljegren, Faculty of Arts, University of Gothenburg, Tel. +46 (0)31-786 48 65, e-mail: barbro.ryder@hum.gu.se

Eva Lundgren | Source: Informationsdienst Wissenschaft
Further information: www.vr.se
