Forum for Science, Industry and Business

Sponsored by:     3M 
Search our Site:

 

New method to facilitate extraction

18.12.2008
With today's large flows of data-based texts it is important to produce systems that facilitate searches for the particular information that is required.

Information on, for example, events in a company from news texts; who is leaving which post, why, to which company and position the person is moving etc. In his thesis Fredrik Olsson deals with a new method of facilitating the marking up of occurrences of names in data-based textual documents.

Information extraction entails analysing texts with the aim of identifying and picking out information about predefined types of entities, events in which the entities are engaged and relationships between entities and events. In other words it is about gaining access to structured information from an apparently unstructured source of information.

One of the reasons that information extraction is not available for everyone is that it requires a lot of work and time to adapt a system to function for new data in a new text domain. A system that could handle the scenario used as an example above would probably not function at all if the data were changed to identifying interactions between proteins described in biomedical text.

An established way of approaching the problem of domain adaptation of systems for information extraction is to realise its components using machine learning, i.e. computer programs that can learn. In many respects machine learning is based on there being examples from which to learn. A component in an extraction system needs to see examples of the phenomenon it is going to learn to identify, e.g. entities and the relationships between them. The basis of this type of machine learning is thus access to large quantities of examples. However, there are major challenges in producing good examples: it is laborious, takes time and requires a person who knows the domain well to mark up examples in texts.

Recognising names of, for example individuals, companies and locations is fundamental for information extraction. By recognising names we can also start to look for, for example, relationships, expressed in the text, between the bearers of the names.

In his thesis Fredrik Olsson describes the work of developing and evaluating a method, called BootMark, of marking up the occurrence of names in textual documents. BootMark contributes to reducing the quantity of documents that a human annotator needs to mark up in order to train a name recognizer with a performance that is equally good or better than a name recognizer who is trained in a random selection of documents from the same corpus.

Title of the thesis: Bootstrapping Named Entity Annotation by Means of Active Machine Learning. A method for creating corpora.
The thesis will be public defended on Friday 19 December at 1.15 pm
Location: Lilla hörsalen, Humanisten, Renströmsgatan 6
For further information contact Fredrik Olsson, mobile: +46 (0)704 -15 54 10,
e-mail: fredriko@sics.se
Contact person: Barbro Ryder Liljegren Faculty of Arts, University of Gothenburg Tel. +46 (0)31-786 48 65, e-mail: barbro.ryder@hum.gu.se

Eva Lundgren | idw
Further information:
http://www.vr.se

More articles from Communications Media:

nachricht New Technologies for A/V Analysis and Search
13.04.2017 | Fraunhofer-Institut für Digitale Medientechnologie IDMT

nachricht On patrol in social networks
25.01.2017 | Fraunhofer-Institut für Arbeitswirtschaft und Organisation IAO

All articles from Communications Media >>>

The most recent press releases about innovation >>>

Die letzten 5 Focus-News des innovations-reports im Überblick:

Im Focus: A quantum walk of photons

Physicists from the University of Würzburg are capable of generating identical looking single light particles at the push of a button. Two new studies now demonstrate the potential this method holds.

The quantum computer has fuelled the imagination of scientists for decades: It is based on fundamentally different phenomena than a conventional computer....

Im Focus: Turmoil in sluggish electrons’ existence

An international team of physicists has monitored the scattering behaviour of electrons in a non-conducting material in real-time. Their insights could be beneficial for radiotherapy.

We can refer to electrons in non-conducting materials as ‘sluggish’. Typically, they remain fixed in a location, deep inside an atomic composite. It is hence...

Im Focus: Wafer-thin Magnetic Materials Developed for Future Quantum Technologies

Two-dimensional magnetic structures are regarded as a promising material for new types of data storage, since the magnetic properties of individual molecular building blocks can be investigated and modified. For the first time, researchers have now produced a wafer-thin ferrimagnet, in which molecules with different magnetic centers arrange themselves on a gold surface to form a checkerboard pattern. Scientists at the Swiss Nanoscience Institute at the University of Basel and the Paul Scherrer Institute published their findings in the journal Nature Communications.

Ferrimagnets are composed of two centers which are magnetized at different strengths and point in opposing directions. Two-dimensional, quasi-flat ferrimagnets...

Im Focus: World's thinnest hologram paves path to new 3-D world

Nano-hologram paves way for integration of 3-D holography into everyday electronics

An Australian-Chinese research team has created the world's thinnest hologram, paving the way towards the integration of 3D holography into everyday...

Im Focus: Using graphene to create quantum bits

In the race to produce a quantum computer, a number of projects are seeking a way to create quantum bits -- or qubits -- that are stable, meaning they are not much affected by changes in their environment. This normally needs highly nonlinear non-dissipative elements capable of functioning at very low temperatures.

In pursuit of this goal, researchers at EPFL's Laboratory of Photonics and Quantum Measurements LPQM (STI/SB), have investigated a nonlinear graphene-based...

All Focus news of the innovation-report >>>

Anzeige

Anzeige

Event News

Marine Conservation: IASS Contributes to UN Ocean Conference in New York on 5-9 June

24.05.2017 | Event News

AWK Aachen Machine Tool Colloquium 2017: Internet of Production for Agile Enterprises

23.05.2017 | Event News

Dortmund MST Conference presents Individualized Healthcare Solutions with micro and nanotechnology

22.05.2017 | Event News

 
Latest News

Physicists discover mechanism behind granular capillary effect

24.05.2017 | Physics and Astronomy

Measured for the first time: Direction of light waves changed by quantum effect

24.05.2017 | Physics and Astronomy

Marine Conservation: IASS Contributes to UN Ocean Conference in New York on 5-9 June

24.05.2017 | Event News

VideoLinks
B2B-VideoLinks
More VideoLinks >>>