Two Dutch researchers analyse striking behaviour of web surfers
What behaviour do website visitors exhibit? Do they buy a specific product mainly on Mondays? Do they always return at a certain time of day?
Recognising and exploiting such patterns is lucrative for companies. Edgar de Graaf discovered that interesting patterns often have a time aspect. Jeroen De Knijf developed methods to detect relevant patterns more quickly.
In the field's jargon this is called data mining: looking for interesting relationships within large quantities of data. Many data-mining programs produce a flood of potentially interesting patterns, so how can a user find what they are looking for? Furthermore, the files are not always organised with such searches in mind, as is the case on the Internet or, for instance, in bioinformatics. These are usually semi-structured files: they often contain hyperlinks to other files, for example, and hold (partial) information in a range of formats, such as text, images and sound.
Edgar de Graaf and Jeroen De Knijf both worked within the NWO-funded MISTA project (Mining in Semi-Structured Data) on methods to find patterns more quickly and effectively within large quantities of semi-structured data. De Graaf discovered that some patterns are interesting because they occur in quick succession. Other patterns are striking because, for example, they occur weekly. According to De Graaf, this time aspect merits further investigation.
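The two kinds of time aspect mentioned above, events in quick succession and events that recur weekly, can be illustrated with a minimal Python sketch. It is not De Graaf's method: the function names `weekday_counts` and `quick_successions`, the five-minute threshold and the purchase times are all invented for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def weekday_counts(timestamps):
    """Count how many events fall on each day of the week."""
    return Counter(WEEKDAYS[t.weekday()] for t in timestamps)

def quick_successions(timestamps, max_gap):
    """Return consecutive event pairs that lie at most max_gap apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a <= max_gap]

# Invented purchase times for one product (assumption, not real data).
purchases = [
    datetime(2017, 6, 26, 9, 0),
    datetime(2017, 6, 26, 9, 4),
    datetime(2017, 7, 3, 10, 30),
    datetime(2017, 7, 5, 14, 0),
    datetime(2017, 7, 10, 9, 15),
]

by_day = weekday_counts(purchases)                            # weekly recurrence
bursts = quick_successions(purchases, timedelta(minutes=5))   # quick succession
```

A weekday that dominates `by_day` hints at a weekly pattern, while the pairs in `bursts` are candidates for patterns that occur in quick succession.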
The patterns are best presented visually, so that the user can find the information sought at a glance. To this end, De Graaf described various ways of presenting different types of information.
De Knijf demonstrated that the number of patterns can be drastically reduced by allowing the user to indicate in advance the minimum requirements that a pattern must satisfy. This allows the data-mining program to find the interesting patterns much faster.
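The effect of stating minimum requirements in advance can be sketched in Python. This is a simplified illustration on sets of visited pages, not De Knijf's method for semi-structured data: the function `frequent_itemsets`, the parameters `min_support` and `max_size`, and the sessions are assumptions made for the example. The only requirement used here is a minimum support, the number of sessions a pattern must occur in.

```python
from itertools import combinations

def frequent_itemsets(sessions, min_support, max_size=3):
    """Return {itemset: support} for itemsets found in at least min_support sessions."""
    sessions = [frozenset(s) for s in sessions]
    # Level 1: count single items and keep those meeting the requirement.
    counts = {}
    for s in sessions:
        for item in s:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {k: v for k, v in counts.items() if v >= min_support}
    result = dict(current)
    size = 1
    while current and size < max_size:
        size += 1
        # Only patterns that survived the previous level are extended,
        # so a stricter requirement shrinks the search space early.
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == size}
        current = {}
        for cand in candidates:
            support = sum(1 for s in sessions if cand <= s)
            if support >= min_support:
                current[cand] = support
        result.update(current)
    return result

# Invented browsing sessions (assumption, not real data).
sessions = [
    {"home", "shoes", "checkout"},
    {"home", "shoes"},
    {"home", "books"},
    {"home", "shoes", "checkout"},
]

loose = frequent_itemsets(sessions, min_support=2)
strict = frequent_itemsets(sessions, min_support=3)
```

Raising `min_support` from 2 to 3 leaves fewer patterns in `strict` than in `loose`, and fewer candidates have to be counted along the way, which is where the speed-up comes from.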
A second method De Knijf devised to reduce the number of results is the compression of the entire collection of documents (for example, Wikipedia pages) into a single document. By building accurate models that only make use of the compressed document, De Knijf was able to demonstrate that this summary does indeed contain the essential information from the entire collection.
The research was funded from the Open Competition 2003 of NWO Physical Sciences.
Kim van den Wijngaard | alfa