From data to dating
What project could possibly bring together techniques such as data mining and decision support with fields as diverse as online dating, traffic accident analysis and many more? Only one: SolEuNet.
Modern solution-oriented work processes often bring together loose networks of highly skilled and entrepreneurial individuals - in effect creating virtual organisations. Such virtual workgroups form to solve a problem, then break apart and re-form around the next challenge. It is a highly efficient and innovative way of working - but how do you build up a knowledge base and retain it after the group has split up? Researchers in the IST project SolEuNet believe they have an answer.
SolEuNet aimed to develop a Web-based infrastructure for virtual enterprise services in data mining and decision support. The goal was to bring together a specialist team of data mining experts to cooperate on each data mining problem provided by a customer. Each expert would apply their own methods to the problem, but communicate with the others to share the growing understanding.
Says Marko Grobelnik of the Jozef Stefan Institute in Ljubljana, Slovenia, "Our main achievement was to establish the infrastructure for the connected enterprise, so that the research knowledge of academic institutions could be shared and turned into product solutions. This knowledge was applied in a whole range of case studies."
Improving the match process for online dating
Probably the most interesting case was the application of these techniques to the needs of an online dating agency. The online agency nomorefrogs.com needed to improve its system of matching individuals who were looking to form a relationship.
Many factors, not least personality traits, influence relationships. The SolEuNet data mining process was used to detect previously unsuspected patterns of online dating behaviour. By analysing the agency's data and comparing the matches made with the frequency with which pairs of individuals exchanged messages, the researchers were able to determine behaviour patterns that could be used to improve the decisions made by the matching engine.
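As a rough illustration of this kind of comparison, the sketch below groups matched pairs by a single profile attribute and compares the average number of messages exchanged in each group. It is not the project's method: the attribute name `same_humour_style`, the `mean_messages_by` helper and all of the figures are invented for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical matched pairs (invented data, not from the agency).
# Each record notes whether the pair shared an attribute and how many
# messages the two people exchanged through the site.
matches = [
    {"same_humour_style": True, "messages": 14},
    {"same_humour_style": True, "messages": 10},
    {"same_humour_style": True, "messages": 12},
    {"same_humour_style": False, "messages": 3},
    {"same_humour_style": False, "messages": 5},
    {"same_humour_style": False, "messages": 4},
]


def mean_messages_by(matches, attribute):
    """Average message count for each value of the given attribute."""
    groups = defaultdict(list)
    for match in matches:
        groups[match[attribute]].append(match["messages"])
    return {value: mean(counts) for value, counts in groups.items()}


averages = mean_messages_by(matches, "same_humour_style")
for value, average in averages.items():
    print(f"same_humour_style={value}: {average} messages on average")
```

A large gap between the groups would suggest that the attribute deserves more weight in the matching engine, although a real analysis would need far more data and a test of statistical significance.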
"The agency uses personality profiles to match likely individuals," says Grobelnik. "We applied our data mining techniques to improve the success rate of the matching process." He admits, however, that it has been difficult to evaluate the extent of any improvement: by the nature of the business, once two individuals begin to correspond successfully they tend to interact directly rather than via the agency.
Discovering better solutions
Completed in January 2003, SolEuNet was a three-year research project with 12 partners that involved about 60 people. The techniques developed within the project were implemented as Web systems and integrated into the project website.
Participants developed a system called RAMSYS that allowed researchers to share ideas and results in a way that fostered the discovery of better solutions. The RAMSYS infrastructure provided a template for each business case. The space allocated to a project offered separate sections for coordination, collaboration and communication, with access controls for different groups of readers, contributors and process managers.
Project coordinator Dunja Mladenic, also of the Jozef Stefan Institute, emphasises the variety of cases to which these techniques were applied. "A large number of case studies was involved. They varied from online dating through traffic accident analysis to selecting the right bank to join in a housing scheme. You have to visit the site to see them all."
New views on road traffic accident statistics
In another case, Hampshire County Council in the UK wanted to gain a better insight into how vehicle accidents have changed over the past 20 years as a result of improvements in highway and vehicle design. A major challenge was the sheer size of the dataset involved (1.5 GB).
Seven project partners applied a range of data mining techniques to this case study. To highlight a few of the results:
- An innovative visualisation method pioneered by the University of Leuven showed how accidents developed by location over time, and helped to point out data quality issues with grid references.
- Prague University used association rules to find links between road numbers and particular classes of accidents. The end-user was particularly pleased with the format and contents of these findings, which are now being evaluated by local experts.
- The Jozef Stefan Institute used text mining technology and subgroup discovery to determine the most common kinds of accidents.
- Bristol University used a dynamic subgroup discovery method, which highlighted certain data quality issues.
Bringing together a workgroup of experts from different fields generated a considerable number of new data associations. These ranged from innovative visualisation of location data over time, to methods to identify data quality problems, to establishing patterns and rules beyond the normal clustering techniques used in this field.
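To give a flavour of the association-rule approach mentioned above, the following minimal sketch scores rules of the form "road number -> accident class" using the standard support, confidence and lift measures. The records, thresholds and the `mine_rules` helper are all hypothetical; none of this is the project's data or code.

```python
from collections import Counter

# Hypothetical toy records of (road number, accident class).
# Invented for illustration; not taken from the Hampshire dataset.
records = [
    ("A31", "rear-end"),
    ("A31", "rear-end"),
    ("A31", "side-swipe"),
    ("M3", "rear-end"),
    ("M3", "single-vehicle"),
    ("B3006", "single-vehicle"),
    ("B3006", "single-vehicle"),
    ("B3006", "rear-end"),
]


def mine_rules(records, min_support=0.2, min_confidence=0.6):
    """Return rules road -> class as (road, class, support, confidence, lift)."""
    total = len(records)
    road_counts = Counter(road for road, _ in records)
    class_counts = Counter(cls for _, cls in records)
    pair_counts = Counter(records)

    rules = []
    for (road, cls), count in pair_counts.items():
        support = count / total                  # share of all accidents
        confidence = count / road_counts[road]   # share of this road's accidents
        lift = confidence / (class_counts[cls] / total)  # vs. overall class rate
        if support >= min_support and confidence >= min_confidence:
            rules.append((road, cls, support, confidence, lift))
    # Strongest rules first.
    return sorted(rules, key=lambda rule: rule[3], reverse=True)


for road, cls, support, confidence, lift in mine_rules(records):
    print(f"{road} -> {cls}: support={support:.2f}, "
          f"confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift above 1 means the accident class is more common on that road than in the dataset as a whole, which is the kind of link an end-user can then check against local knowledge.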
John Bullas of Hampshire County Council says, "Feedback by local experts is currently being obtained to assess the full value of the findings in the real world, but the analysis of the database performed so far by the SolEuNet consortium holds considerable promise for the application of these technologies to other databases that are currently analysed with long-established and limited repertoires of processing tools."
Dunja Mladenic/Marko Grobelnik
Jozef Stefan Institute
Department of Intelligent Systems
Source: Based on information from SolEuNet
Tara Morris | IST results