DILIGENT Data Challenge on EGEE Infrastructure

The DILIGENT team used the EGEE computing Grid to process 37 million images from the online Flickr database in just 16 weeks. This computation generated approximately 112 million text and image objects (nearly 5 TB of data) containing more than 150 million extracted features. This corresponds to an average processing rate of over 300,000 images per day.

This unique collection will be used by the SAPIR project to develop new techniques for large-scale content-based data retrieval and automatic data classification that combine text and image content. These techniques aim to go beyond conventional search engines, which can only search the text associated with images and audio-visual content.

The computational load required to generate this massive data collection was outsourced to DILIGENT, which in turn delegated it to the EGEE Pre-Production Service (PPS) Grid infrastructure via the gLite middleware. A total of 44,333 gLite jobs were successfully executed by the EGEE PPS infrastructure resource broker. Each job processed approximately 1,000 images.

The data challenge lasted for 116 days, from 16 June to 9 October 2007, and was organized in three phases. During the initial preparation phase, experimental jobs were submitted to some EGEE PPS sites to test the feature extraction application and optimize the number of images to process per day.
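The duration and throughput figures quoted in this article can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python; the constant and function names are illustrative only, and it assumes that both the start and end dates count as working days of the challenge.

```python
from datetime import date

# Figures taken from the article text.
TOTAL_IMAGES = 37_000_000
START = date(2007, 6, 16)
END = date(2007, 10, 9)


def challenge_days(start: date, end: date) -> int:
    """Number of calendar days in the challenge, counting both endpoints."""
    return (end - start).days + 1


def images_per_day(total_images: int, days: int) -> float:
    """Average number of images processed per day."""
    return total_images / days


days = challenge_days(START, END)
weeks = days / 7
rate = images_per_day(TOTAL_IMAGES, days)

print(f"Duration: {days} days (about {weeks:.1f} weeks)")
print(f"Average throughput: about {rate:,.0f} images per day")
```

Under these assumptions the span comes to 116 days, a little over 16 weeks, and the average rate lands above the 300,000 images per day quoted.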

The next two phases involved actual execution of the data challenge, exploiting ten EGEE PPS sites that contributed their computational resources: University of Athens, Scuola Normale Superiore, ISTI-CNR, LIP, ESA-ESRIN, CERN, CESGA, University of Macedonia, Ben Gurion University, and CYFRONET. Four of these sites are maintained by DILIGENT partners.

Media Contact

Sarah Purcell alfa
