New Software Transforms Image Retrieval with ALIP System

New software that responds to written questions by retrieving digital images has potentially broad application, ranging from helping radiologists compare mammograms to streamlining museum curators’ archiving of artwork, say the Penn State researchers who developed the technology.

Dr. James Z. Wang, assistant professor in Penn State’s School of Information Sciences and Technology and principal investigator, says the Automatic Linguistic Indexing of Pictures (ALIP) system first builds a pictorial dictionary, and then uses it for associating images with keywords. The new technology functions like a human expert who annotates or classifies terms.

“While the prototype is in its infancy, it has demonstrated great potential for use in biomedicine by reading x-rays and CT scans as well as in digital libraries, business, Web searches and the military,” said Wang, who holds the PNC Technologies Career Development Professorship at IST and also is a member of the Department of Computer Science and Engineering.

ALIP processes images the way people seem to. When we see a new kind of vehicle with two wheels, a seat and a handlebar, for instance, we recognize it as “a bicycle” from information about related images stored in our brains. ALIP has a similar bank of statistical models “learned” from analyzing image features.

The system is detailed in a paper, “Learning-based Linguistic Indexing of Pictures with 2-D MHMMs,” to be given today (Dec. 4) at the Association for Computing Machinery’s (ACM) Multimedia Conference in Juan-les-Pins, France. The co-author is Dr. Jia Li, Penn State assistant professor of statistics.

Unlike other content-based retrieval systems that compare features of visually similar images, ALIP uses verbal cues that range from simple concepts such as “flowers” and “mushrooms” to higher-level ones such as “rural” and “European.” ALIP also can classify images into a larger number of categories than other systems, thereby broadening the uses of image databases.

Other advantages include ALIP’s ability to be trained on a relatively large number of concepts simultaneously, using images that are not necessarily visually similar.

In one experiment, Wang and Li “trained” ALIP with 24,000 photographs found on 600 CD-ROMs, with each CD-ROM collection assigned keywords to describe its content. After “learning” these images, the computer then automatically created a dictionary of concepts such as “building,” “landscape,” and “European.” Statistical modeling enabled ALIP to automatically index new or unlearned images with the linguistic terms of the dictionary.
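
The train-then-annotate idea described above can be illustrated with a minimal sketch. It is not the researchers' implementation: ALIP builds its dictionary from 2-D MHMMs, whereas this example swaps in a much simpler per-concept Gaussian model purely for illustration. The function names (`train_dictionary`, `annotate`), the two-number feature vectors and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def train_dictionary(training_set):
    """Fit one diagonal-Gaussian model per keyword.

    Assumption: a simple stand-in for ALIP's 2-D MHMMs, which are far richer.
    training_set is a list of (keyword, feature_vector) pairs.
    """
    grouped = defaultdict(list)
    for keyword, features in training_set:
        grouped[keyword].append(features)

    dictionary = {}
    for keyword, vectors in grouped.items():
        n = len(vectors)
        dims = len(vectors[0])
        means = [sum(v[d] for v in vectors) / n for d in range(dims)]
        # A small constant keeps every variance strictly positive.
        variances = [
            sum((v[d] - means[d]) ** 2 for v in vectors) / n + 1e-6
            for d in range(dims)
        ]
        dictionary[keyword] = (means, variances)
    return dictionary


def log_likelihood(features, model):
    """Log-likelihood of a feature vector under one keyword's model."""
    means, variances = model
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
        for x, mu, var in zip(features, means, variances)
    )


def annotate(features, dictionary, top_k=2):
    """Return the top_k keywords whose models best explain a new image."""
    ranked = sorted(
        dictionary,
        key=lambda kw: log_likelihood(features, dictionary[kw]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical toy data: each image is reduced to two made-up features.
toy_training = [
    ("ocean", [0.9, 0.1]),
    ("ocean", [0.8, 0.2]),
    ("building", [0.2, 0.9]),
    ("building", [0.3, 0.8]),
]
toy_dictionary = train_dictionary(toy_training)
print(annotate([0.85, 0.15], toy_dictionary, top_k=1))  # prints ['ocean']
```

Ranking keywords by likelihood, rather than picking a single winner, mirrors the way the system attaches several annotations to one picture.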

Wang tested that dictionary with 5,000 randomly selected images to see if the computer could provide meaningful keyword annotations for the new images. His conclusion: The more specific the query for an image, the higher the system’s degree of accuracy in retrieving an appropriate image.

Wang and Li are using ALIP as part of a three-year National Science Foundation research project to develop digital imagery technologies for the preservation and cataloguing of Asian art and cultural heritages. This research aims to bypass or reduce the labor-intensive work of creating and entering manual descriptions of artwork.

Eventually, the system is expected to identify the discriminating features of Chinese landscape paintings and the distinguishing characteristics of paintings from different historical periods, Wang notes.

The researchers’ progress in the first year of that project is discussed in the paper, “Interdisciplinary Research to Advance Digital Imagery Indexing and Retrieval Technologies for Asian Art and Cultural Heritages.” The research will be presented on Dec. 6 in a special session of ACM’s Multimedia Conference in France.

Further research will be aimed at improving ALIP’s accuracy and speed.

ALIP’s reading of a beach scene with sailboats yielded the keyword annotations of “ocean,” “paradise,” “San Diego,” “Thailand,” “beach” and “fish.” Even though the computer was intelligent enough to recognize the high-level concept of “paradise,” additional research will focus on making the technology more accurate, so that San Diego and Thailand will not appear in the annotation of the same picture, Wang says.

“This system has the potential to change how we handle images in our daily life by giving us better and more access,” Wang says. Wang and Li’s latest research builds on their earlier efforts at Stanford University. Sun Microsystems provided most of the equipment used in the project.