Forum for Science, Industry and Business


Research identifies key weakness in modern computer vision systems

31.07.2018

Computer vision algorithms have come a long way in the past decade. They've been shown to be as good as or better than people at tasks like categorizing dog or cat breeds, and they have the remarkable ability to identify specific faces out of a sea of millions.

But research by Brown University scientists shows that computers fail miserably at a class of tasks that even young children have no problem with: determining whether two objects in an image are the same or different. In a paper presented last week at the annual meeting of the Cognitive Science Society, the Brown team sheds light on why computers are so bad at these types of tasks and suggests avenues toward smarter computer vision systems.


Computers are great at categorizing images by the objects found within them, but they're surprisingly bad at figuring out whether two objects in a single image are the same or different from each other. New research helps to show why that task is so difficult for modern computer vision algorithms.

Credit: Serre lab / Brown University

"There's a lot of excitement about what computer vision has been able to achieve, and I share a lot of that," said Thomas Serre, associate professor of cognitive, linguistic and psychological sciences at Brown and the paper's senior author. "But we think that by working to understand the limitations of current computer vision systems as we've done here, we can really move toward new, much more advanced systems rather than simply tweaking the systems we already have."

For the study, Serre and his colleagues used state-of-the-art computer vision algorithms to analyze simple black-and-white images containing two or more randomly generated shapes. In some cases the objects were identical; sometimes they were the same but with one object rotated in relation to the other; sometimes the objects were completely different. The computer was asked to identify the same-or-different relationship.
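To make the task concrete, here is a minimal sketch of how such a stimulus could be generated. It is an illustration under stated assumptions, not the generator used in the study: the function names (`random_shape`, `make_stimulus`), the 32x32 canvas, the 8x8 shape size and the left/right placement are all hypothetical choices, and the rotated-object condition is left out.

```python
import numpy as np

def random_shape(rng, size=8):
    """Return a random binary patch (size x size) with at least one filled pixel."""
    patch = (rng.random((size, size)) < 0.5).astype(np.uint8)
    patch[size // 2, size // 2] = 1  # guarantee the patch is never empty
    return patch

def make_stimulus(rng, same, canvas=32, size=8):
    """Place two shapes on a blank canvas and return (image, label).

    label is 1 when both shapes are identical, 0 when they differ.
    """
    first = random_shape(rng, size)
    if same:
        second = first.copy()
    else:
        second = random_shape(rng, size)
        while np.array_equal(second, first):  # redraw on the rare collision
            second = random_shape(rng, size)

    image = np.zeros((canvas, canvas), dtype=np.uint8)
    half = canvas // 2
    # One shape goes in the left half, the other in the right half,
    # so the two shapes never overlap.
    r1 = rng.integers(0, canvas - size + 1)
    c1 = rng.integers(0, half - size + 1)
    r2 = rng.integers(0, canvas - size + 1)
    c2 = rng.integers(half, canvas - size + 1)
    image[r1:r1 + size, c1:c1 + size] = first
    image[r2:r2 + size, c2:c2 + size] = second
    return image, int(same)

rng = np.random.default_rng(0)
same_image, same_label = make_stimulus(rng, same=True)
diff_image, diff_label = make_stimulus(rng, same=False)
```

A training set for the same-or-different task would then be a large collection of such `(image, label)` pairs, with the label being the only supervision the algorithm receives.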

The study showed that, even after hundreds of thousands of training examples, the algorithms were no better than chance at recognizing the appropriate relationship. The question, then, was why these systems are so bad at this task.

Serre and his colleagues suspected that the failure has something to do with the inability of these computer vision algorithms to individuate objects. When computers look at an image, they can't actually tell where one object in the image stops and the background, or another object, begins. They just see a collection of pixels that have similar patterns to collections of pixels they've learned to associate with certain labels. That works fine for identification or categorization problems, but falls apart when trying to compare two objects.

To show that this was indeed why the algorithms were breaking down, Serre and his team performed experiments that relieved the computer of having to individuate objects on its own. Instead of showing the computer two objects in the same image, the researchers showed the computer the objects one at a time in separate images. The experiments showed that the algorithms had no problem learning the same-or-different relationship as long as they didn't have to view the two objects in the same image.
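The following sketch illustrates why separating the objects makes the comparison easy. It is not the networks the researchers trained; it is a hand-written comparison, and the helper names (`crop_to_shape`, `same_or_different`, `place`) are hypothetical. The point is only that once each object arrives in its own image, no step is needed to work out where one object ends and the other begins.

```python
import numpy as np

def crop_to_shape(image):
    """Crop a binary image to the bounding box of its nonzero pixels."""
    rows = np.flatnonzero(image.any(axis=1))
    cols = np.flatnonzero(image.any(axis=0))
    return image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

def same_or_different(image_a, image_b):
    """Compare two single-object images, ignoring where each object sits."""
    a, b = crop_to_shape(image_a), crop_to_shape(image_b)
    return a.shape == b.shape and bool(np.array_equal(a, b))

def place(patch, row, col, canvas=16):
    """Draw a patch on an otherwise empty canvas at the given position."""
    image = np.zeros((canvas, canvas), dtype=np.uint8)
    image[row:row + patch.shape[0], col:col + patch.shape[1]] = patch
    return image

shape = np.array([[1, 0, 1],
                  [1, 1, 1]], dtype=np.uint8)
other = np.array([[1, 1, 1],
                  [1, 0, 1]], dtype=np.uint8)

print(same_or_different(place(shape, 2, 3), place(shape, 9, 10)))  # True
print(same_or_different(place(shape, 2, 3), place(other, 9, 10)))  # False
```

With both objects drawn on a single canvas, `crop_to_shape` would return one bounding box covering both of them, and the comparison would have nothing to compare against. That is the individuation step the separate-image experiments removed.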

The source of the problem in individuating objects, Serre says, is the architecture of the machine learning systems that power the algorithms. The algorithms use convolutional neural networks -- layers of connected processing units that loosely mimic networks of neurons in the brain. A key difference from the brain is that the artificial networks are exclusively "feed-forward" -- meaning information has a one-way flow through the layers of the network. That's not how the visual system in humans works, according to Serre.

"If you look at the anatomy of our own visual system, you find that there are a lot of recurring connections, where the information goes from a higher visual area to a lower visual area and back through," Serre said.

While it's not clear exactly what those feedback connections do, Serre says, it's likely that they have something to do with our ability to pay attention to certain parts of our visual field and make mental representations of objects in our minds.

"Presumably people attend to one object, building a feature representation that is bound to that object in their working memory," Serre said. "Then they shift their attention to another object. When both objects are represented in working memory, your visual system is able to make comparisons like same-or-different."

Serre and his colleagues hypothesize that computers can't do anything like that because feed-forward neural networks don't allow for the kind of recurrent processing required for this individuation and mental representation of objects. It could be, Serre says, that making computer vision smarter will require neural networks that more closely approximate the recurrent nature of human visual processing.
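The difference between one-way and recurrent information flow can be sketched in a few lines. This is a toy illustration under loud assumptions: it uses two small fully connected layers with random weights instead of convolutional layers, the names `feed_forward`, `with_feedback` and `w_back` are hypothetical, and it is not the architecture from the paper or a model of the visual system.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def feed_forward(x, w1, w2):
    """One-way flow: input -> lower layer -> higher layer, computed once."""
    lower = relu(w1 @ x)
    higher = relu(w2 @ lower)
    return higher

def with_feedback(x, w1, w2, w_back, steps=3):
    """Same two layers, plus a feedback path from the higher layer to the lower one."""
    higher = np.zeros(w2.shape[0])
    for _ in range(steps):
        # The top-down signal from the previous step modulates the lower layer.
        lower = relu(w1 @ x + w_back @ higher)
        higher = relu(w2 @ lower)
    return higher

rng = np.random.default_rng(0)
x = rng.random(4)                      # toy input
w1 = rng.standard_normal((5, 4))       # input -> lower layer
w2 = rng.standard_normal((3, 5))       # lower -> higher layer
w_back = rng.standard_normal((5, 3))   # higher -> lower layer (feedback)

ff_out = feed_forward(x, w1, w2)
fb_out = with_feedback(x, w1, w2, w_back, steps=3)
```

With a single step the feedback version reduces to the feed-forward pass, because the higher layer starts at zero. Only over additional steps can the higher layer's output reshape what the lower layer computes, which is the kind of loop a purely feed-forward network lacks.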

###

Serre's co-authors on the paper were Junkyung Kim and Matthew Ricci. The research was supported by the National Science Foundation (IIS-1252951, 1644760) and DARPA (YFA N66001-14-1-4037).

Media Contact

Kevin Stacey
kevin_stacey@brown.edu
401-863-3766


http://news.brown.edu/ 

Kevin Stacey | EurekAlert!
Further information:
https://news.brown.edu/articles/2018/07/same-different
