Bug repellent for supercomputers proves effective

15.11.2012
Lawrence Livermore National Laboratory (LLNL) researchers have used the Stack Trace Analysis Tool (STAT), a highly scalable, lightweight tool, to debug a program running more than one million MPI processes on the IBM Blue Gene/Q (BGQ)-based Sequoia supercomputer.

The debugging run is a significant milestone in LLNL's multi-year collaboration with the University of Wisconsin-Madison (UW) and the University of New Mexico (UNM) to ensure supercomputers run more efficiently.

Playing a significant role in scaling up the Sequoia supercomputer, STAT, a 2011 R&D 100 Award winner, has helped both early access users and system integrators quickly isolate a wide range of errors, including particularly perplexing issues that manifested only at extremely large scales, up to 1,179,648 compute cores. During the Sequoia scale-up, bugs in applications as well as defects in system software and hardware have manifested themselves as failures in applications. It is important to diagnose such errors quickly so they can be reported to experts who can analyze them in detail and ultimately solve the problem.

"STAT has been indispensable in this capacity, helping the multi-disciplined integration team keep pace with the aggressive system scale-up schedule," said LLNL computer scientist Greg Lee.

"While testing a subsystem of Blue Gene/Q, my test program consistently failed only when scaled to 1,179,648 MPI processes. Although the test program was simple, the sheer scale at which this program ran made debugging efforts highly challenging. But when I applied STAT, it quickly revealed that one particular rank process was consistently stuck in a system call," said Dong Ahn, a computer scientist in Livermore Computing.

Based on this finding, a system expert took a close look at the compute core on which this rank process was running and discovered a hardware defect. "Replacing the component suddenly got the entire Sequoia system back to life," Ahn said. "Putting this exercise into perspective, this error was due to a defect in a tiny hardware unit, the decrementor, of a single hardware thread out of a total of 4.7 million hardware threads. I felt it was like finding a needle in a haystack over a coffee break."
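The reason one stuck rank can stand out among more than a million is that processes sharing the same call stack can be collapsed into a single group, leaving the odd one out in a group of its own. The short Python sketch below illustrates only that grouping idea on made-up data. It is not STAT's implementation, which gathers and merges traces in a far more scalable way, and every function name and stack frame in it is hypothetical.

```python
from collections import defaultdict


def group_ranks_by_trace(traces):
    """Group MPI ranks by identical stack trace.

    `traces` maps a rank number to its list of stack frames,
    outermost frame first. Returns a dict mapping each distinct
    trace (as a tuple) to the sorted list of ranks that share it.
    """
    groups = defaultdict(list)
    for rank, frames in traces.items():
        groups[tuple(frames)].append(rank)
    return {trace: sorted(ranks) for trace, ranks in groups.items()}


def smallest_groups(groups, limit=1):
    """Return the `limit` groups with the fewest ranks, smallest first."""
    return sorted(groups.items(), key=lambda item: len(item[1]))[:limit]


def make_example_traces(num_ranks, stuck_rank):
    """Build illustrative data: every rank waits in a barrier except one.

    The frame names are invented for this example.
    """
    common = ["main", "run_test", "MPI_Barrier"]
    stuck = ["main", "run_test", "write_result", "syscall"]
    return {
        rank: (stuck if rank == stuck_rank else common)
        for rank in range(num_ranks)
    }


if __name__ == "__main__":
    traces = make_example_traces(num_ranks=4096, stuck_rank=1234)
    groups = group_ranks_by_trace(traces)
    for trace, ranks in smallest_groups(groups):
        print(" > ".join(trace), "ranks:", ranks)
    # prints: main > run_test > write_result > syscall ranks: [1234]
```

With 4,096 simulated ranks the output is a single line naming the one rank whose call stack ends in a system call, which is the kind of lead a system expert can then follow down to a specific compute core.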

Sequoia delivers 20 petaflops of peak performance and was ranked No. 1 on the TOP500 list in June of this year. It is currently ranked No. 2, behind Oak Ridge National Laboratory's Titan.

LLNL plans to use Sequoia's impressive computational capability to advance understanding of fundamental physics and engineering questions that arise in the National Nuclear Security Administration's (NNSA) program to ensure the safety, security and effectiveness of the United States' nuclear deterrent without testing. Sequoia also will support NNSA/DOE programs at LLNL that focus on nonproliferation, counterterrorism, energy, security, health and climate change.

As LLNL takes delivery of the Sequoia system and works to move it into production, computer scientists will migrate applications that have been running on earlier systems to this newer architecture. This is a period of intense activity for LLNL's application teams as they gain experience with the new hardware and software environment.

"Having a highly effective debugging tool that scales to the full system is vital to the installation and acceptance process for Sequoia. It is critical that our development teams have a comprehensive parallel debugging tool set as they iron out the inevitable issues that come up with running on a new system like Sequoia," said Kim Cupps, leader of the Livermore Computing Division at LLNL.

STAT is particularly important for LLNL because supercomputer simulations are essential in virtually every mission area of the Laboratory. The tool also has been used at other sites and has proved effective on a wide range of supercomputer platforms, including Linux clusters and Cray systems.

The team is actively pursuing further optimization of STAT technologies and is exploring commercialization strategies. More information about STAT, including a link to the source code, is available on the Web.

More Information
STAT
ASC Sequoia
"Early science runs prepare Lawrence Livermore National Lab's Sequoia for national security missions"
LLNL news release, Nov. 9, 2012

"Venturing into the heart of high-performance computing simulations"
Science & Technology Review, September 2012

Founded in 1952, Lawrence Livermore National Laboratory provides solutions to our nation's most important national security challenges through innovative science, engineering and technology. Lawrence Livermore National Laboratory is managed by Lawrence Livermore National Security, LLC for the U.S. Department of Energy's National Nuclear Security Administration.

Anne Stark | EurekAlert!
Further information:
http://www.llnl.gov
