Design lets chip manage local memory stores efficiently using an Internet-style communication network.
The more cores — or processing units — a computer chip has, the bigger the problem of communication between cores becomes. For years, Li-Shiuan Peh, the Singapore Research Professor of Electrical Engineering and Computer Science at MIT, has argued that the massively multicore chips of the future will need to resemble little Internets, where each core has an associated router, and data travels between cores in packets of fixed size.
This week, at the International Symposium on Computer Architecture, Peh’s group unveiled a 36-core chip that features just such a “network-on-chip.” In addition to implementing many of the group’s earlier ideas, it also solves one of the problems that has bedeviled previous attempts to design networks-on-chip: maintaining cache coherence, or ensuring that cores’ locally stored copies of globally accessible data remain up to date.
In today’s chips, all the cores — typically somewhere between two and six — are connected by a single wire, called a bus. When two cores need to communicate, they’re granted exclusive access to the bus.
But that approach won’t work as the core count mounts: Cores will spend all their time waiting for the bus to free up, rather than performing computations.
In a network-on-chip, each core is connected only to those immediately adjacent to it. “You can reach your neighbors really quickly,” says Bhavya Daya, an MIT graduate student in electrical engineering and computer science, and first author on the new paper. “You can also have multiple paths to your destination. So if you’re going way across, rather than having one congested path, you could have multiple ones.”
One advantage of a bus, however, is that it makes it easier to maintain cache coherence. Every core on a chip has its own cache, a local, high-speed memory bank in which it stores frequently used data. As it performs computations, it updates the data in its cache, and every so often, it undertakes the relatively time-consuming chore of shipping the data back to main memory.
But what happens if another core needs the data before it’s been shipped? Most chips address this question with a protocol called “snoopy,” because it involves snooping on other cores’ communications. When a core needs a particular chunk of data, it broadcasts a request to all the other cores, and whichever one has the data ships it back.
If all the cores share a bus, then when one of them receives a data request, it knows that it’s the most recent request that’s been issued. Similarly, when the requesting core gets data back, it knows that it’s the most recent version of the data.
But in a network-on-chip, data is flying everywhere, and packets will frequently arrive at different cores in different sequences. The implicit ordering that the snoopy protocol relies on breaks down.
Daya, Peh, and their colleagues solve this problem by equipping their chips with a second network, which shadows the first. The circuits connected to this network are very simple: All they can do is declare that their associated cores have sent requests for data over the main network. But precisely because those declarations are so simple, nodes in the shadow network can combine them and pass them on without incurring delays.
Groups of declarations reach the routers associated with the cores at discrete intervals — intervals corresponding to the time it takes to pass from one end of the shadow network to another. Each router can thus tabulate exactly how many requests were issued during which interval, and by which other cores. The requests themselves may still take a while to arrive, but their recipients know that they’ve been issued.
During each interval, the chip’s 36 cores are given different, hierarchical priorities. Say, for instance, that during one interval, both core 1 and core 10 issue requests, but core 1 has a higher priority. Core 32’s router may receive core 10’s request well before it receives core 1’s. But it will hold it until it’s passed along 1’s.
This hierarchical ordering simulates the chronological ordering of requests sent over a bus, so the snoopy protocol still works. The hierarchy is shuffled during every interval, however, to ensure that in the long run, all the cores receive equal weight.
Cache coherence in multicore chips “is a big problem, and it’s one that gets larger all the time,” says Todd Austin, a professor of electrical engineering and computer science at the University of Michigan. “Their contribution is an interesting one: They’re saying, ‘Let’s get rid of a lot of the complexity that’s in existing networks. That will create more avenues for communication, and our clever communication protocol will sort out all the details.’ It’s a much simpler approach and a faster approach. It’s a really clever idea.”
“One of the challenges in academia is convincing industry that our ideas are practical and useful,” Austin adds. “They’ve really taken the best approach to demonstrating that, in that they’ve built a working chip. I’d be surprised if these technologies didn’t find their way into commercial products.”
After testing the prototype chips to ensure that they’re operational, Daya intends to load them with a version of the Linux operating system, modified to run on 36 cores, and evaluate the performance of real applications, to determine the accuracy of the group’s theoretical projections. At that point, she plans to release the blueprints for the chip, written in the hardware description language Verilog, as open-source code.
Sarah McDonnell | Eurek Alert!
The Flexible Grid Involves its Users
27.09.2016 | Fraunhofer-Institut für Angewandte Informationstechnik FIT
Optical fiber transmits one terabit per second – Novel modulation approach
16.09.2016 | Technische Universität München
Heavy construction machinery is the focus of Oak Ridge National Laboratory’s latest advance in additive manufacturing research. With industry partners and university students, ORNL researchers are designing and producing the world’s first 3D printed excavator, a prototype that will leverage large-scale AM technologies and explore the feasibility of printing with metal alloys.
Increasing the size and speed of metal-based 3D printing techniques, using low-cost alloys like steel and aluminum, could create new industrial applications...
Friction stir welding is a still-young and thus often unfamiliar pressure welding process for joining flat components and semi-finished components made of light metals.
Scientists at the University of Stuttgart have now developed two new process variants that will considerably expand the areas of application for friction stir welding.
Technologie-Lizenz-Büro (TLB) GmbH supports the University of Stuttgart in patenting and marketing its innovations.
Friction stir welding is a still-young and thus often unfamiliar pressure welding process for joining flat components and semi-finished components made of...
Optical quantum computers can revolutionize computer technology. A team of researchers led by scientists from Münster University and KIT now succeeded in putting a quantum optical experimental set-up onto a chip. In doing so, they have met one of the requirements for making it possible to use photonic circuits for optical quantum computers.
Optical quantum computers are what people are pinning their hopes on for tomorrow’s computer technology – whether for tap-proof data encryption, ultrafast...
The Fraunhofer Institute for Organic Electronics, Electron Beam and Plasma Technology FEP has been developing various applications for OLED microdisplays based on organic semiconductors. By integrating the capabilities of an image sensor directly into the microdisplay, eye movements can be recorded by the smart glasses and utilized for guidance and control functions, as one example. The new design will be debuted at Augmented World Expo Europe (AWE) in Berlin at Booth B25, October 18th – 19th.
“Augmented-reality” and “wearables” have become terms we encounter almost daily. Both can make daily life a little simpler and provide valuable assistance for...
With the help of artificial intelligence, chemists from the University of Basel in Switzerland have computed the characteristics of about two million crystals made up of four chemical elements. The researchers were able to identify 90 previously unknown thermodynamically stable crystals that can be regarded as new materials. They report on their findings in the scientific journal Physical Review Letters.
Elpasolite is a glassy, transparent, shiny and soft mineral with a cubic crystal structure. First discovered in El Paso County (Colorado, USA), it can also be...
30.09.2016 | Event News
29.09.2016 | Event News
28.09.2016 | Event News
30.09.2016 | Materials Sciences
30.09.2016 | Earth Sciences
30.09.2016 | Life Sciences