To do this, the new techniques allow multi-core chips to deal with two things more efficiently: allocating bandwidth and “prefetching” data.
Multi-core chips are supposed to make our computers run faster. Each core on a chip is its own central processing unit, or computer brain. However, there are things that can slow these cores. For example, each core needs to retrieve data from memory that is not stored on its chip. There is a limited pathway – or bandwidth – these cores can use to retrieve that off-chip data. As chips have incorporated more and more cores, the bandwidth has become increasingly congested – slowing down system performance.
One of the ways to expedite core performance is called prefetching. Each chip has its own small memory component, called a cache. In prefetching, the cache predicts what data a core will need in the future and retrieves that data from off-chip memory before the core needs it. Ideally, this improves the core’s performance. But, if the cache’s prediction is inaccurate, it unnecessarily clogs the bandwidth while retrieving the wrong data. This actually slows the chip’s overall performance.
“The first technique relies on criteria we developed to determine how much bandwidth should be allotted to each core on a chip,” says Dr. Yan Solihin, associate professor of electrical and computer engineering at NC State and co-author of a paper describing the research. Some cores require more off-chip data than others. The researchers use easily-collected data from the hardware counters on each chip to determine which cores need more bandwidth. “By better distributing the bandwidth to the appropriate cores, the criteria are able to maximize system performance,” Solihin says.
“The second technique relies on a set of criteria we developed for determining when prefetching will boost performance and should be utilized,” Solihin says, “as well as when prefetching would slow things down and should be avoided.” These criteria also use data from each chip’s hardware counters. The prefetching criteria would allow manufacturers to make multi-core chips that operate more efficiently, because each of the individual cores would automatically turn prefetching on or off as needed.
Utilizing both sets of criteria, the researchers were able to boost multi-core chip performance by 40 percent, compared to multi-core chips that do not prefetch data, and by 10 percent over multi-core chips that always prefetch data.
The paper, “Studying the Impact of Hardware Prefetching and Bandwidth Partitioning in Chip-Multiprocessors,” will be presented June 9 at the International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS) in San Jose, Calif. The paper was co-authored by Dr. Fang Liu, a former Ph.D. student at NC State. The research was supported, in part, by the National Science Foundation.
NC State’s Department of Electrical and Computer Engineering is part of the university’s College of Engineering.
Note to Editors: The study abstract follows.
“Studying the Impact of Hardware Prefetching and Bandwidth Partitioning in Chip-Multiprocessors”
Authors: Fang Liu and Yan Solihin, North Carolina State University
Presented: June 9, 2011, at the International Conference on Measurement and Modeling of Computer Systems, San Jose, Calif.
Abstract: Modern high performance microprocessors widely employ hardware prefetching to hide long memory access latency. While useful, hardware prefetching tends to aggravate the bandwidth wall, a problem where system performance is increasingly limited by the availability of off-chip pin bandwidth in Chip Multi-Processors (CMPs). In this paper, we propose an analytical model-based study to investigate how hardware prefetching and memory bandwidth partitioning impact CMP system performance and how they interact. The model includes a composite prefetching metric that can help determine under which conditions prefetching can improve system performance, a bandwidth partitioning model that takes into account prefetching effects, and a derivation of the weighted speedup-optimum bandwidth partition sizes for different cores. Through model-driven case studies, we find several interesting observations that can be valuable for future CMP system design and optimization. We also explore simulation-based empirical evaluation to validate the observations and show that maximum system performance can be achieved by selective prefetching, guided by the composite prefetching metric, coupled with dynamic bandwidth partitioning.
Matt Shipman | EurekAlert!
Ultra-precise chip-scale sensor detects unprecedentedly small changes at the nanoscale
18.01.2017 | The Hebrew University of Jerusalem
Data analysis optimizes cyber-physical systems in telecommunications and building automation
18.01.2017 | Fraunhofer-Institut für Algorithmen und Wissenschaftliches Rechnen SCAI
An important step towards a completely new experimental access to quantum physics has been made at University of Konstanz. The team of scientists headed by...
Yersiniae cause severe intestinal infections. Studies using Yersinia pseudotuberculosis as a model organism aim to elucidate the infection mechanisms of these...
Researchers from the University of Hamburg in Germany, in collaboration with colleagues from the University of Aarhus in Denmark, have synthesized a new superconducting material by growing a few layers of an antiferromagnetic transition-metal chalcogenide on a bismuth-based topological insulator, both being non-superconducting materials.
While superconductivity and magnetism are generally believed to be mutually exclusive, surprisingly, in this new material, superconducting correlations...
Laser-driving of semimetals allows creating novel quasiparticle states within condensed matter systems and switching between different states on ultrafast time scales
Studying properties of fundamental particles in condensed matter systems is a promising approach to quantum field theory. Quasiparticles offer the opportunity...
Among the general public, solar thermal energy is currently associated with dark blue, rectangular collectors on building roofs. Technologies are needed for aesthetically high quality architecture which offer the architect more room for manoeuvre when it comes to low- and plus-energy buildings. With the “ArKol” project, researchers at Fraunhofer ISE together with partners are currently developing two façade collectors for solar thermal energy generation, which permit a high degree of design flexibility: a strip collector for opaque façade sections and a solar thermal blind for transparent sections. The current state of the two developments will be presented at the BAU 2017 trade fair.
As part of the “ArKol – development of architecturally highly integrated façade collectors with heat pipes” project, Fraunhofer ISE together with its partners...
19.01.2017 | Event News
10.01.2017 | Event News
09.01.2017 | Event News
20.01.2017 | Awards Funding
20.01.2017 | Materials Sciences
20.01.2017 | Life Sciences