To do this, the new techniques allow multi-core chips to deal with two things more efficiently: allocating bandwidth and “prefetching” data.
Multi-core chips are supposed to make our computers run faster. Each core on a chip is its own central processing unit, or computer brain. However, there are things that can slow these cores. For example, each core needs to retrieve data from memory that is not stored on its chip. There is a limited pathway – or bandwidth – these cores can use to retrieve that off-chip data. As chips have incorporated more and more cores, the bandwidth has become increasingly congested – slowing down system performance.
One of the ways to expedite core performance is called prefetching. Each chip has its own small memory component, called a cache. In prefetching, the cache predicts what data a core will need in the future and retrieves that data from off-chip memory before the core needs it. Ideally, this improves the core’s performance. But, if the cache’s prediction is inaccurate, it unnecessarily clogs the bandwidth while retrieving the wrong data. This actually slows the chip’s overall performance.
“The first technique relies on criteria we developed to determine how much bandwidth should be allotted to each core on a chip,” says Dr. Yan Solihin, associate professor of electrical and computer engineering at NC State and co-author of a paper describing the research. Some cores require more off-chip data than others. The researchers use easily-collected data from the hardware counters on each chip to determine which cores need more bandwidth. “By better distributing the bandwidth to the appropriate cores, the criteria are able to maximize system performance,” Solihin says.
“The second technique relies on a set of criteria we developed for determining when prefetching will boost performance and should be utilized,” Solihin says, “as well as when prefetching would slow things down and should be avoided.” These criteria also use data from each chip’s hardware counters. The prefetching criteria would allow manufacturers to make multi-core chips that operate more efficiently, because each of the individual cores would automatically turn prefetching on or off as needed.
Utilizing both sets of criteria, the researchers were able to boost multi-core chip performance by 40 percent, compared to multi-core chips that do not prefetch data, and by 10 percent over multi-core chips that always prefetch data.
The paper, “Studying the Impact of Hardware Prefetching and Bandwidth Partitioning in Chip-Multiprocessors,” will be presented June 9 at the International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS) in San Jose, Calif. The paper was co-authored by Dr. Fang Liu, a former Ph.D. student at NC State. The research was supported, in part, by the National Science Foundation.
NC State’s Department of Electrical and Computer Engineering is part of the university’s College of Engineering.
Note to Editors: The study abstract follows.
“Studying the Impact of Hardware Prefetching and Bandwidth Partitioning in Chip-Multiprocessors”
Authors: Fang Liu and Yan Solihin, North Carolina State University
Presented: June 9, 2011, at the International Conference on Measurement and Modeling of Computer Systems, San Jose, Calif.
Abstract: Modern high performance microprocessors widely employ hardware prefetching to hide long memory access latency. While useful, hardware prefetching tends to aggravate the bandwidth wall, a problem where system performance is increasingly limited by the availability of off-chip pin bandwidth in Chip Multi-Processors (CMPs). In this paper, we propose an analytical model-based study to investigate how hardware prefetching and memory bandwidth partitioning impact CMP system performance and how they interact. The model includes a composite prefetching metric that can help determine under which conditions prefetching can improve system performance, a bandwidth partitioning model that takes into account prefetching effects, and a derivation of the weighted speedup-optimum bandwidth partition sizes for different cores. Through model-driven case studies, we find several interesting observations that can be valuable for future CMP system design and optimization. We also explore simulation-based empirical evaluation to validate the observations and show that maximum system performance can be achieved by selective prefetching, guided by the composite prefetching metric, coupled with dynamic bandwidth partitioning.
Matt Shipman | EurekAlert!
Cutting edge research for the industries of tomorrow – DFKI and NICT expand cooperation
21.03.2017 | Deutsches Forschungszentrum für Künstliche Intelligenz GmbH, DFKI
Molecular motor-powered biocomputers
20.03.2017 | Technische Universität Dresden
Astronomers from Bonn and Tautenburg in Thuringia (Germany) used the 100-m radio telescope at Effelsberg to observe several galaxy clusters. At the edges of these large accumulations of dark matter, stellar systems (galaxies), hot gas, and charged particles, they found magnetic fields that are exceptionally ordered over distances of many million light years. This makes them the most extended magnetic fields in the universe known so far.
The results will be published on March 22 in the journal „Astronomy & Astrophysics“.
Galaxy clusters are the largest gravitationally bound structures in the universe. With a typical extent of about 10 million light years, i.e. 100 times the...
Researchers at the Goethe University Frankfurt, together with partners from the University of Tübingen in Germany and Queen Mary University as well as Francis Crick Institute from London (UK) have developed a novel technology to decipher the secret ubiquitin code.
Ubiquitin is a small protein that can be linked to other cellular proteins, thereby controlling and modulating their functions. The attachment occurs in many...
In the eternal search for next generation high-efficiency solar cells and LEDs, scientists at Los Alamos National Laboratory and their partners are creating...
Silicon nanosheets are thin, two-dimensional layers with exceptional optoelectronic properties very similar to those of graphene. Albeit, the nanosheets are less stable. Now researchers at the Technical University of Munich (TUM) have, for the first time ever, produced a composite material combining silicon nanosheets and a polymer that is both UV-resistant and easy to process. This brings the scientists a significant step closer to industrial applications like flexible displays and photosensors.
Silicon nanosheets are thin, two-dimensional layers with exceptional optoelectronic properties very similar to those of graphene. Albeit, the nanosheets are...
Enzymes behave differently in a test tube compared with the molecular scrum of a living cell. Chemists from the University of Basel have now been able to simulate these confined natural conditions in artificial vesicles for the first time. As reported in the academic journal Small, the results are offering better insight into the development of nanoreactors and artificial organelles.
Enzymes behave differently in a test tube compared with the molecular scrum of a living cell. Chemists from the University of Basel have now been able to...
20.03.2017 | Event News
14.03.2017 | Event News
07.03.2017 | Event News
24.03.2017 | Materials Sciences
24.03.2017 | Physics and Astronomy
24.03.2017 | Physics and Astronomy