During this 2010 “Sort Benchmark” competition – the “World Cup of data sorting” – the computer scientists from the UC San Diego Jacobs School of Engineering also tied a world record for fastest data sorting rate. They sorted one trillion data records in 172 minutes – and did so using just a quarter of the computing resources of the other record holder.
Companies looking for trends, efficiencies and other competitive advantages have turned to the kind of heavy duty data sorting that requires the hardware muscle typical of data centers. The Internet has also created many scenarios where data sorting is critical. Advertisements on Facebook pages, custom recommendations on Amazon, and up-to-the-second search results on Google all result from sorting data sets as large as multiple petabytes. A petabyte is 1,000 terabytes.
“If a major corporation wants to run a query across all of their page views or products sold, that can require a sort across a multi-petabyte dataset and one that is growing by many gigabytes every day,” said UC San Diego computer science professor Amin Vahdat, who led the project. “Companies are pushing the limit on how much data they can sort, and how fast. This is data analytics in real time,” explained Vahdat. Better sort technologies are needed, however. In data centers, sorting is often the most pressing bottleneck in many higher-level activities, noted Vahdat who directs the Center for Networked Systems (CNS) at UC San Diego.
The two new world records from UC San Diego are among the 2010 results released recently on http://sortbenchmark.org – a site run by the volunteer computer scientists from industry and academia who manage the competitions. The competitions provide benchmarks for data sorting and an interactive forum for researchers working to improve data sorting techniques.World Records
“We’ve set our research agenda around how to make this better…and also on how to make it more general,” said UC San Diego computer science PhD student Alex Rasmussen, the lead graduate student on the team.
The team also tied the world record for the “Indy Gray Sort” which measures sort rate per minute per 100 terabytes of data.
“We used one forth the number of computers as the previous record holder to achieve that same sort rate performance – and thus one fourth the energy, and one fourth the cooling and data center real estate,” said George Porter, a Research Scientist at the Center for Networked Systems at UC San Diego. The Center for Networked Systems is an affiliated Center of the California Institute for Telecommunications and Information Technology (Calit2).
Both world records are in the “Indy” category – meaning that the systems were designed around the specific parameters of the Sort Benchmark competition. The team is looking to generalize their results for the “Daytona” competition and for use in the real world.
“Sorting is also an interesting proxy for a whole bunch of other data processing problems. Generally, sorting is a great way to measure how fast you can read a lot of data off a set of disks, do some basic processing on it, shuffle it around a network and write it to another set of disks,” explained Rasmussen. “Sorting puts a lot of stress on the entire input/output subsystem, from the hard drives and the networking hardware to the operating system and application software.”Balanced Systems
In creating their heavy duty sorting system, the computer scientists designed for speed and balance. A balanced system is one in which computing resources like memory, storage and network bandwidth are fully utilized and as few resources as possible are wasted.
1. “Our system shows what’s possible if you pay attention to efficiency – and there is still plenty of room for improvement,” said Vahdat, holder of the SAIC Chair in Engineering in the Department of Computer Science and Engineering at UC San Diego. “We asked ourselves, ‘What does it mean to build a balanced system where we are not wasting any system resources in carrying out high end computation?’” said Vahdat. “If you are idling your processors or not using all your RAM, you’re burning energy and losing efficiency.” For example, memory often uses as much or more energy than processors, but the energy consumed by memory gets less attention.
To break the terabyte barrier for the Indy Minute Sort, the computer science researchers built a system made up of 52 computer nodes. Each node is a commodity server with two quad-core processors, 24 gigabytes (GB) memory and sixteen 500 GB disks – all inter-connected by a Cisco Nexus 5020 switch. Cisco donated the switches as a part of their research engagement with the UC San Diego Center for Networked Systems. The compute cluster is hosted at Calit2.
To win the Indy Gray Sort, the computer science researchers sorted one trillion records in 10,318 seconds (about 172 minutes), yielding their world-record tying data sorting rate of 0.582 terabytes per minute per 100 terabytes of data. The winning sort system is made up of 47 computer nodes similar to those used in the minute sort.
According to wolframalpha.com, 100 terabytes of data is roughly equivalent to 4,000 single-layer Blu-Ray discs, 21,000 single-layer DVDs, 12,000 dual-layer DVDs or 142,248 CDs (assuming CDs are 703 MB).
Daniel Kane | Newswise Science News
Smarter robot vacuum cleaners for automated office cleaning
15.08.2017 | Fraunhofer-Institut für Arbeitswirtschaft und Organisation IAO
Researchers 3-D print first truly microfluidic 'lab on a chipl devices
15.08.2017 | Brigham Young University
Whether you call it effervescent, fizzy, or sparkling, carbonated water is making a comeback as a beverage. Aside from quenching thirst, researchers at the University of Illinois at Urbana-Champaign have discovered a new use for these "bubbly" concoctions that will have major impact on the manufacturer of the world's thinnest, flattest, and one most useful materials -- graphene.
As graphene's popularity grows as an advanced "wonder" material, the speed and quality at which it can be manufactured will be paramount. With that in mind,...
Physicists at the University of Bonn have managed to create optical hollows and more complex patterns into which the light of a Bose-Einstein condensate flows. The creation of such highly low-loss structures for light is a prerequisite for complex light circuits, such as for quantum information processing for a new generation of computers. The researchers are now presenting their results in the journal Nature Photonics.
Light particles (photons) occur as tiny, indivisible portions. Many thousands of these light portions can be merged to form a single super-photon if they are...
For the first time, scientists have shown that circular RNA is linked to brain function. When a RNA molecule called Cdr1as was deleted from the genome of mice, the animals had problems filtering out unnecessary information – like patients suffering from neuropsychiatric disorders.
While hundreds of circular RNAs (circRNAs) are abundant in mammalian brains, one big question has remained unanswered: What are they actually good for? In the...
An experimental small satellite has successfully collected and delivered data on a key measurement for predicting changes in Earth's climate.
The Radiometer Assessment using Vertically Aligned Nanotubes (RAVAN) CubeSat was launched into low-Earth orbit on Nov. 11, 2016, in order to test new...
A study led by scientists of the Max Planck Institute for the Structure and Dynamics of Matter (MPSD) at the Center for Free-Electron Laser Science in Hamburg presents evidence of the coexistence of superconductivity and “charge-density-waves” in compounds of the poorly-studied family of bismuthates. This observation opens up new perspectives for a deeper understanding of the phenomenon of high-temperature superconductivity, a topic which is at the core of condensed matter research since more than 30 years. The paper by Nicoletti et al has been published in the PNAS.
Since the beginning of the 20th century, superconductivity had been observed in some metals at temperatures only a few degrees above the absolute zero (minus...
16.08.2017 | Event News
04.08.2017 | Event News
26.07.2017 | Event News
17.08.2017 | Physics and Astronomy
17.08.2017 | Earth Sciences
17.08.2017 | Physics and Astronomy