Forum for Science, Industry and Business

Researchers Develop Techniques for Computing Google-Style Web Rankings Up to Five Times Faster

14.05.2003


Speed-up may make "topic-sensitive" page rankings feasible

Computer science researchers at Stanford University have developed several new techniques that together may make it possible to calculate Web page rankings as used in the Google search engine up to five times faster. The speed-ups to Google’s method may make it realistic to calculate page rankings personalized for an individual’s interests or customized to a particular topic.

The Stanford team includes graduate students Sepandar Kamvar and Taher Haveliwala, noted numerical analyst Gene Golub and computer science professor Christopher Manning. They will present their first paper at the Twelfth Annual World Wide Web Conference (WWW2003) in Budapest, Hungary, May 20-24, 2003. The work was supported by the National Science Foundation (NSF), an independent federal agency that supports fundamental research and education in all fields of science and engineering.

Computing PageRank, the ranking algorithm behind the Google search engine, for a billion Web pages can take several days. Google currently ranks and searches 3 billion Web pages. Each personalized or topic-sensitive ranking would require a separate multi-day computation, but the payoff would be less time spent wading through irrelevant search results. For example, searching a sports-specific Google site for "Giants" would give more importance to pages about the New York or San Francisco Giants and less importance to pages about Jack and the Beanstalk.
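As a rough illustration of the computation involved, the following Python sketch shows the standard power-iteration method that underlies PageRank, run on a toy link graph. The damping factor of 0.85 and the tolerances are common textbook choices and are assumptions here, not values from the article or from Google's production system.

    import numpy as np

    def pagerank(out_links, damping=0.85, tol=1e-8, max_iter=100):
        """Standard power iteration for PageRank on a small adjacency list.

        out_links: dict mapping each page to the list of pages it links to.
        Returns a dict of page -> rank, with ranks summing to 1.
        """
        pages = sorted(set(out_links) | {t for ts in out_links.values() for t in ts})
        n = len(pages)
        index = {p: i for i, p in enumerate(pages)}
        rank = np.full(n, 1.0 / n)                 # uniform starting vector

        for _ in range(max_iter):
            new_rank = np.full(n, (1.0 - damping) / n)
            for p in pages:
                targets = out_links.get(p, [])
                if targets:                        # spread rank over outgoing links
                    share = damping * rank[index[p]] / len(targets)
                    for t in targets:
                        new_rank[index[t]] += share
                else:                              # dangling page: spread rank uniformly
                    new_rank += damping * rank[index[p]] / n
            if np.abs(new_rank - rank).sum() < tol:   # L1 convergence test
                rank = new_rank
                break
            rank = new_rank
        return {p: rank[index[p]] for p in pages}

    # Toy example: page "c" is linked by both "a" and "b", so it ranks highest.
    links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    print(pagerank(links))

On a toy graph the loop converges in a handful of iterations; on billions of pages, each iteration is a full pass over the Web's link structure, which is why the complete computation can take days, as the article notes.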

"This work is a wonderful example of how NSF support for basic computer science research, including applied mathematics and algorithm research, has impacts in daily life," said NSF program officer Maria Zemankova. In the mid-1990s, an NSF digital library project and an NSF graduate fellowship also supported Stanford graduate students Larry Page and Sergey Brin while they developed what would become the Google search engine.

To speed up PageRank, the Stanford team developed a trio of techniques in numerical linear algebra. First, in the WWW2003 paper, they describe so-called "extrapolation" methods, which make some assumptions about the Web’s link structure that aren’t true, but permit a quick and easy computation of PageRank. Because the assumptions aren’t true, the PageRank isn’t exactly correct, but it’s close and can be refined using the original PageRank algorithm. The Stanford researchers have shown that their extrapolation techniques can speed up PageRank by 50 percent in realistic conditions and by up to 300 percent under less realistic conditions.
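To make the idea concrete, here is a rough sketch of one extrapolation step in the spirit of Aitken delta-squared acceleration, applied element-wise to three successive iterates of the power iteration sketched above. It is an illustrative simplification under stated assumptions, not the authors' exact algorithm (the WWW2003 paper's main method is a quadratic extrapolation).

    import numpy as np

    def aitken_extrapolate(x0, x1, x2, eps=1e-12):
        """Element-wise Aitken delta-squared extrapolation of three successive
        PageRank iterates (an illustrative sketch of the acceleration idea)."""
        num = (x1 - x0) ** 2
        den = x2 - 2.0 * x1 + x0
        safe = np.abs(den) > eps              # avoid dividing by (near) zero
        x_star = x2.copy()
        x_star[safe] = x0[safe] - num[safe] / den[safe]
        x_star = np.clip(x_star, 0.0, None)   # keep entries non-negative
        return x_star / x_star.sum()          # renormalize to a probability vector

The extrapolated vector is then handed back to ordinary power iterations, which removes any error introduced by the shortcut; this matches the article's point that the approximate result "can be refined using the original PageRank algorithm."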

A second paper describes an enhancement, called "BlockRank," which relies on a feature of the Web’s link structure that the Stanford team is among the first to investigate and exploit. Namely, they show that approximately 80 percent of the pages on any given Web site point to other pages on the same site. As a result, they can compute many single-site PageRanks, glue them together in an appropriate manner and use the result as a starting point for the original PageRank algorithm. With this technique, they can realistically speed up the PageRank computation by 300 percent.
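A hypothetical sketch of that "glue" step, reusing the pagerank() helper from the earlier sketch: each site's pages get a local PageRank computed from intra-site links only, a coarse PageRank is computed on the graph of sites, and each page's local value is weighted by its site's coarse value to form the starting vector. The host_of() helper and the weighting scheme are illustrative assumptions, not the exact BlockRank construction.

    from collections import defaultdict

    def blockrank_start(out_links, host_of):
        # Local (intra-site) PageRank for each site.
        by_host = defaultdict(dict)
        for page, targets in out_links.items():
            h = host_of(page)
            by_host[h][page] = [t for t in targets if host_of(t) == h]
        local = {host: pagerank(links) for host, links in by_host.items()}

        # Coarse PageRank on the graph of sites (one node per site).
        host_links = defaultdict(list)
        for page, targets in out_links.items():
            host_links[host_of(page)].extend(
                host_of(t) for t in targets if host_of(t) != host_of(page))
        host_rank = pagerank(dict(host_links))

        # Glue: weight each page's local rank by its site's coarse rank.
        return {page: host_rank.get(host, 0.0) * r
                for host, ranks in local.items()
                for page, r in ranks.items()}

    # Toy example with two "sites"; host_of extracts the site prefix from "site/page".
    links = {"a/1": ["a/2", "b/1"], "a/2": ["a/1"], "b/1": ["b/2"], "b/2": ["a/1"]}
    start = blockrank_start(links, host_of=lambda p: p.split("/")[0])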

Finally, the team notes in a third paper that the rankings for some pages are calculated early in the PageRank process, while the rankings of many highly rated pages take much longer to compute. In a method called "Adaptive PageRank," they eliminate redundant computations associated with those pages whose PageRanks finish early. This speeds up the PageRank computation by up to 50 percent.
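The following is a rough sketch of that idea using a small dense link matrix: components whose values stop changing are frozen and skipped in later iterations. The thresholds and the matrix form are illustrative assumptions; the paper's Adaptive PageRank organizes the computation differently to avoid redundant work on a sparse Web-scale matrix.

    import numpy as np

    def adaptive_pagerank(M, damping=0.85, tol=1e-8, page_tol=1e-10, max_iter=100):
        """Sketch of the adaptive idea: freeze components whose values have
        converged and skip their updates. M is a column-stochastic link matrix:
        entry M[i, j] is the fraction of page j's rank passed to page i."""
        n = M.shape[0]
        rank = np.full(n, 1.0 / n)
        converged = np.zeros(n, dtype=bool)
        for _ in range(max_iter):
            new_rank = rank.copy()
            active = ~converged
            # Recompute only the not-yet-converged components.
            new_rank[active] = (1.0 - damping) / n + damping * (M[active, :] @ rank)
            converged |= np.abs(new_rank - rank) < page_tol
            if np.abs(new_rank - rank).sum() < tol:
                return new_rank
            rank = new_rank
        return rank

    # Toy 3-page example: each column sums to 1.
    M = np.array([[0.0, 0.0, 1.0],
                  [0.5, 0.0, 0.0],
                  [0.5, 1.0, 0.0]])
    print(adaptive_pagerank(M))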

"Further speed-ups are possible when we use all these methods," Kamvar said. "Our preliminary experiments show that combining the methods will make the computation of PageRank up to a factor of five faster. However, there are still several issues to be solved. We’re closer to a topic-based PageRank than to a personalized ranking."

The complexities of a personalized ranking would require even greater speed-ups to the PageRank calculations. In addition, while a faster algorithm shortens computation time, the issue of storage remains. Because the results from a single PageRank computation on a few billion Web pages require several gigabytes of storage, saving a personalized PageRank for many individuals would rapidly consume vast amounts of storage. Saving a limited number of topic-specific PageRank calculations would be more practical.
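As a back-of-the-envelope check on that storage figure (the bytes-per-page value is an illustrative assumption, not from the article):

    pages = 3_000_000_000     # pages in Google's index at the time (from the article)
    bytes_per_value = 4       # assumed single-precision float per ranking value
    print(pages * bytes_per_value / 1e9)   # roughly 12 GB for one ranking vector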

The reason for the expensive computation and storage requirements lies in how PageRank generates the rankings that have led to Google’s popularity. Unlike page-ranking methods that rate each page separately, PageRank bases each page’s "importance" on the number and importance of pages that link to the page.

Therefore, PageRank must consider all pages at the same time and can’t easily omit pages that aren’t likely to be relevant to a topic. This also means that the faster methods will not affect how quickly Google presents results to users’ searches, because the rankings are computed in advance rather than at the time a search is requested.

The Stanford team’s conference paper and technical reports on enhancing the PageRank algorithm, as well as the original paper describing the PageRank method, are available on the Stanford Database Group’s Publication Server (http://dbpubs.stanford.edu/).

David Hart | National Science Foundation
Further information:
http://www.stanford.edu/~sdkamvar/research.html
http://www.www2003.org/
http://dbpubs.stanford.edu
