Buzzwords of history, revealed by computer scans, indicate new ways of searching the Web

19.02.2003


In the years after the American Revolution, U.S. presidents were talking about the British a lot, and then about militias, France and Spain. In the mid-19th century, words like "emancipation," "slaves" and "rebellion" popped up in their speeches. In the early 20th century, presidents started using a lot of business-expansion words, soon to be replaced by "depression."



A couple of decades later they spoke of atoms and communism. By the 1990s, buzzwords prevailed.

Jon Kleinberg, a professor of computer science at Cornell University, Ithaca, N.Y., has developed a method for a computer to find the topics that dominate a discussion at a particular time by scanning large collections of documents for sudden, rapid bursts of words. Among other tests of the method, he scanned presidential State of the Union addresses from 1790 to the present and created a list of words that eerily reflects historical trends. The technique, he suggests, could have many "data mining" applications, including searching the Web or studying trends in society as reflected in Web pages.


Kleinberg will emphasize the Web applications of his searching technique in a talk, "Web Structure and the Design of Search Algorithms," at the annual meeting of the American Association for the Advancement of Science (AAAS) in Denver on Feb. 18. He is taking part in a symposium on "Modeling the Internet and the World Wide Web."

Kleinberg says he got the idea of searching over time while trying to deal with his own flood of incoming e-mail. He reasoned that when an important topic comes up for discussion, keywords related to the topic will show a sudden increase in frequency. A search for these words that suddenly appear more often might, he theorized, provide ways to categorize messages.


He devised a search algorithm that looks for "burstiness," measuring not just the number of times words appear, but the rate of increase in those numbers over time. Programs based on his algorithm can scan text that varies with time and flag the most "bursty" words. "The method is motivated by probability models used to analyze the behavior of communication networks, where burstiness occurs in the traffic due to congestion and hot spots," he explains.
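
As an illustration of the idea, the Python sketch below treats each word as being in one of two states per time slice: a baseline state or a "burst" state with a higher rate. It then picks the cheapest sequence of states, charging a penalty each time a burst begins. This is a simplified sketch under stated assumptions, not Kleinberg's published code. The function name `burst_states`, the parameters `s` (how much higher the burst rate is) and `gamma` (how costly it is to enter a burst), and the sample counts are all chosen here for illustration.

```python
import math

def burst_states(r, d, s=2.0, gamma=1.0):
    """Label each time slice 0 (baseline) or 1 (burst) for a single word.

    r[t] is the number of documents in slice t that contain the word,
    d[t] is the total number of documents in slice t.
    s and gamma are illustrative tuning parameters (assumptions).
    """
    total_r, total_d = sum(r), sum(d)
    if not 0 < total_r < total_d:
        raise ValueError("word must appear in some, but not all, documents")
    n = len(r)
    p0 = total_r / total_d            # baseline rate over the whole collection
    p1 = min(0.9999, s * p0)          # elevated rate while bursting
    enter_burst = gamma * math.log(n)  # penalty for moving from state 0 to 1

    def cost(p, rt, dt):
        # Negative log-likelihood of seeing rt hits in dt documents at rate p.
        # The binomial coefficient is the same for both states, so it is dropped.
        return -(rt * math.log(p) + (dt - rt) * math.log(1 - p))

    prev = [0.0, math.inf]  # the sequence is assumed to start at baseline
    back = []               # back[t][j] = best state at t-1 given state j at t
    for rt, dt in zip(r, d):
        cur, ptr = [], []
        for j, p in enumerate((p0, p1)):
            options = [prev[i] + (enter_burst if j > i else 0.0) for i in (0, 1)]
            i_best = 0 if options[0] <= options[1] else 1
            cur.append(options[i_best] + cost(p, rt, dt))
            ptr.append(i_best)
        prev = cur
        back.append(ptr)

    # Walk the back-pointers from the cheapest final state.
    state = 0 if prev[0] <= prev[1] else 1
    states = []
    for ptr in reversed(back):
        states.append(state)
        state = ptr[state]
    return states[::-1]

# Made-up counts: a word that spikes in the fourth and fifth slices.
print(burst_states([1, 1, 1, 8, 9, 1, 1, 1], [10] * 8))
```

Because entering a burst costs something, a single slightly elevated slice is not flagged. Only a rise large enough to outweigh the penalty is labeled as a burst.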

In his own e-mail -- largely from other computer scientists -- he quickly found keywords relating to hot topics. In mail from students he found bursts in the word "prelim" shortly before each midterm exam. Later, he tried the same technique on the texts of State of the Union addresses, all of which are available on the Web, from Washington in 1790 through George W. Bush in 2002. From these speeches he produced a long list of words (see attached table) that summarizes American politics from early revolutionary fervor up to the age of the modern speechwriter.

While we already know about these trends in American history, Kleinberg points out, a computer doesn’t, and it has found these ideas just by scanning raw text. So such a technique should work just as well on historical records in obscure situations where we have no idea what the important terms or keywords are. It might even be used to screen e-mail "chatter" by terrorists. Sociologists, Kleinberg adds, may find it interesting to look for trends in personal Web logs, popularly known as "blogs."

For searching the Web, Kleinberg suggests, such a technique could help zero in on what a searcher wants by recognizing the time context of such material as news stories. For instance, he says, a person searching for the word "sniper" today is likely to be looking for information about the recent attacks around the nation’s capital -- but the same search nearly four decades ago might have come from someone interested in the Kennedy assassination.

In his AAAS talk Kleinberg also explores other Web-searching techniques. A few years ago, he suggested that a way to find the most useful Web sites on a particular subject would be to look at the way they are linked to one another. Sites that are "linked to" by many others are probably "authorities." Sites that link to many others are likely to be "hubs." The most authoritative sites on a topic would be the ones that are linked to most often by the most active hubs, he reasoned. A variation on this idea is used by Google, and a more formal version is being used in a new search engine called Teoma (http://www.teoma.com).
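
The mutual reinforcement between hubs and authorities can be sketched as a short iteration in Python: a page's authority score is the sum of the hub scores of pages linking to it, and its hub score is the sum of the authority scores of pages it links to. This is a minimal sketch, not the code of any search engine. The function name `hub_authority_scores`, the fixed iteration count and the toy link graph are assumptions made for illustration.

```python
import math

def hub_authority_scores(links, iterations=50):
    """Return (hub, authority) score dictionaries for a small link graph.

    links maps each page to the list of pages it links to.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of the pages that link here.
        auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub: sum of authority scores of the pages linked to.
        hub = {n: sum(auth[t] for t in links.get(n, ())) for n in nodes}
        # Rescale so the scores stay bounded from one round to the next.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth

# Hypothetical toy graph: three hub pages pointing at two content pages.
toy_links = {"h1": ["a", "b"], "h2": ["a", "b"], "h3": ["a"], "a": [], "b": []}
hub, auth = hub_authority_scores(toy_links)
print(max(auth, key=auth.get))  # page "a", which all three hubs link to
```

In the toy graph, "a" ends up with the highest authority score because every hub links to it, and "h1" and "h2" outrank "h3" as hubs because they point to both content pages.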

Kleinberg and others have found that despite its anarchy, there is a great deal of "self-organization" on the Web. In a variation on the "six degrees of separation" idea, Kleinberg says, almost every site on the Web can be reached from almost any other through a series of steps. The structure seems to be a bit like the Milky Way galaxy, with a very dense "core" of heavily interconnected sites surrounded by less dense regions. Nodes outside the core are divided into three categories: "upstream" nodes that link to the core but cannot be reached from it; "downstream" nodes that can be reached from the core but don’t link back to it; and isolated "tendrils" that are not linked directly to the core at all.
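
The categories in this picture can be computed from link reachability alone. The Python sketch below assumes that one page known to lie in the core is given (the hypothetical `core_seed` argument). The core is then every page that can both reach that seed and be reached from it, and the other groups follow from which direction of reachability holds. The function names and the toy graph are illustrative assumptions, not part of the research described here.

```python
from collections import deque

def reachable(start, adjacency):
    """Return the set of nodes reachable from start by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def classify_pages(links, core_seed):
    """Split pages into core, upstream, downstream and other.

    links maps each page to the pages it links to; core_seed is a page
    assumed to belong to the core.
    """
    nodes = set(links)
    reverse = {}
    for src, targets in links.items():
        for t in targets:
            nodes.add(t)
            reverse.setdefault(t, []).append(src)
    forward = reachable(core_seed, links)     # pages the core can reach
    backward = reachable(core_seed, reverse)  # pages that can reach the core
    return {
        "core": forward & backward,
        "upstream": backward - forward,
        "downstream": forward - backward,
        "other": nodes - forward - backward,  # tendrils and unconnected pages
    }

# Hypothetical toy graph: a, b and c form a cycle (the core).
toy_links = {"a": ["b"], "b": ["c"], "c": ["a", "d"],
             "u": ["a", "t"], "d": [], "t": []}
print(classify_pages(toy_links, "a"))
```

Here "u" is upstream (it links into the cycle but nothing in the cycle links back), "d" is downstream, and "t" falls into the remaining group because it hangs off an upstream page without touching the core.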

Within this structure there are many "communities" of sites representing common interests that are extensively linked to one another. So, Kleinberg suggests, searches might be done by following along the link paths from one site to another, as well as just scanning an index of everything.

"Deeper analysis, exposing the structure of communities embedded in the Web, raises the prospect of bringing together individuals with common interests and lowering barriers to communication," Kleinberg concludes.

Bill Steele | Cornell News
Further information:
http://www.news.cornell.edu/releases/Feb03/AAAS.Kleinberg.bursty.ws.html
http://www.cs.cornell.edu/home/kleinber/
http://www.teoma.com
