Forum for Science, Industry and Business


Case researchers discover methods to find "needles in a haystack" in data

08.12.2005


Create powerful statistical techniques to detect signals



A Case Western Reserve University research team from physics and statistics has recently created innovative statistical techniques that improve the chances of detecting a signal in large data sets. The new techniques can not only search for the "needle in the haystack" in particle physics, but also have applications in discovering a new galaxy, monitoring transactions for fraud and security risk, identifying the carrier of a virulent disease among millions of people or detecting cancerous tissues in a mammogram.
Case faculty members Ramani Pilla and Catherine Loader from statistics and Cyrus Taylor from physics report their findings in the article "A New Technique for Finding Needles in Haystacks: A Geometric Approach to Distinguishing between a New Source and Random Fluctuations," published December 2 in the journal Physical Review Letters.

"As haystacks of information grow ever larger--and the needles ever smaller--the search for a signal becomes increasingly difficult to find using traditional approaches. There is a need for sophisticated new statistical methods," the researchers report.



Researchers working with large amounts of data face the fundamental problem of distinguishing a real signal from random variation in the data. In many practical problems, a suspected signal may be only a small blip in a noisy experimental background.

The Case team developed a technique built on the principle of comparing a set of summary characteristics for any subregion of the observations with the background variation. Using these characteristics, the method looks for small regions that differ significantly from the background, by more than random chance alone can explain.
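The idea of comparing subregions with the background can be illustrated with a small sketch. This is not the Case team's method: it assumes binned counts, a known expected background per bin, and a simple normal approximation for the excess in each window. The function name `scan_for_excess` and the sample numbers are hypothetical, chosen only for illustration.

```python
import math

def scan_for_excess(counts, background, width):
    """Slide a window over binned counts and return (start, z) for the
    window whose total most exceeds the expected background.

    Assumption for this sketch: z = (observed - expected) / sqrt(expected),
    a normal approximation to Poisson counts. Expected totals must be > 0.
    """
    if len(counts) != len(background):
        raise ValueError("counts and background must have the same length")
    if not 1 <= width <= len(counts):
        raise ValueError("width must be between 1 and len(counts)")

    best_start, best_z = 0, -math.inf
    for start in range(len(counts) - width + 1):
        observed = sum(counts[start:start + width])
        expected = sum(background[start:start + width])
        z = (observed - expected) / math.sqrt(expected)
        if z > best_z:
            best_start, best_z = start, z
    return best_start, best_z

# Hypothetical data: a flat background of 10 per bin with a bump in bins 4-5.
counts = [10, 9, 11, 10, 18, 19, 10, 9, 11, 10]
background = [10.0] * len(counts)
start, z = scan_for_excess(counts, background, width=2)
```

Because the largest of many window scores is reported, a cut-off calibrated for a single comparison would overstate the evidence; the threshold has to account for every window that was examined.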

"Methods used in high-energy particle physics problems traditionally have searched for any departure from a background model; that is, anything that is not a haystack," said Pilla, the project leader. "Our method efficiently incorporates information about the type of disorder expected, thereby enabling us to find the signal of interest more accurately."

At the core of the breakthrough is the idea of posing the problem in terms of a "hypothesis-based testing" paradigm to detect statistical disorder in the data. The method further exploits the flexibility behind a long-established geometric formula in creating a technique that significantly enhances the ability to distinguish a signal.

The researchers said the challenge is twofold: defining efficient test statistics and determining the critical cut-off that separates random variation from a signal. Because the detection problem involves a large number of comparisons, the researchers caution that experimentalists should not be fooled into false discoveries by random variation.

"The experimenter wants to control the experiment-wise error rate: if there is nothing in the data, then there must be minimal probability of falsely discovering a signal. On the other hand, we want to maximize our chance of discovering any real signal that may be present in the massive data set," said Loader.
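Why a large number of comparisons makes the cut-off hard to set can be seen in a short calculation. The sketch below assumes the comparisons are independent, which is a simplification and not the approach described in the article; the function names are hypothetical. It shows how quickly the experiment-wise error rate grows with the number of tests, and two standard per-test corrections (Šidák and Bonferroni).

```python
def familywise_error_rate(alpha, m):
    """Probability of at least one false discovery among m independent
    tests, each run at per-test level alpha, when no signal is present."""
    return 1.0 - (1.0 - alpha) ** m

def sidak_per_test_level(target, m):
    """Per-test level that gives exactly `target` experiment-wise error
    for m independent tests."""
    return 1.0 - (1.0 - target) ** (1.0 / m)

def bonferroni_per_test_level(target, m):
    """Simpler, slightly more conservative per-test level; it bounds the
    experiment-wise error by `target` without assuming independence."""
    return target / m

# Hypothetical numbers: 100 comparisons, each tested at the 5% level.
uncorrected = familywise_error_rate(0.05, 100)
corrected_level = sidak_per_test_level(0.05, 100)
```

With 100 uncorrected comparisons at the 5% level, a false discovery somewhere in the data is almost certain, which is why the critical cut-off must be set for the whole experiment. Corrections of this kind can be very conservative when neighboring comparisons are strongly correlated, at the cost of detection power.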

"The probabilistic problem associated with this scenario is reduced to one of finding the areas of certain regions on the surface of high-dimensional spheres," explains Pilla.
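The simplest region of this kind is a spherical cap: the set of points on a unit sphere within a given angle of a fixed direction. The sketch below is only an illustration of computing such an area, not the formula the researchers use. It relies on the standard fact that the cap's share of the sphere in `dim` dimensions is the integral of sin(t)^(dim-2) from 0 to the angle, divided by the same integral from 0 to pi, and evaluates it numerically with Simpson's rule. The function name `cap_fraction` is hypothetical.

```python
import math

def cap_fraction(dim, angle, steps=2000):
    """Fraction of the unit sphere's surface in R^dim that lies within
    `angle` radians of a fixed direction (a spherical cap)."""
    if dim < 2:
        raise ValueError("dim must be at least 2")
    if not 0.0 <= angle <= math.pi:
        raise ValueError("angle must lie in [0, pi]")
    if steps < 2 or steps % 2:
        raise ValueError("steps must be a positive even number")

    power = dim - 2

    def integrate(upper):
        # Composite Simpson's rule for the integral of sin(t)**power on [0, upper].
        h = upper / steps
        total = math.sin(0.0) ** power + math.sin(upper) ** power
        for i in range(1, steps):
            weight = 4 if i % 2 else 2
            total += weight * math.sin(i * h) ** power
        return total * h / 3.0

    return integrate(angle) / integrate(math.pi)

# On an ordinary sphere (dim=3), a cap of half-angle 60 degrees covers a quarter.
quarter = cap_fraction(3, math.pi / 3)
# In 50 dimensions the same cap covers a far smaller share of the surface.
tiny = cap_fraction(50, math.pi / 3)
```

The comparison between 3 and 50 dimensions shows why intuition from ordinary spheres can mislead: in high dimensions almost all of the surface lies near the "equator" relative to any fixed direction, so caps of moderate angle have very small area.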

The Case researchers then exploit the geometric methods pioneered in 1939 by Harold Hotelling and Hermann Weyl. They tested the statistical techniques on computer-simulated particle physics experiments that mimic real experiments conducted in colliders, and the simulations showed that the new technique significantly increased detection probabilities.

"In high-energy particle physics and astrophysics problems, chi-square goodness-of-fit tests are widely employed, although they have relatively low power to detect the signal," notes Taylor. "Through my collaborative work with Professors Pilla and Loader, we will be able to develop powerful statistical tests for detecting a signal from noisy data with high probability, a fundamental problem encountered in many scientific disciplines."

Taylor added that "conducting experiments in a particle collider may cost tens of millions of dollars. Improving efficiency in the analysis of experimental results can lead to enormous cost savings. Furthermore, we can obtain the same results with much smaller experiments, or effectively find much smaller departures from the background model."

"Detecting a real signal (the needle) present in random and chaotic data (the haystack) will lead to scientific success," conclude the researchers.

Susan Griffith | EurekAlert!
Further information:
http://www.case.edu
