Forum for Science, Industry and Business

Sponsored by:     3M 
Search our Site:

 

Case researchers discover methods to find ’needles in haystack’ in data

08.12.2005


Create powerful statistical techniques to detect signals



A Case Western Reserve University research team from physics and statistics has recently created innovative statistical techniques that improve the chances of detecting a signal in large data sets. The new techniques can not only search for the "needle in the haystack" in particle physics, but also have applications in discovering a new galaxy, monitoring transactions for fraud and security risk, identifying the carrier of a virulent disease among millions of people or detecting cancerous tissues in a mammogram.
Case faculty members Ramani Pilla and Catherine Loader from statistics and Cyrus Taylor from physics report their findings in the article, "A New Technique for Finding Needles in Haystacks: A Geometric Approach to Distinguishing between a New Source and Random Fluctuations," December 2, in the journal, Physical Review Letters.

"As haystacks of information grow ever larger--and the needles ever smaller--the search for a signal becomes increasingly difficult to find using traditional approaches. There is a need for sophisticated new statistical methods," the researchers report.



Researchers working with large amounts of data encounter the fundamental problem of determining a real signal from random variation in the data. In many practical problems, a suspected signal may only be a small blip in a noisy experimental background.

The Case team discovered a technique that is built on the principle of comparing a set of summary characteristics for any sub region of the observations with the background variation. From these characteristics, attempts are made to find small regions that appear significantly different from the background--a difference that cannot simply be attributed to random chance.

"Methods used in high-energy particle physics problems traditionally have searched for any departure from a background model; that is, anything that is not a haystack," said Pilla, the project leader. "Our method efficiently incorporates information about the type of disorder expected, thereby enabling us to find the signal of interest more accurately."

At the core of the breakthrough is the idea of posing the problem in terms of a "hypothesis-based testing" paradigm to detect statistical disorder in the data. The method further exploits the flexibility behind a long-established geometric formula in creating a technique that significantly enhances the ability to distinguish a signal.

The researchers said the challenge is two-fold: defining efficient test statistics, and determining the critical cut-off. That is, to help the scientist find what is random variation as opposed to what is the signal. The detection problem involves a large number of comparisons, and the researchers caution that experimentalists should not be fooled into false discoveries by random variation.

"The experimenter wants to control the experiment-wise error rate: if there is nothing in the data, then there must be minimal probability of falsely discovering a signal. On the other hand, we want to maximize our chance of discovering any real signal that may be present in the massive data set," said Loader.

"The probabilistic problem associated with this scenario is reduced to one of finding the areas of certain regions on the surface of high-dimensional spheres," explains Pilla.

The Case researchers then exploit the geometric methods pioneered in 1939 by Harold Hotelling and Hermann Weyl. They tested the statistical techniques by using computer simulated particle physics experiments that mimic the real experiments conducted in colliders to demonstrate that the new technique significantly increased detection probabilities.

"In high-energy particle physics and astrophysics problems, chi-square goodness-of-fit tests are widely employed, although they have relatively low power to detect the signal," notes Taylor. "Through my collaborative work with Professors Pilla and Loader, we will be able to develop powerful statistical tests for detecting a signal from noisy data with high probability, a fundamental problem encountered in many scientific disciplines."

Taylor added that "conducting experiments in a particle collider may cost tens of millions of dollars. Improving efficiency in the analysis of experimental results can lead to enormous cost savings. Furthermore, we can obtain the same results with much smaller experiments, or effectively find much smaller departures from the background model."

"Detecting a real signal (the needle) present in random and chaotic data (the haystack) will lead to scientific success," conclude the researchers.

Susan Griffith | EurekAlert!
Further information:
http://www.case.edu

More articles from Physics and Astronomy:

nachricht A 100-year-old physics problem has been solved at EPFL
23.06.2017 | Ecole Polytechnique Fédérale de Lausanne

nachricht Quantum thermometer or optical refrigerator?
23.06.2017 | National Institute of Standards and Technology (NIST)

All articles from Physics and Astronomy >>>

The most recent press releases about innovation >>>

Die letzten 5 Focus-News des innovations-reports im Überblick:

Im Focus: Can we see monkeys from space? Emerging technologies to map biodiversity

An international team of scientists has proposed a new multi-disciplinary approach in which an array of new technologies will allow us to map biodiversity and the risks that wildlife is facing at the scale of whole landscapes. The findings are published in Nature Ecology and Evolution. This international research is led by the Kunming Institute of Zoology from China, University of East Anglia, University of Leicester and the Leibniz Institute for Zoo and Wildlife Research.

Using a combination of satellite and ground data, the team proposes that it is now possible to map biodiversity with an accuracy that has not been previously...

Im Focus: Climate satellite: Tracking methane with robust laser technology

Heatwaves in the Arctic, longer periods of vegetation in Europe, severe floods in West Africa – starting in 2021, scientists want to explore the emissions of the greenhouse gas methane with the German-French satellite MERLIN. This is made possible by a new robust laser system of the Fraunhofer Institute for Laser Technology ILT in Aachen, which achieves unprecedented measurement accuracy.

Methane is primarily the result of the decomposition of organic matter. The gas has a 25 times greater warming potential than carbon dioxide, but is not as...

Im Focus: How protons move through a fuel cell

Hydrogen is regarded as the energy source of the future: It is produced with solar power and can be used to generate heat and electricity in fuel cells. Empa researchers have now succeeded in decoding the movement of hydrogen ions in crystals – a key step towards more efficient energy conversion in the hydrogen industry of tomorrow.

As charge carriers, electrons and ions play the leading role in electrochemical energy storage devices and converters such as batteries and fuel cells. Proton...

Im Focus: A unique data centre for cosmological simulations

Scientists from the Excellence Cluster Universe at the Ludwig-Maximilians-Universität Munich have establised "Cosmowebportal", a unique data centre for cosmological simulations located at the Leibniz Supercomputing Centre (LRZ) of the Bavarian Academy of Sciences. The complete results of a series of large hydrodynamical cosmological simulations are available, with data volumes typically exceeding several hundred terabytes. Scientists worldwide can interactively explore these complex simulations via a web interface and directly access the results.

With current telescopes, scientists can observe our Universe’s galaxies and galaxy clusters and their distribution along an invisible cosmic web. From the...

Im Focus: Scientists develop molecular thermometer for contactless measurement using infrared light

Temperature measurements possible even on the smallest scale / Molecular ruby for use in material sciences, biology, and medicine

Chemists at Johannes Gutenberg University Mainz (JGU) in cooperation with researchers of the German Federal Institute for Materials Research and Testing (BAM)...

All Focus news of the innovation-report >>>

Anzeige

Anzeige

Event News

Plants are networkers

19.06.2017 | Event News

Digital Survival Training for Executives

13.06.2017 | Event News

Global Learning Council Summit 2017

13.06.2017 | Event News

 
Latest News

Quantum thermometer or optical refrigerator?

23.06.2017 | Physics and Astronomy

A 100-year-old physics problem has been solved at EPFL

23.06.2017 | Physics and Astronomy

Equipping form with function

23.06.2017 | Information Technology

VideoLinks
B2B-VideoLinks
More VideoLinks >>>