Forum for Science, Industry and Business

Sponsored by:     3M 
Search our Site:

 

Case researchers discover methods to find ’needles in haystack’ in data

08.12.2005


Create powerful statistical techniques to detect signals



A Case Western Reserve University research team from physics and statistics has recently created innovative statistical techniques that improve the chances of detecting a signal in large data sets. The new techniques can not only search for the "needle in the haystack" in particle physics, but also have applications in discovering a new galaxy, monitoring transactions for fraud and security risk, identifying the carrier of a virulent disease among millions of people or detecting cancerous tissues in a mammogram.
Case faculty members Ramani Pilla and Catherine Loader from statistics and Cyrus Taylor from physics report their findings in the article, "A New Technique for Finding Needles in Haystacks: A Geometric Approach to Distinguishing between a New Source and Random Fluctuations," December 2, in the journal, Physical Review Letters.

"As haystacks of information grow ever larger--and the needles ever smaller--the search for a signal becomes increasingly difficult to find using traditional approaches. There is a need for sophisticated new statistical methods," the researchers report.



Researchers working with large amounts of data encounter the fundamental problem of determining a real signal from random variation in the data. In many practical problems, a suspected signal may only be a small blip in a noisy experimental background.

The Case team discovered a technique that is built on the principle of comparing a set of summary characteristics for any sub region of the observations with the background variation. From these characteristics, attempts are made to find small regions that appear significantly different from the background--a difference that cannot simply be attributed to random chance.

"Methods used in high-energy particle physics problems traditionally have searched for any departure from a background model; that is, anything that is not a haystack," said Pilla, the project leader. "Our method efficiently incorporates information about the type of disorder expected, thereby enabling us to find the signal of interest more accurately."

At the core of the breakthrough is the idea of posing the problem in terms of a "hypothesis-based testing" paradigm to detect statistical disorder in the data. The method further exploits the flexibility behind a long-established geometric formula in creating a technique that significantly enhances the ability to distinguish a signal.

The researchers said the challenge is two-fold: defining efficient test statistics, and determining the critical cut-off. That is, to help the scientist find what is random variation as opposed to what is the signal. The detection problem involves a large number of comparisons, and the researchers caution that experimentalists should not be fooled into false discoveries by random variation.

"The experimenter wants to control the experiment-wise error rate: if there is nothing in the data, then there must be minimal probability of falsely discovering a signal. On the other hand, we want to maximize our chance of discovering any real signal that may be present in the massive data set," said Loader.

"The probabilistic problem associated with this scenario is reduced to one of finding the areas of certain regions on the surface of high-dimensional spheres," explains Pilla.

The Case researchers then exploit the geometric methods pioneered in 1939 by Harold Hotelling and Hermann Weyl. They tested the statistical techniques by using computer simulated particle physics experiments that mimic the real experiments conducted in colliders to demonstrate that the new technique significantly increased detection probabilities.

"In high-energy particle physics and astrophysics problems, chi-square goodness-of-fit tests are widely employed, although they have relatively low power to detect the signal," notes Taylor. "Through my collaborative work with Professors Pilla and Loader, we will be able to develop powerful statistical tests for detecting a signal from noisy data with high probability, a fundamental problem encountered in many scientific disciplines."

Taylor added that "conducting experiments in a particle collider may cost tens of millions of dollars. Improving efficiency in the analysis of experimental results can lead to enormous cost savings. Furthermore, we can obtain the same results with much smaller experiments, or effectively find much smaller departures from the background model."

"Detecting a real signal (the needle) present in random and chaotic data (the haystack) will lead to scientific success," conclude the researchers.

Susan Griffith | EurekAlert!
Further information:
http://www.case.edu

More articles from Physics and Astronomy:

nachricht Airborne thermometer to measure Arctic temperatures
11.01.2017 | Moscow Institute of Physics and Technology

nachricht Next-generation optics offer the widest real-time views of vast regions of the sun
11.01.2017 | New Jersey Institute of Technology

All articles from Physics and Astronomy >>>

The most recent press releases about innovation >>>

Die letzten 5 Focus-News des innovations-reports im Überblick:

Im Focus: Designing Architecture with Solar Building Envelopes

Among the general public, solar thermal energy is currently associated with dark blue, rectangular collectors on building roofs. Technologies are needed for aesthetically high quality architecture which offer the architect more room for manoeuvre when it comes to low- and plus-energy buildings. With the “ArKol” project, researchers at Fraunhofer ISE together with partners are currently developing two façade collectors for solar thermal energy generation, which permit a high degree of design flexibility: a strip collector for opaque façade sections and a solar thermal blind for transparent sections. The current state of the two developments will be presented at the BAU 2017 trade fair.

As part of the “ArKol – development of architecturally highly integrated façade collectors with heat pipes” project, Fraunhofer ISE together with its partners...

Im Focus: How to inflate a hardened concrete shell with a weight of 80 t

At TU Wien, an alternative for resource intensive formwork for the construction of concrete domes was developed. It is now used in a test dome for the Austrian Federal Railways Infrastructure (ÖBB Infrastruktur).

Concrete shells are efficient structures, but not very resource efficient. The formwork for the construction of concrete domes alone requires a high amount of...

Im Focus: Bacterial Pac Man molecule snaps at sugar

Many pathogens use certain sugar compounds from their host to help conceal themselves against the immune system. Scientists at the University of Bonn have now, in cooperation with researchers at the University of York in the United Kingdom, analyzed the dynamics of a bacterial molecule that is involved in this process. They demonstrate that the protein grabs onto the sugar molecule with a Pac Man-like chewing motion and holds it until it can be used. Their results could help design therapeutics that could make the protein poorer at grabbing and holding and hence compromise the pathogen in the host. The study has now been published in “Biophysical Journal”.

The cells of the mouth, nose and intestinal mucosa produce large quantities of a chemical called sialic acid. Many bacteria possess a special transport system...

Im Focus: Newly proposed reference datasets improve weather satellite data quality

UMD, NOAA collaboration demonstrates suitability of in-orbit datasets for weather satellite calibration

"Traffic and weather, together on the hour!" blasts your local radio station, while your smartphone knows the weather halfway across the world. A network of...

Im Focus: Repairing defects in fiber-reinforced plastics more efficiently

Fiber-reinforced plastics (FRP) are frequently used in the aeronautic and automobile industry. However, the repair of workpieces made of these composite materials is often less profitable than exchanging the part. In order to increase the lifetime of FRP parts and to make them more eco-efficient, the Laser Zentrum Hannover e.V. (LZH) and the Apodius GmbH want to combine a new measuring device for fiber layer orientation with an innovative laser-based repair process.

Defects in FRP pieces may be production or operation-related. Whether or not repair is cost-effective depends on the geometry of the defective area, the tools...

All Focus news of the innovation-report >>>

Anzeige

Anzeige

Event News

12V, 48V, high-voltage – trends in E/E automotive architecture

10.01.2017 | Event News

2nd Conference on Non-Textual Information on 10 and 11 May 2017 in Hannover

09.01.2017 | Event News

Nothing will happen without batteries making it happen!

05.01.2017 | Event News

 
Latest News

Solar Collectors from Ultra-High Performance Concrete Combine Energy Efficiency and Aesthetics

16.01.2017 | Trade Fair News

3D scans for the automotive industry

16.01.2017 | Automotive Engineering

Nanoparticle Exposure Can Awaken Dormant Viruses in the Lungs

16.01.2017 | Life Sciences

VideoLinks
B2B-VideoLinks
More VideoLinks >>>