Statistical technique helps researchers gain more information from a single data run
Brani Vidakovic and graduate student Ben Shi demonstrate a pupil diameter measuring system. The resulting data are analyzed using wavelet bootstrapping.
For certain classes of data that may be very expensive or difficult to obtain, a new statistical technique may provide useful information from a single data run by allowing meaningful re-sampling.
The technique, known as "wavelet bootstrapping" or "wavestrapping," has applications in the geophysical sciences, bioinformatics, medical imaging, nanotechnology and other areas. It can also be useful for rapidly obtaining information from small data sets in such applications as medical diagnostics.
Wavelets are mathematical functions that have become increasingly important to researchers because of their ability to analyze data sets that are difficult to understand using traditional techniques such as the fast Fourier transform. For instance, signals within noisy data recorded in the time domain can become more meaningful when analyzed in the wavelet domain.
Wavestrapping was pioneered by University of Washington researchers, who applied wavelet transforms to an established statistical re-sampling technique known as bootstrapping, which is used to extract additional information from single data runs. The marriage of bootstrapping and wavelets offers a new tool for the analysis of data sets that would otherwise be difficult to study because of correlation and time-dependency issues.
"The new thing here is re-sampling, but not in the time domain, which would be nearly impossible because of the strong dependence of data or correlation of data," said Brani Vidakovic, associate professor at the Georgia Institute of Technology's School of Industrial and Systems Engineering. "By transferring the data to the wavelet domain, applying re-sampling methods and then returning the re-sampled data as variants in the time domain, you can then proceed as if you had a data ensemble rather than a single run."
Vidakovic will discuss his research on validating wavelet bootstrapping strategies and assessing their variability bounds at the annual meeting of the American Association for the Advancement of Science (AAAS) in Seattle. His presentation, "What Does a Single Run Tell about the Ensemble?" will be part of the session "Wavelet-Based Statistical Analysis of Multiscale Geophysical Data," to be held on February 16.
"Sometimes scientists have a single measurement and they are unable to get another measurement," Vidakovic explained. "Sometimes they would like to have an ensemble of measurements with similar boundary conditions so the heterogeneity caused by external factors – such as different regimes, times of day or climate conditions – are taken into account. Wavestrapping can help make inferences from a single run."
One example might be a study of atmospheric turbulence in which an additional flight to gather data under similar conditions could be impossible. "Atmospheric scientists are very excited about wavelets because not only are they local and able to efficiently describe organized structures in turbulence, but they are also able to assess the self-similarity and scaling indices of turbulence," Vidakovic said.
In such instances, converting the data into a wavelet domain before re-sampling can produce information for which error bounds can be reliably assessed, Vidakovic said. Though the bootstrapping technique is controversial, he believes it offers important opportunities when used with appropriate data sets.
"This is very effective when data in the time domain are not good for bootstrapping because of dependency," he said. "It can solve one difficult problem, and in that respect it is new and exciting."
Wavestrapping was proposed and developed by Don Percival and other researchers at the University of Washington's Applied Physics Lab. Vidakovic's research, sponsored by the National Science Foundation, builds on that work in assessing the technique's validity and where its use is appropriate.
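The three steps Vidakovic describes (transform the single run into the wavelet domain, re-sample there, and return the variants to the time domain) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the researchers' actual procedure: the Haar wavelet, the choice to re-sample detail coefficients with replacement independently within each scale, and the function names haar_dwt, haar_idwt and wavestrap are all assumptions made here for simplicity.

```python
import math
import random


def haar_dwt(signal):
    """Orthonormal Haar decomposition of a signal whose length is a power of two.

    Returns (approx, details): approx holds the single coarsest coefficient,
    details[0] is the finest scale and details[-1] the coarsest.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    root2 = math.sqrt(2.0)
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / root2 for a, b in pairs])
        approx = [(a + b) / root2 for a, b in pairs]
    return approx, details


def haar_idwt(approx, details):
    """Invert haar_dwt, rebuilding the time-domain signal."""
    root2 = math.sqrt(2.0)
    current = list(approx)
    for level in reversed(details):  # coarsest scale first
        rebuilt = []
        for a, d in zip(current, level):
            rebuilt.append((a + d) / root2)
            rebuilt.append((a - d) / root2)
        current = rebuilt
    return current


def wavestrap(signal, n_replicates, rng):
    """Build surrogate series from one run (illustrative scheme, an assumption).

    Detail coefficients are drawn with replacement within each scale;
    the coarsest approximation coefficient is kept unchanged.
    """
    approx, details = haar_dwt(signal)
    replicates = []
    for _ in range(n_replicates):
        resampled = [rng.choices(level, k=len(level)) for level in details]
        replicates.append(haar_idwt(approx, resampled))
    return replicates


if __name__ == "__main__":
    rng = random.Random(42)
    single_run = [math.sin(0.3 * t) + rng.gauss(0.0, 0.2) for t in range(64)]
    ensemble = wavestrap(single_run, n_replicates=200, rng=rng)
    # Example use: spread of the sample variance across the surrogate ensemble.
    variances = [sum((v - sum(r) / len(r)) ** 2 for v in r) / len(r) for r in ensemble]
    print(min(variances), max(variances))
```

Because the coarsest coefficient is left untouched in this sketch, every surrogate series has the same mean as the original run; only the finer-scale structure varies. Whether re-sampling within a scale is justified depends on how well the chosen wavelet decorrelates the data, which is the kind of validity question the research addresses.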
Some examples of wavestrapping applications include:
- Rapid analysis of changes in pupil diameter to reveal clues about the health of patients. Using measurements taken 21 times per second, Vidakovic is helping Georgia Tech researchers Julie Jacko and Francois Sainfort analyze data that may provide quick detection of specific medical conditions.
- Statistical study of new types of nanometer-scale materials. "Nano materials science is increasingly multi-scale because people are looking at the problem at different scales," said Vidakovic. "The modeling should therefore be done at different scales because the materials are very different at the different scales."
- Analysis of genomic data, especially in the rapid determination of which genetic sequences are coding and which are not.
- Medical imaging, such as the detection of details in mammography data where small differences in calcification shapes are important to diagnosis.
Wavelets offer advantages over traditional statistical analysis techniques, including:
- Ability to remove noise from complex data sets;
- Sensitivity to the fractal nature and self-similarity of data;
- Ability to minimize correlation and time-dependency of data;
- Locality of the analysis and ability to handle multi-scale information; and
- Computational simplicity, which permits faster analysis.
Although the beginnings of wavelets can be traced back almost a century, their wide use began only about 15 years ago when new wavelet bases were discovered and their implementation was connected with fast-filtering computational procedures.
"The interest in wavelets is their speed and locality," said Vidakovic. "Locality is the most important, because many natural phenomena are non-stationary and very local. Wavelets are able to economically describe phenomena that are inhomogeneous. For some phenomena, it would be impossible to make sense of the data without wavelets."
Wavelets also help researchers with a major problem of the computer age – large volumes of data mixed with noise. "Their dimension reduction and ability to deal with huge data sets are also strengths of wavelets," he added. "Very nasty data can be de-noised almost in real-time by selecting a few of the important wavelet coefficients that can retain the main trend in the signal."
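The de-noising idea in the quote above, keeping only a few important wavelet coefficients, can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the research: the Haar wavelet, the rule of keeping the largest-magnitude detail coefficients, and the names haar_dwt, haar_idwt and denoise_keep_largest are assumptions chosen for brevity.

```python
import math


def haar_dwt(signal):
    """Orthonormal Haar decomposition; length must be a power of two."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    root2 = math.sqrt(2.0)
    approx = list(signal)
    details = []  # details[0] is the finest scale
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / root2 for a, b in pairs])
        approx = [(a + b) / root2 for a, b in pairs]
    return approx, details


def haar_idwt(approx, details):
    """Invert haar_dwt, rebuilding the time-domain signal."""
    root2 = math.sqrt(2.0)
    current = list(approx)
    for level in reversed(details):  # coarsest scale first
        rebuilt = []
        for a, d in zip(current, level):
            rebuilt.append((a + d) / root2)
            rebuilt.append((a - d) / root2)
        current = rebuilt
    return current


def denoise_keep_largest(signal, keep):
    """Zero all but the `keep` largest-magnitude detail coefficients.

    Coefficients tied with the cutoff magnitude are all retained, so slightly
    more than `keep` may survive. The coarsest approximation is always kept.
    """
    approx, details = haar_dwt(signal)
    magnitudes = sorted((abs(c) for level in details for c in level), reverse=True)
    if keep <= 0:
        cutoff = float("inf")
    elif keep >= len(magnitudes):
        cutoff = 0.0
    else:
        cutoff = magnitudes[keep - 1]
    kept = [[c if abs(c) >= cutoff else 0.0 for c in level] for level in details]
    return haar_idwt(approx, kept)


if __name__ == "__main__":
    # A step signal is described by a single Haar detail coefficient,
    # so keeping one coefficient is enough to rebuild it.
    step = [1.0] * 4 + [5.0] * 4
    print(denoise_keep_largest(step, keep=1))
```

How many coefficients to keep is the main design choice. Practical de-noising usually sets a threshold from an estimate of the noise level rather than a fixed count; the fixed count is used here only to keep the sketch short.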
Many different wavelets exist, and selecting the right ones is a vital part of developing the new technique, Vidakovic said. "Wavelets are not a miracle tool for everything," he warned. "But if the data are amenable to wavelet analysis, then they can be very helpful."
John Toon | EurekAlert!