A little-known secret in data mining is that simply feeding raw data into a data analysis algorithm is unlikely to produce meaningful results, say the authors of a new Cornell University study.
From recognizing speech to identifying unusual stars, new discoveries often begin with comparison of data streams to find connections and spot outliers.
But most data comparison algorithms today have one major weakness: somewhere, they rely on a human expert to specify which aspects of the data are relevant for comparison and which are not. Experts, however, are not keeping pace with the growing volume and complexity of big data.
Cornell computing researchers have come up with a new principle they call "data smashing" for estimating the similarities between streams of arbitrary data without human intervention, and without access to the data sources. Hod Lipson, associate professor of mechanical engineering and computing and information science, and Ishanu Chattopadhyay, a former postdoctoral associate with Lipson and now at the University of Chicago, described their method in the Journal of the Royal Society Interface, Oct. 1.
Data smashing is based on a new way to compare data streams. The process involves two steps. First, the data streams are algorithmically "smashed" so that each "annihilates" the information in the other. Then, the process measures how much information remains after the collision. The more information remains, the less likely it is that the streams originated from the same source.
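The two steps described above can be illustrated with a toy sketch. This is not the authors' published algorithm. It assumes binary, memoryless streams, and the helper names (`invert`, `smash`, `leftover`, `bernoulli`) are hypothetical. A bit flip stands in for the "anti-stream", the collision keeps only positions where one stream agrees with the other's anti-stream, and the leftover information is approximated by how far the surviving symbols deviate from a fair coin.

```python
import random

def invert(stream):
    # Toy "anti-stream" for binary symbols: flip every bit (an assumption).
    return [1 - s for s in stream]

def smash(a, b):
    # Collide stream a with the anti-stream of b:
    # keep only the symbols at positions where the two agree.
    return [x for x, y in zip(a, invert(b)) if x == y]

def leftover(a, b):
    # Deviation of the surviving symbols from a fair coin, in [0, 1].
    # Near 0: the streams annihilated each other. Near 1: much structure remains.
    out = smash(a, b)
    if not out:
        return 0.0  # nothing survived the collision
    return abs(2 * sum(out) / len(out) - 1)

def bernoulli(p, n, rng):
    # Hypothetical data source: n independent bits, each 1 with probability p.
    return [1 if rng.random() < p else 0 for _ in range(n)]

rng = random.Random(0)
same = leftover(bernoulli(0.8, 20000, rng), bernoulli(0.8, 20000, rng))
diff = leftover(bernoulli(0.8, 20000, rng), bernoulli(0.3, 20000, rng))
print(f"same source: {same:.3f}")
print(f"different sources: {diff:.3f}")
```

Under these assumptions, two streams from the same source leave a residue close to zero, while streams from different sources leave a clearly larger one. The sketch only checks single-symbol frequencies, so it would miss differences that show up only in longer patterns.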
Data smashing principles may open the door to understanding increasingly complex observations, especially when experts do not know what to look for, according to the researchers.
The authors demonstrated the application of their principle to data from real-world problems, including the disambiguation of electroencephalograph patterns from epileptic seizure patients; detection of anomalous cardiac activity from heart recordings; and classification of astronomical objects from raw photometry.
In all cases and without access to original domain knowledge, the researchers demonstrated performance on par with the accuracy of specialized algorithms and heuristics devised by experts.
The work in the paper, "Data smashing: Uncovering lurking order in data," was supported by the Defense Advanced Research Projects Agency and the U.S. Army Research Office.
Syl Kacapyr | EurekAlert!