A little-known secret in data mining is that simply feeding raw data into a data analysis algorithm is unlikely to produce meaningful results, say the authors of a new Cornell University study.
From recognizing speech to identifying unusual stars, new discoveries often begin with comparing data streams to find connections and spot outliers.
But most data comparison algorithms today have one major weakness: somewhere along the way, they rely on a human expert to specify which aspects of the data are relevant for comparison and which aren't. Experts, however, aren't keeping pace with the growing volume and complexity of big data.
Cornell computing researchers have come up with a new principle, which they call "data smashing," for estimating the similarity between streams of arbitrary data without human intervention and without access to the data sources. Hod Lipson, associate professor of mechanical engineering and computing and information science, and Ishanu Chattopadhyay, a former postdoctoral associate with Lipson who is now at the University of Chicago, described their method in the Journal of the Royal Society Interface on Oct. 1.
Data smashing is based on a new way to compare data streams. The process involves two steps. First, the data streams are algorithmically "smashed" against each other so that each "annihilates" the information in the other. Then, the process measures what information remains after the collision. The more information remains, the less likely it is that the streams originated from the same source.
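The two steps are described here only in words. The sketch below is a toy analogue under stated assumptions, not the authors' algorithm: it assumes binary streams of independent symbols, models "annihilation" by flipping one stream and keeping only the positions where the two streams agree, and measures the leftover information as the deviation of the result from fair-coin noise. All function names (`biased_stream`, `invert`, `collide`, `deviation_from_noise`, `smash`) are hypothetical.

```python
import random
from itertools import product


def biased_stream(p_one, n, rng):
    """Hypothetical test source: n independent bits, each 1 with probability p_one."""
    return [1 if rng.random() < p_one else 0 for _ in range(n)]


def invert(stream):
    """Toy 'anti-stream' for binary data (an assumption): flip every symbol (0 <-> 1)."""
    return [1 - s for s in stream]


def collide(a, b):
    """Read one symbol from each stream in lockstep; keep it only when both agree."""
    return [x for x, y in zip(a, b) if x == y]


def deviation_from_noise(stream, max_len=2):
    """Largest gap between observed word frequencies and a fair coin's,
    over all binary words of length 1..max_len. Returns inf if the stream is too short."""
    if len(stream) < max_len:
        return float("inf")
    worst = 0.0
    for k in range(1, max_len + 1):
        n_words = len(stream) - k + 1
        counts = {}
        for i in range(n_words):
            word = tuple(stream[i:i + k])
            counts[word] = counts.get(word, 0) + 1
        expected = 1 / 2 ** k  # frequency of each length-k word under pure noise
        for word in product((0, 1), repeat=k):
            observed = counts.get(word, 0) / n_words
            worst = max(worst, abs(observed - expected))
    return worst


def smash(a, b, max_len=2):
    """Collide a with the toy anti-stream of b and measure what structure survives.
    Small values suggest a and b came from the same source."""
    return deviation_from_noise(collide(a, invert(b)), max_len)


if __name__ == "__main__":
    rng = random.Random(42)
    a = biased_stream(0.8, 20_000, rng)
    b = biased_stream(0.8, 20_000, rng)  # same source as a
    c = biased_stream(0.3, 20_000, rng)  # different source
    print("same source:", round(smash(a, b), 3))
    print("different source:", round(smash(a, c), 3))
```

Why the toy works: if both streams come from a source that emits 1 with probability p, a surviving 1 has probability p(1−p) and a surviving 0 has probability (1−p)p, so the residue looks like fair-coin noise and the score stays near zero. Sources with different biases leave a skew behind. Real data with memory or larger alphabets would need more general operators than this sketch provides.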
Data smashing principles may open the door to understanding increasingly complex observations, especially when experts do not know what to look for, according to the researchers.
The authors demonstrated their principle on real-world problems, including the disambiguation of electroencephalogram (EEG) patterns from patients with epileptic seizures, the detection of anomalous cardiac activity from heart recordings, and the classification of astronomical objects from raw photometry.
In all cases, and without access to domain knowledge, the method performed on par with specialized algorithms and heuristics devised by experts.
The work in the paper, "Data smashing: Uncovering lurking order in data," was supported by the Defense Advanced Research Projects Agency and the U.S. Army Research Office.
Syl Kacapyr | EurekAlert!