Forum for Science, Industry and Business


Theoretical computer science provides answers to data privacy problem

08.10.2015

New tools allow researchers to share and study sensitive data safely by applying 'differential privacy'

The promise of big data lies in researchers' ability to mine massive datasets for insights that can save lives, improve services and inform our understanding of the world.


Through a differentially private interface for data analysis, it is impossible for an adversary to extract information that is specific to one individual, no matter how much other information or computing power it has at its disposal.

Credit: Salil Vadhan, Harvard University

These data may be generated by surfing the web, interacting with medical devices or passing sensors. Some data may be trivial, but in many cases, data are deeply personal. They can even influence our insurance premiums or the price we pay for a product online.

When planning a study, data scientists need to balance their desire to uncover new knowledge with the privacy of the people whom the data represent.

"The science of understanding human behavior, health, and interactions is being transformed by the ability of researchers to collect, analyze, and share data about individuals on a wide scale," a team of Harvard University researchers wrote in a July 2014 paper, "Integrating Approaches to Privacy across the Research Lifecycle."

However, the paper continued, "a major challenge for realizing the full potential of such data science is ensuring the privacy of human subjects."

Initially, researchers believed that anonymizing data--erasing the names and replacing them with arbitrary identifiers--was enough to protect the identities and personal information of those who had agreed (knowingly or unknowingly) to contribute information. However, in a well-known study published in 2000, Latanya Sweeney led a team that uncovered the identities of patients, including then-Massachusetts Governor William Weld, by correlating anonymized data with other publicly available data.

In a more recent case, researchers Arvind Narayanan and Vitaly Shmatikov from The University of Texas at Austin partially de-anonymized a Netflix dataset containing the movie ratings of about half a million subscribers. By cross-referencing that dataset with information in the Internet Movie Database, the researchers showed that attackers could potentially identify known users, compromising their data.

As cases of re-identification and de-anonymization emerge, researchers are exploring new, more robust approaches to privacy protection.

Salil Vadhan, a professor of computer science at Harvard University and former director of the Center for Research on Computation and Society, is among the researchers exploring an approach known as "differential privacy" that allows one to investigate data without revealing confidential information about participants. The concept was introduced in the mid-2000s by Cynthia Dwork, Frank McSherry, Kobbi Nissim and Adam Smith, among others, and researchers continue to develop it today and apply it to real-world problems.

As the lead researcher for the National Science Foundation (NSF)-supported project "Privacy Tools for Sharing Research Data," Vadhan is working with his team at Harvard to develop a new computer system that acts as a trusted curator--and identity protector--of sensitive, valuable data. (The Sloan Foundation and Google, Inc. are providing the project with additional support.)

The system works like this: Researchers ask the virtual curator questions based on the data--for instance, "What percentage of individuals who have Type B blood are also HIV positive?" The computer returns an answer that is approximately accurate, but that includes just enough "noise" that no matter how hard someone tries, they cannot find out anything specific to any individual in the database.
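The idea of answering with "just enough noise" can be illustrated with the Laplace mechanism, a standard building block of differential privacy. The following is a minimal sketch, not the Harvard team's system: it answers a simplified counting version of the question above, and the function names, the toy records and the value of epsilon (the privacy parameter, where smaller means more noise) are all assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Answer "how many records satisfy predicate?" with Laplace noise.

    Adding or removing one person changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical toy records: (blood_type, hiv_positive)
records = [("B", True), ("B", False), ("A", False), ("B", True), ("O", False)]

rng = random.Random(0)  # seeded only so the sketch is repeatable
answer = private_count(
    records, lambda r: r[0] == "B" and r[1], epsilon=0.5, rng=rng
)
print(round(answer, 2))  # a noisy value near the true count of 2
```

A count is used here rather than a percentage because its sensitivity to one person's record is easy to bound; a ratio query would need more care.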

"Even if an adversary tries to target an individual in the dataset, the adversary should not be able to tell the difference between the world as it is and one where that individual's data is entirely removed from the dataset," Vadhan said. "Randomization turns out to be very powerful."
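Vadhan's description corresponds to the standard formal definition of differential privacy. The notation below is not from the article and is introduced here only for illustration: M is the randomized query-answering mechanism, D and D' are any two datasets that differ in one individual's record, S is any set of possible outputs, and ε is the privacy parameter.

```latex
\Pr[\, M(D) \in S \,] \;\le\; e^{\varepsilon} \cdot \Pr[\, M(D') \in S \,]
```

When ε is small, the two probabilities are nearly equal, so the output reveals almost nothing about whether a given individual's data was included.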

If the system is implemented simply, the level of privacy degrades with multiple queries, so one could keep asking questions until the point where identifying people in the database becomes possible. However, by judiciously increasing the amount of noise and carefully correlating it across queries, the system can maintain privacy protection, even in the face of a very large number of questions.
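One simple way to keep repeated questions from eroding privacy is to give the curator a fixed "privacy budget" and charge every query against it. The sketch below assumes basic sequential composition (the privacy costs of successive queries add up) and refuses to answer once the budget is spent. This is deliberately simpler than the correlated-noise techniques mentioned above, and the class name, method names and parameters are hypothetical.

```python
import random


class BudgetedCurator:
    """Hypothetical curator that tracks a total privacy budget.

    Under basic sequential composition, answering queries with budgets
    eps_1, ..., eps_k costs eps_1 + ... + eps_k in total.
    """

    def __init__(self, records, total_epsilon, seed=None):
        self._records = list(records)
        self.remaining = total_epsilon
        self._rng = random.Random(seed)

    def count(self, predicate, epsilon):
        """Return a noisy count, spending epsilon from the budget."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.remaining -= epsilon
        true_count = sum(1 for r in self._records if predicate(r))
        # Laplace noise with scale 1 / epsilon, sampled as the difference
        # of two exponential draws with rate epsilon.
        noise = self._rng.expovariate(epsilon) - self._rng.expovariate(epsilon)
        return true_count + noise


curator = BudgetedCurator([1, 0, 1, 1], total_epsilon=1.0, seed=0)
for _ in range(4):
    curator.count(lambda r: r == 1, epsilon=0.25)  # four queries use the budget

try:
    curator.count(lambda r: r == 1, epsilon=0.25)
except RuntimeError as err:
    print(err)  # the fifth query is refused
```

Splitting a fixed budget means each answer gets noisier as more questions are planned, which is the trade-off between accuracy and the number of queries.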

Differential privacy has become a hot topic in recent years. A 2015 Science magazine article referred to differential privacy as one of the most promising technical solutions for protecting the data of students enrolled in Massive Open Online Courses (MOOCs). Projects including OnTheMap, used for U.S. census data, and RAPPOR, a new product from Google, apply forms of differential privacy to data sharing.

Speaking at the NSF in early 2015, Vadhan explained how ideas from theoretical computer science inspired the development of differential privacy algorithms, which are now entering the research ecosystem. Harvard's Institute for Quantitative Social Science is planning to use differential privacy techniques to enable more researchers to share, retain control of, and receive credit for their data contributions as part of the Dataverse Network, a project that guarantees the long-term preservation of critical datasets.

Unlocked Scientific Potential

Dataverse is the largest public general-purpose research data repository in the world. However, the scientific community could access far more datasets that are currently not publicly available, if differential privacy's promise is fulfilled, according to Gary King, Albert J. Weatherhead III University Professor at Harvard University and Director of the Institute for Quantitative Social Science.

"That's why we're so thrilled to be working on this project," King said. "The social sciences are finally getting to the point in human history where we have enough information to move from studying problems to actually solving them. As we make progress on the privacy problem, we will be able to unlock more and more of the potential of this new information."

The differential privacy tool Vadhan and his team are developing will allow the inclusion of datasets that were previously withheld because the information was too sensitive and privacy was uncertain.

"Currently, Dataverse is not equipped to handle datasets with privacy concerns associated with them," Vadhan said. "If a researcher says that a dataset has identifiable personal information, it is not made available for download."

Differential privacy doesn't work for every type of research question. Vadhan pointed to regression, machine learning, and social network analysis as areas where there are very promising theoretical results, but challenges remain to making differential privacy work well in practice.

Differential privacy also doesn't help when you're looking for the identity of a specific individual, as in the case of identifying terrorists or finding a match for a kidney donor. But that's the point: each individual should be "hidden" even as they contribute to the greater good of any given study.

"This project could significantly enhance the state-of-the-art in privacy," said Nina Amla, a program director at NSF who oversees the award. "They take a highly interdisciplinary approach which brings together deep expertise in computer science, social science, statistics, and law."

According to Vadhan, differential privacy has rich connections with other parts of computer science theory and mathematics.

"It turned out not to just be an island in itself but to be deeply intertwined with other theoretical questions," Vadhan said. "And we're seeing interest from many communities, such as privacy law, medical informatics and social science, to see whether differential privacy can address the privacy problems they think about."

The team hopes to release a preliminary version of its tool for public exploration and feedback this fall. The researchers have published their work in the Annals of the American Academy of Political and Social Science and will present their research on differential privacy at major conferences, including the upcoming 2015 IEEE Symposium on Foundations of Computer Science.

"Our goal in the project is to enable the wider sharing of data while protecting privacy," Vadhan said, "and to make sharing easier for a non-expert researcher without experience in computer science, law or statistics."

Media Contact

Aaron Dubrow
adubrow@nsf.gov
703-292-4489

 @NSF

http://www.nsf.gov 

Aaron Dubrow | EurekAlert!
