
How hard is it to 'de-anonymize' cellphone data?

28.03.2013
A new formula that characterizes the privacy afforded by large, aggregate data sets may be discouraging, but could help sharpen policy discussion.

The proliferation of sensor-studded cellphones could lead to a wealth of data with socially useful applications — in urban planning, epidemiology, operations research and emergency preparedness, among other things.


Rendering by Christine Daniloff/MIT of an original image by Yves-Alexandre de Montjoye et al.

Of course, before being released to researchers, the data would have to be stripped of identifying information. But how hard could it be to protect the identity of one unnamed cellphone user in a data set of hundreds of thousands or even millions?

According to a paper appearing this week in Scientific Reports, harder than you might think. Researchers at MIT and the Université Catholique de Louvain, in Belgium, analyzed data on 1.5 million cellphone users in a small European country over a span of 15 months and found that just four points of reference, with fairly low spatial and temporal resolution, were enough to uniquely identify 95 percent of them.

In other words, to extract the complete location information for a single person from an “anonymized” data set of more than a million people, all you would need to do is place him or her within a couple of hundred yards of a cellphone transmitter, sometime over the course of an hour, four times in one year. A few Twitter posts would probably provide all the information you needed, if they contained specific information about the person’s whereabouts.

The first author on the paper is Yves-Alexandre de Montjoye, a graduate student in the research group of Toshiba Professor of Media Arts and Science Sandy Pentland. He’s joined by César Hidalgo, an assistant professor of media arts and science; Vincent Blondel, a visiting professor at MIT and a professor of applied mathematics at Université Catholique; and Michel Verleysen, a professor of electrical engineering at Université Catholique.

Focusing the debate

Hidalgo’s group specializes in applying the tools of statistical physics to a wide range of subjects, from communications networks to genetics to economics. In this case, he and de Montjoye were able to use those tools to uncover a simple mathematical relationship between the resolution of spatiotemporal data and the likelihood of identifying a member of a data set.

According to their formula, the probability of identifying someone goes down if the resolution of the measurements decreases, but less than you might think. Reporting the time of each measurement as imprecisely as sometime within a 15-hour span, or location as imprecisely as somewhere amid 15 adjacent cell towers, would still enable the unique identification of half the people in the sample data set.
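To make the idea of lowering resolution concrete, here is a minimal Python sketch. It does not reproduce the researchers' formula, which this article does not state. It only coarsens a handful of made-up traces and counts how many remain distinguishable when every point of a trace is known. The toy data, the bin sizes and the function names `coarsen` and `unique_fraction` are assumptions for illustration, and grouping towers by integer division of their IDs is a crude stand-in for merging adjacent towers.

```python
def coarsen(trace, towers_per_cluster, hours_per_bin):
    """Map each (tower_id, hour) point to a coarser (cluster, time_bin) cell."""
    return frozenset(
        (tower // towers_per_cluster, hour // hours_per_bin)
        for tower, hour in trace
    )


def unique_fraction(traces, towers_per_cluster=1, hours_per_bin=1):
    """Fraction of traces whose coarsened point set matches no other trace."""
    coarse = [coarsen(t, towers_per_cluster, hours_per_bin) for t in traces]
    counts = {}
    for cells in coarse:
        counts[cells] = counts.get(cells, 0) + 1
    return sum(1 for cells in coarse if counts[cells] == 1) / len(coarse)


# Hypothetical toy data: four traces of (tower_id, hour) points.
traces = [
    {(0, 8), (1, 9), (4, 18)},
    {(0, 8), (1, 10), (4, 18)},
    {(2, 8), (3, 9), (5, 19)},
    {(8, 7), (9, 12), (8, 20)},
]

for towers, hours in [(1, 1), (2, 4), (4, 4)]:
    print(towers, hours, unique_fraction(traces, towers, hours))
```

In this toy example the share of distinguishable traces drops as the bins widen, but one trace stays unique even at the coarsest setting, which mirrors the article's point that blurring the data helps less than one might expect.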

But while its initial application may be discouraging, de Montjoye and Hidalgo hope that their formula will provide a way for researchers and policy analysts to reason more rigorously about the privacy safeguards that need to be put in place when they’re working with aggregated location data.

“Both César and I deeply believe that we all have a lot to gain from this data being used,” de Montjoye says. “This formula is something that could be useful to help the debate and decide, OK, how do we balance things out, and how do we make it a fair deal for everyone to use this data?”

Everybody’s different

In the data set that the researchers analyzed, the location of a cellphone was inferred solely from that of the cell tower it was connected to, and the time of the connection was given as falling within a one-hour interval. Each cellphone had a unique, randomly generated identifying number, so that its movement could be traced over time. But there was no information connecting that number to the phone’s owner.

The researchers randomly selected a representative sampling from the set of 1.5 million cellphone traces and, for each trace, began choosing points at random. For 95 percent of the traces, just four randomly selected points were enough to distinguish them from all other traces in the database. In the worst (or, from another perspective, best) case, 11 measurements were necessary.
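The sampling procedure described above can be sketched in a few lines of Python. This is an illustration on a tiny made-up data set, not the researchers' code: the `traces` dictionary, the helper names `matching_traces` and `points_needed`, and all the values are assumptions. Each point is a (cell tower, hour) pair, matching the resolution of the data set described in this section.

```python
import random

# Hypothetical toy data: anonymous ID -> set of (tower_id, hour) points.
# The real data set held about 1.5 million traces.
traces = {
    "u1": {(1, 8), (2, 9), (3, 18), (1, 22)},
    "u2": {(1, 8), (2, 9), (4, 18), (1, 23)},
    "u3": {(5, 7), (2, 9), (3, 18), (6, 21)},
}


def matching_traces(traces, known_points):
    """Return the IDs of all traces that contain every known point."""
    return [uid for uid, points in traces.items() if known_points <= points]


def points_needed(traces, uid, rng):
    """Reveal the points of one trace in random order and return how many
    were revealed when the trace first matched no other trace, or None
    if it never became unique."""
    order = sorted(traces[uid])
    rng.shuffle(order)
    known = set()
    for count, point in enumerate(order, start=1):
        known.add(point)
        if matching_traces(traces, known) == [uid]:
            return count
    return None


rng = random.Random(0)  # fixed seed so repeated runs pick the same points
for uid in traces:
    print(uid, points_needed(traces, uid, rng))
```

Running the same loop over a large sample and tallying the returned counts would give the kind of figure quoted above, namely the share of traces that become unique after a given number of points.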

“There’s a concern with this data, to what extent can we preserve anonymity,” says Luis Bettencourt, a professor at the Santa Fe Institute who studies social systems. “What they are showing here, quite clearly, is that it’s very hard to preserve anonymity.”

But for Bettencourt, the uniqueness of people’s trajectories through cities is itself precisely the type of information that analysis of cellphone data is meant to uncover. “This is interesting, from a scientific point of view, to understand how people use urban space,” Bettencourt says. “It shows what kind of social systems cities are.”

The researchers suspect that similar relationships might hold for other types of data. “I would not be surprised if a similar result — maybe requiring more points — would, for example, extend to web browsing,” Hidalgo says. “The space of potential combinations is really large. When a person is, in some sense, being expressed in a space in which the total number of combinations is huge, the probability that two people would have the same exact trajectory — whether it’s walking or browsing — is almost nil.”

Sarah McDonnell | EurekAlert!
Further information:
http://www.mit.edu
