We study fifteen months of human mobility data for one and a half million individuals and find that human mobility traces are highly unique. In fact, in a dataset where the location of an individual is specified hourly, and with a spatial resolution equal to that given by the carrier’s antennas, four spatio-temporal points are enough to uniquely identify 95% of the individuals. We coarsen the data spatially and temporally to find a formula for the uniqueness of human mobility traces given their resolution and the available outside information. This formula shows that the uniqueness of mobility traces decays approximately as the 1/10 power of their resolution. Hence, even coarse datasets provide little anonymity. These findings represent fundamental constraints to an individual’s privacy and have important implications for the design of frameworks and institutions dedicated to protect the privacy of individuals.
Larry Hardesty (MIT News Office); How hard is it to ‘de-anonymize’ cellphone data?; press release; 2013-03-27.
Teaser: A new formula that characterizes the privacy afforded by large, aggregate data sets may be discouraging, but could help sharpen policy discussion.
cited a couple of analyses of census data, including one using 1990 census data and proving that 87 percent of Americans could be identified using just their ZIP code, sex and date of birth.
Naren Ramakrishnan, Benjamin J. Keller, Batul J. Mirza, Ananth Y. Grama, George Karypis; Privacy Risks in Recommender Systems ; In IEEE Computer; 2001-11/2001-12.
Teaser: Recommender system users who rate items across disjoint domains face a privacy risk analogous to the one that occurs with statistical database queries.