Jessica Su, Ansh Shukla, Sharad Goel, Arvind Narayanan; De-Anonymizing Web Browsing Data with Social Networks; draft; In Some Venue Surely (they will publish this somewhere, it is so very nicely formatted); 2017-05; 9 pages.
Can online trackers and network adversaries de-anonymize web browsing data readily available to them? We show—theoretically, via simulation, and through experiments on real user data—that de-identified web browsing histories can be linked to social media profiles using only publicly available data. Our approach is based on a simple observation: each person has a distinctive social network, and thus the set of links appearing in one’s feed is unique. Assuming users visit links in their feed with higher probability than a random user, browsing histories contain tell-tale marks of identity. We formalize this intuition by specifying a model of web browsing behavior and then deriving the maximum likelihood estimate of a user’s social profile. We evaluate this strategy on simulated browsing histories, and show that given a history with 30 links originating from Twitter, we can deduce the corresponding Twitter profile more than 50% of the time. To gauge the real-world effectiveness of this approach, we recruited nearly 400 people to donate their web browsing histories, and we were able to correctly identify more than 70% of them. We further show that several online trackers are embedded on sufficiently many websites to carry out this attack with high accuracy. Our theoretical contribution applies to any type of transactional data and is robust to noisy observations, generalizing a wide range of previous de-anonymization attacks. Finally, since our attack attempts to find the correct Twitter profile out of over 300 million candidates, it is—to our knowledge—the largest-scale demonstrated de-anonymization to date.
- <quote>Network adversaries—including government surveillance agencies, Internet service providers, and coffee shop eavesdroppers—also see URLs of unencrypted web traffic. The adversary may also be a cross-device tracking company aiming to link two different browsing histories (e.g., histories generated by the same user on different devices). For such an adversary, linking to social media profiles is a stepping stone.</quote>
|374 people||confirmed the accuracy of our deanonymization attempt.|
|268 people (72%)||were the top candidate generated by the MLE when using t.co links.|
|303 people (81%)||were among the top 15 candidates generated by the MLE when using t.co links.|
|Yet only 49% de-anonymization when using fully expanded links (the redirect target of the t.co link).|
<paraphrasing>We recruited participants by advertising the experiment on a variety of websites, including
- Hacker News,
- Freedom to Tinker
- people submitted web browsing histories.
- 119 cases (18%): the application encountered a fatal error (e.g., because the Twitter API was temporarily unavailable), and it was unable to run the de-anonymization algorithm.
- 530 cases: remaining are useful.
- 87 users (16%): had fewer than four informative links, and so no attempt to de-anonymize them was made.
- 443 users: remaining are useful.
- 374 users (84%): confirmed whether or not our de-anonymization attempt was successful.
- 77 users (21%): additionally disclosed their identity by signing into Twitter.
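The percentages in the participant funnel above can be checked with a little arithmetic. The sketch below is only a sanity check. The total number of submissions is not stated in these notes, so `submitted` is an assumption derived as the sum of the fatal-error cases and the remaining cases. All variable names are illustrative.

```python
# Counts taken from the funnel above.
fatal_errors = 119
usable = 530
too_few_links = 87
attempted = 443
confirmed = 374
signed_in = 77

# Assumption: every submission either hit a fatal error or remained usable.
submitted = fatal_errors + usable


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to an integer."""
    return round(100 * part / whole)


funnel = {
    "fatal errors / submitted": pct(fatal_errors, submitted),
    "too few links / usable": pct(too_few_links, usable),
    "confirmed / attempted": pct(confirmed, attempted),
    "signed in / confirmed": pct(signed_in, confirmed),
}

for step, value in funnel.items():
    print(f"{step}: {value}%")
```

Under that assumption, each stage's percentage is taken relative to the stage before it, not relative to the total.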
Apology: the authors note that the users who participated in the experiment are not representative of the Twitter population. In particular, they are quite active: the users who reported their identity had a median of 378 followers and had posted a median of 2,041 total tweets.
- TargetConsumer is a Registered Twitter User, with activity and warm content selection algo in operation at Twitter HQ.
- Twitter algo selects snippets for presentation to TargetConsumer. TargetConsumer either elects to read or discards the linked page.
- A URL trail is recorded by The Panopticon Surveillance Machinery in The Record.
- Adversary has access to The Record across long spans of time and large numbers of
- Can one or many TargetConsumers be distinguished solely by URL traces in The Record?
See C. Y. Ma, D. K. Yau, N. K. Yip, N. S. Rao. “Privacy vulnerability of published anonymous mobility traces,” In IEEE/ACM Transactions on Networking, 21(3):720–733, 2013.
- The simple model of web browsing behavior in which a user’s likelihood of visiting a URL is governed by the URL’s overall popularity and whether the URL appeared in the TargetConsumer’s Twitter feed.
- For each TargetConsumer, we compute their likelihood (under the model) of generating a given anonymous browsing history.
- Identify the TargetConsumer most likely to have generated that history.
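The three steps above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the paper's exact estimator. It assumes URL popularities sum to 1, and that a URL in a candidate's feed is visited `boost` times more readily than its baseline popularity suggests. The `boost` value, the function names, and the toy data are all hypothetical.

```python
import math


def log_likelihood(history, feed, popularity, boost=10.0):
    """Log-probability of `history` if the candidate's feed is `feed`.

    Assumed model: each visit picks URL u with probability proportional
    to popularity[u] * boost when u is in the feed, and to popularity[u]
    otherwise.
    """
    feed_mass = sum(popularity[u] for u in feed if u in popularity)
    # Normalizing constant so the per-visit probabilities sum to 1.
    norm = boost * feed_mass + (1.0 - feed_mass)
    total = 0.0
    for url in history:
        weight = boost if url in feed else 1.0
        total += math.log(popularity[url] * weight / norm)
    return total


def most_likely_profile(history, feeds, popularity, boost=10.0):
    """Return the candidate whose feed maximizes the history's likelihood."""
    return max(
        feeds,
        key=lambda name: log_likelihood(history, feeds[name], popularity, boost),
    )


# Toy data: four URLs and three candidate profiles (all hypothetical).
popularity = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
feeds = {"alice": {"c", "d"}, "bob": {"a"}, "carol": {"a", "b"}}
history = ["c", "d", "c", "a"]

best = most_likely_profile(history, feeds, popularity)
print(best)
```

The normalizing term matters: without it, a candidate with a very large feed would match almost any history. Dividing by `norm` penalizes feeds that cover a large share of overall popularity.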
- Cookie Syncing
- HTML5 localStorage
- Hypertext Transfer Protocol (HTTP)
- Jaccard Similarity
- Maximum Likelihood Estimate (MLE)
- Uniform Resource Locator (URL)
- Ad Networks Can Personally Identify Web Users; Wendy Davis; In MediaPost; 2017-01-20.
<quote> The authors tested their theory by recruiting 400 people who allowed their Web browsing histories to be tracked, and then comparing the sites they visited to sites mentioned in Twitter accounts they followed. The researchers say they were able to use that method to identify more than 70% of the volunteers.</quote>
- G. Acar, C. Eubank, S. Englehardt, M. Juarez, A. Narayanan, C. Diaz. The web never forgets: Persistent tracking mechanisms in the wild. In Proceedings of ACM Conference on Computer Communications & Security (CCS), pages 674–689. ACM, 2014.
- G. Acar, M. Juarez, N. Nikiforakis, C. Diaz, S. Gürses, F. Piessens, B. Preneel. Fpdetective: dusting the web for fingerprinters. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security (CCS), pages 1129–1140. ACM, 2013.
- M. D. Ayenson, D. J. Wambach, A. Soltani, N. Good, C. J. Hoofnagle. Flash cookies and privacy II: Now with HTML5 and ETag respawning. 2011.
- C. Budak, S. Goel, J. Rao, G. Zervas. Understanding emerging threats to online advertising. In Proceedings of the ACM Conference on Economics and Computation, 2016.
- M. Chew, S. Stamm. Contextual identity: Freedom to be all your selves. In Proceedings of the Workshop on Web, volume 2. Citeseer, 2013.
- N. Christin, S. S. Yanagihara, K. Kamataki. Dissecting one click frauds. In Proceedings of the 17th ACM conference on Computer and Communications Security
- Y.-A. De Montjoye, C. A. Hidalgo, M. Verleysen, V. D. Blondel. Unique in the crowd: The privacy bounds of human mobility. In Scientific Reports, 3, 2013.
- Y.-A. De Montjoye, L. Radaelli, V. K. Singh, et al. Unique in the shopping mall: On the reidentifiability of credit card metadata. In Science, 347(6221), 2015.
- P. Eckersley. How unique is your web browser? In, pages 1–18. Springer, 2010.
- S. Englehardt, A. Narayanan. Online tracking: A 1-million-site measurement and analysis. In Proceedings of the ACM Conference on Computer and Communications Security (CCS), 2016.
- S. Englehardt, D. Reisman, C. Eubank, P. Zimmerman, J. Mayer, A. Narayanan, E. W. Felten. Cookies that give you away: The surveillance implications of web tracking. In Proceedings of the 24th Conference on World Wide Web (WWW), 2015.
- Ú. Erlingsson, V. Pihur, A. Korolova. Rappor: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the Conference on Computer and Communications Security (CCS), 2014.
- D. Fifield, S. Egelman. Fingerprinting web users through font metrics. In Proceedings of the International Conference on Financial Cryptography and Data Security, 2015.
- S. Hill, F. Provost. The myth of the double-blind review?: Author identification using only citations. In SIGKDD Explor(ification) Newsletter, 5(2):179–184, Dec. 2003.
- M. Korayem, D. J. Crandall. De-anonymizing users across heterogeneous social computing platforms. In Proceedings of the International Conference on W(something) S(something) M(something) as “Some Acronym” (ICWSM), 2013.
- A. Korolova, K. Kenthapadi, N. Mishra, A. Ntoulas. Releasing search queries and clicks privately. In Proceedings of the 18th International Conference on World Wide Web (WWW). ACM, 2009.
- B. Krishnamurthy, K. Naryshkin, C. Wills. Privacy leakage vs. protection measures: the growing disconnect. In Proceedings of the Web
- B. Krishnamurthy, C. E. Wills. On the leakage of personally identifiable information via online social networks. In Proceedings of the 2nd ACM Workshop on Online Social Networks (WOSN), pages 7–12. ACM, 2009.
- P. Laperdrix, W. Rudametkin, B. Baudry. Beauty and the beast: Diverting modern web browsers to build unique browser fingerprints. In Proceedings of the 37th IEEE Symposium on Security and Privacy, 2016.
- A. Lerner, A. K. Simpson, T. Kohno, F. Roesner. Internet jones and the raiders of the lost trackers: An archaeological study of web tracking from 1996 to 2016. In Proceedings of the 25th USENIX Security Symposium, 2016.
- T. Libert. Exposing the invisible web: An analysis of third-party http requests on 1 million websites. In International Journal of Communication, 9:18, 2015.
- C. Y. Ma, D. K. Yau, N. K. Yip, N. S. Rao. Privacy vulnerability of published anonymous mobility traces. In IEEE/ACM Transactions on Networking, 21(3):720–733, 2013.
- A. Marthews, C. Tucker. Government surveillance and internet search behavior. Available at ssrn:2412564, 2015.
- N. Mathewson, R. Dingledine. Practical traffic analysis: Extending and resisting statistical disclosure. In Proceedings of the International Workshop on Privacy Enhancing Technologies (PETS), pages 17–34. Springer, 2004.
- J. R. Mayer, J. C. Mitchell. Third-party web tracking: Policy and technology. In Proceedings of the 2012 IEEE Symposium on Security and Privacy. IEEE, 2012.
- K. Mowery, H. Shacham. Pixel perfect: Fingerprinting canvas in HTML5. In Proceedings of the Conference with the Acronym “W2SP” (W2SP), 2012.
- A. Narayanan, H. Paskov, N. Z. Gong, J. Bethencourt, E. Stefanov, E. C. R. Shin, D. Song. On the feasibility of internet-scale author identification. In Proceedings of the IEEE Symposium on Security and Privacy, 2012.
- A. Narayanan, V. Shmatikov. Robust de-anonymization of large sparse datasets. In Proceedings of the 2008 IEEE Symposium on Security and Privacy (SP), pages 111–125. IEEE, 2008.
- N. Nikiforakis, A. Kapravelos, W. Joosen, C. Kruegel, F. Piessens, G. Vigna. Cookieless monster: Exploring the ecosystem of web-based device fingerprinting. In Proceedings of the 2013 IEEE symposium on Security and Privacy (SP), pages 541–555. IEEE, 2013.
- L. Olejnik, G. Acar, C. Castelluccia, C. Diaz. The leaking battery: A privacy analysis of the HTML5 Battery Status API. Technical Report, WHERE? 2015.
- L. Olejnik, C. Castelluccia, A. Janc. Why Johnny can’t browse in peace: On the uniqueness of web browsing history patterns. In Proceedings of the 5th Workshop on Hot Topics in Privacy Enhancing Technologies (PETS), 2012.
- J. Penney. Chilling effects: Online surveillance and wikipedia use. In Berkeley Technology Law Journal, 2016.
- A. Ramachandran, Y. Kim, A. Chaintreau. “I knew they clicked when I saw them with their friends”. In Proceedings of the 2nd Conference on Online Social Networks, 2014.
- F. Roesner, T. Kohno, D. Wetherall. Detecting and defending against third-party tracking on the web. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, pages 12–12. USENIX Association, 2012.
- K. Sharad, G. Danezis. An automated social graph de-anonymization technique. In Proceedings of the 13th Workshop on Privacy in the Electronic Society (WPES), pages 47–58. ACM, 2014.
- A. Soltani, S. Canty, Q. Mayo, L. Thomas, C. J. Hoofnagle. Flash cookies and privacy. In Proceedings of the AAAI Spring Symposium: Intelligent Information Privacy Management, volume 2010, pages 158–163, 2010.
- J. Su, A. Sharma, S. Goel. The effect of recommendations on network structure. In Proceedings of the 25th Conference on World Wide Web (WWW), 2016.
- G. Wondracek, T. Holz, E. Kirda, C. Kruegel. A practical attack to de-anonymize social network users. In Proceedings of the IEEE Symposium on Security and Privacy, 2010.