Exploring ADINT: Using Ad Targeting for Surveillance on a Budget — or — How Alice Can Buy Ads to Track Bob | Vines, Roesner, Kohno

Paul Vines, Franziska Roesner, Tadayoshi Kohno; Exploring ADINT: Using Ad Targeting for Surveillance on a Budget — or — How Alice Can Buy Ads to Track Bob; In Proceedings of the 16th ACM Workshop on Privacy in the Electronic Society (WPES 2017); 2017-10-30; 11 pages; outreach.

tl;dr → Tadayoshi et al. are virtuosos at these performance art happenings. Catchy hook, cool marketing name (ADINT) and press outreach frontrunning the actual conference venue. For the wuffie and the lulz. Nice demo tho.
and → They bought geofence campaigns in a grid. They used close-the-loop analytics to identify the sojourn trail of the target.
and → don’t use Grindr.
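
The grid-and-analytics idea above can be illustrated with a toy sketch. It assumes a hypothetical export of "ad served" report rows, one per impression, each tagged with the grid cell of the geofence campaign that served it. The names `Impression`, `sojourn_trail`, and `min_hits` are invented for illustration; this is not the authors' tooling and not any ad network's API.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Impression:
    """One hypothetical report row: when an ad was served, and by which cell's campaign."""
    served_at: datetime
    cell: tuple[int, int]  # (row, col) of the geofence campaign in the grid


def sojourn_trail(impressions, min_hits=2):
    """Collapse impressions into time-ordered (cell, first_seen, last_seen) stays.

    Consecutive impressions from the same cell are merged into one stay;
    stays with fewer than `min_hits` impressions are dropped as noise.
    """
    runs = []  # each entry: [cell, first_seen, last_seen, hit_count]
    for imp in sorted(impressions, key=lambda i: i.served_at):
        if runs and runs[-1][0] == imp.cell:
            runs[-1][2] = imp.served_at
            runs[-1][3] += 1
        else:
            runs.append([imp.cell, imp.served_at, imp.served_at, 1])
    return [(cell, first, last) for cell, first, last, hits in runs if hits >= min_hits]
```

The `min_hits` threshold is a design guess: a single impression in a cell may be a pass-through or a stale location fix, so only repeated hits are treated as a stay.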

Abstract

The online advertising ecosystem is built upon the ability of advertising networks to know properties about users (e.g., their interests or physical locations) and deliver targeted ads based on those properties. Much of the privacy debate around online advertising has focused on the harvesting of these properties by the advertising networks. In this work, we explore the following question: can third-parties use the purchasing of ads to extract private information about individuals? We find that the answer is yes. For example, in a case study with an archetypal advertising network, we find that — for $1000 USD — we can track the location of individuals who are using apps served by that advertising network, as well as infer whether they are using potentially sensitive applications (e.g., certain religious or sexuality-related apps). We also conduct a broad survey of other ad networks and assess their risks to similar attacks. We then step back and explore the implications of our findings.

Mentions

  • Markets
    They chose

    • Facebook
    • not Google
    • etc.
    • not to fight with big DSPs;
      they picked the weaker ones to highlight.
  • Apps
    They chose

    • lower-quality apps.
    • adult apps
      few “family oriented” [none?] apps.
    • <ahem>Adult Diapering Diary</ahem>

Claimed

  • DSPs sell 8m CEP (precision) location.
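
CEP is read here as circular error probable: the radius of a circle around the true position that is expected to contain half of the reported fixes. A minimal sketch, assuming hypothetical east/north errors in metres between each reported fix and the true position (the function name and input shape are illustrative only):

```python
import math
import statistics


def cep(errors_m):
    """Estimate CEP as the median radial error, in metres.

    errors_m: iterable of (east_error, north_error) pairs, in metres.
    """
    radial = [math.hypot(dx, dy) for dx, dy in errors_m]
    return statistics.median(radial)
```

On that reading, "8m CEP" means roughly half of the reported locations fall within 8 metres of where the device actually is.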

Spooky Cool Military Lingo

  • SIGINT
  • HUMINT
  • ADINT

Targeting Dimensions

  • Demographics
  • Interests
  • Personally-Identifying Information (PII)
  • Domain (a usage taxonomy)
  • Location
  • Identifiers
    • Cookie Identifier
    • Mobile Ad Identifier (e.g. IDFA, GPSAID)
  • Technographics
    • Device (Make Model OS)
    • Network (Carrier)
  • Search

Media Types

Supply-Side Platforms (SSPs)

  • Adbund
  • InnerActive
  • MobFox
  • Smaato
  • Xapas

Supply (the adware itself, The Applications, The Apps)

  • Adult Diapering Diary
  • BitTorrent
  • FrostWire
  • Grindr
  • Hide My Texts
  • Hide Pictures vault
  • Hornet
  • iFunny
  • Imgur
  • Jack’D
  • Meet24
  • MeetMe
  • Moco
  • My Mixtapez Music
  • Pregnant Mommy’s Maternity
  • Psiphon
  • Quran Reciters
  • Romeo
  • Tagged
  • Talkatone
  • TextFree
  • TextMe
  • TextPlus
  • The Chive
  • uTorrent
  • Wapa
  • Words with Friends

Demand-Side Platforms (DSPs)

  • Admedo
  • AdRoll
  • AdWords
  • Bing
  • Bonadza
  • BluAgile
  • Centro
  • Choozle
  • Criteo
  • ExactDrive
  • Facebook
  • GetIntent
  • Go2Mobi
  • LiquidM
  • MediaMath
  • MightyHive
  • Simpli.Fi
  • SiteScout
  • Splicky
  • Tapad

Promotions

References

  • Gunes Acar, Christian Eubank, Steven Englehardt, Marc Juarez, Arvind Narayanan, Claudia Diaz. 2014. The Web Never Forgets: Persistent Tracking Mechanisms in the Wild. In Proceedings of the ACM Conference on Computer and Communications Security.
  • Rebecca Balebako, Pedro Leon, Richard Shay, Blase Ur, Yang Wang, L Cranor. 2012. Measuring the effectiveness of privacy tools for limiting behavioral advertising. In Web 2.0 Security and Privacy.
  • Hal Berghel. 2001. Caustic Cookies. In His Blog.
  • Interactive Advertising Bureau. 2015. IAB Tech Lab Content Taxonomy.
  • Interactive Advertising Bureau. 2017. IAB Interactive Advertising Wiki.
  • Giuseppe Cattaneo, Giancarlo De Maio, Pompeo Faruolo, Umberto Ferraro Petrillo. 2013. A review of security attacks on the GSM standard. In Information and Communication Technology-EurAsia Conference. Springer, pages 507–512.
  • Robert M Clark. 2013. Perspectives on Intelligence Collection. In The intelligencer, a Journal of US Intelligence Studies 20, 2, pages 47–53.
  • David Cole. 2014. We kill people based on metadata. In The New York Review of Books.
  • Jonathan Crussell, Ryan Stevens, Hao Chen. 2014. Madfraud: Investigating ad fraud in android applications. In Proceedings of the 12th Annual International Conference on Mobile Systems, Applications, and Services. ACM, pages 123–134.
  • Doug DePerry, Tom Ritter, Andrew Rahimi. 2013. Cloning with a Compromised CDMA Femtocell.
  • Google Developers. 2017. Google Ads.
  • Steven Englehardt and Arvind Narayanan. 2016. Online tracking: A 1-million-site measurement and analysis. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, pages 1388–1401.
  • Steven Englehardt, Dillon Reisman, Christian Eubank, Peter Zimmerman, Jonathan Mayer, Arvind Narayanan, Edward W Felten. 2015. Cookies that give you away: The surveillance implications of web tracking. In Proceedings of the 24th International Conference on World Wide Web. ACM, pages 289–299.
  • Go2mobi. 2017.
  • Aleksandra Korolova. 2010. Privacy violations using microtargeted ads: A case study. In Proceedings of the 2010 IEEE International Conference on Data Mining Workshops (ICDMW), pages 474–482.
  • Zhou Li, Kehuan Zhang, Yinglian Xie, Fang Yu, XiaoFeng Wang. 2012. Knowing your enemy: understanding and detecting malicious web advertising. In Proceedings of the 2012 ACM conference on Computer and Communications Security. ACM, pages 674–686.
  • Nicolas Lidzborski. 2014. Staying at the forefront of email security and reliability: HTTPS-only and 99.978 percent availability. In Their Blog, Google.
  • Steve Mansfield-Devine. 2015. When advertising turns nasty. In Network Security 11, pages 5–8.
  • Jeffrey Meisner. 2014. Advancing our encryption and transparency efforts. In Their Blog, Microsoft.
  • Rick Noack. 2014. Could using gay dating app Grindr get you arrested in Egypt? In The Washington Post.
  • Franziska Roesner, Tadayoshi Kohno, David Wetherall. 2012. Detecting and Defending Against Third-Party Tracking on the Web. In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI).
  • Sooel Son, Daehyeok Kim, Vitaly Shmatikov. 2016. What mobile ads know about mobile users. In Proceedings of the 23rd Annual Network and Distributed System Security Symposium (NDSS).
  • Mark Joseph Stern. 2016. This Daily Beast Grindr Stunt Is Sleazy, Dangerous, and Wildly Unethical. In Slate, 2016.
  • Ryan Stevens, Clint Gibler, Jon Crussell, Jeremy Erickson, Hao Chen. 2012. Investigating user privacy in android ad libraries. In Proceedings of the Workshop on Mobile Security Technologies (MoST).
  • Ratko Vidakovic. 2013. The Mechanics Of Real-Time Bidding. In Marketingland.
  • Craig E. Wills and Can Tatar. 2012. Understanding what they do with what they know. In Proceedings of the ACM Workshop on Privacy in the Electronic Society (WPES).
  • Tom Yeh, Tsung-Hsiang Chang, Robert C Miller. 2009. Sikuli: using GUI screenshots for search and automation. In Proceedings of the 22nd annual ACM Symposium on User Interface Software and Technology. ACM, pages 183–192.
  • Apostolis Zarras, Alexandros Kapravelos, Gianluca Stringhini, Thorsten Holz, Christopher Kruegel, Giovanni Vigna. 2014. The dark alleys of madison avenue: Understanding malicious advertisements. In Proceedings of the 2014 Conference on Internet Measurement Conference.
  • Tiliang Zhang, Hua Zhang, Fei Gao. 2013. A Malicious Advertising Detection Scheme Based on the Depth of URL Strategy. In Proceedings of the 2013 Sixth International Symposium on Computational Intelligence and Design (ISCID), Vol. 2. IEEE, pages 57–60.
  • Peter Thomas Zimmerman. 2015. Measuring privacy, security, and censorship through the utilization of online advertising exchanges. Technical Report. Tech. rep., Princeton University.

Argot

The Suitcase Words

  • Mobile Advertising ID (MAID)
  • Demand-Side Platform (DSP)
  • Supply-Side Platform (SSP)
  • Global Positioning System (GPS)
  • Google Play Store (GPS)
  • geofencing
  • cookie tracking
  • Google Advertising Identifier (GAID)
    Google Play Services Advertising Identifier (GAID)
  • Facebook
  • Snowden
  • WiFi

Previously filled.

Incompatible: The GDPR in the Age of Big Data | Tal Zarsky

Tal Zarsky (Haifa); Incompatible: The GDPR in the Age of Big Data; Seton Hall Law Review, Vol. 47, No. 4(2), 2017; 2017-08-22; 26 pages; ssrn:3022646.
Tal Z. Zarsky is Vice Dean and Professor, Haifa University, IL.

tl;dr → the opposition is elucidated and juxtaposed; the domain is problematized.
and → “Big Data,” by definition, is opportunistic and unsupervisable; it collects everything and identifies something later in the backend.  Else it is not “Big Data” (it is “little data,” which is known, familiar, boring, and of course has settled law surrounding its operational envelope).

Abstract

After years of drafting and negotiations, the EU finally passed the General Data Protection Regulation (GDPR). The GDPR’s impact will, most likely, be profound. Among the challenges data protection law faces in the digital age, the emergence of Big Data is perhaps the greatest. Indeed, Big Data analysis carries both hope and potential harm to the individuals whose data is analyzed, as well as other individuals indirectly affected by such analyses. These novel developments call for both conceptual and practical changes in the current legal setting.

Unfortunately, the GDPR fails to properly address the surge in Big Data practices. The GDPR’s provisions are — to borrow a key term used throughout EU data protection regulation — incompatible with the data environment that the availability of Big Data generates. Such incompatibility is destined to render many of the GDPR’s provisions quickly irrelevant. Alternatively, the GDPR’s enactment could substantially alter the way Big Data analysis is conducted, transferring it to one that is suboptimal and inefficient. It will do so while stalling innovation in Europe and limiting utility to European citizens, while not necessarily providing such citizens with greater privacy protection.

After a brief introduction (Part I), Part II quickly defines Big Data and its relevance to EU data protection law. Part III addresses four central concepts of EU data protection law as manifested in the GDPR: Purpose Specification, Data Minimization, Automated Decisions and Special Categories. It thereafter proceeds to demonstrate that the treatment of every one of these concepts in the GDPR is lacking and in fact incompatible with the prospects of Big Data analysis. Part IV concludes by discussing the aggregated effect of such incompatibilities on regulated entities, the EU, and society in general.

Rebuttal

<snide><irresponsible>Apparently this was not known before the activists captured the legislature and affected their ends with the force of law. Now we know. Yet we all must obey the law, as it stands and as it is written. And why was this not published in an EU-located law journal, perhaps one located in … Brussels?</irresponsible></snide>

Contents

  1. INTRODUCTION AND ROAD MAP
  2. A BRIEF PRIMER ON BIG DATA AND THE LAW
  3. THE GDPR’S INCOMPATIBILITY
    FOUR EXAMPLES

    1. Purpose Limitation
    2. Data Minimization
    3. Special Categories
    4. Automated Decisions
  4. CONCLUSION: WHAT’S NEXT FOR EUROPE?

Mentioned

  • Big Data (contra “little data”)
  • personal data
  • Big Data Revolution
  • evolution not revolution
    no really, revolution not evolution
  • The GDPR is a regulation “on the protection of natural persons,”
  • EU General Data Protection Regulation (GDPR)
  • EU Data Protection Directive (DPD)
  • Is the GDPR different from the DPD?  Maybe not.  Why? c.f. page 10.
  • Various attempts at intuiting bright-line tests around the laws are recited.
    It is a law, but nobody knows how it is interpreted or how it might be enforced.
  • statistical purpose
  • analytical purpose
  • data minimization
  • pseudonymization
  • reidentification
  • specific individuals
  • <quote>In the DPD, article 8(1) prohibited the processing of data “revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, trade-union membership, and the processing of data concerning health or sex life,” while providing narrow exceptions. This distinction was embraced by the GDPR.</quote>
  • Article 29 Working Party
  • on (special) category contagion
    “we feel that all data is credit data, we just don’t know how to use it yet.”
    c.f. page 19; attributed to Dr. Douglas Merrill, then-founder, ZestFinance, ex-CTO, Google.
  • data subjects
  • automated decisions
  • right to “contest the decision”
  • obtain human intervention
  • trade secrets contra decision transparency
    by precedent, in EU (DE), corporate rights trump decision subject’s rights.
  • [a decision process] must be interpretable
  • right to due process [when facing a machine]

Definitions

Big Data is…

  • …wait for it… so very very big
    …thank you, thank you very much. I will be here all week. Please tip your waitron.
  • The Four Five “Vs”
  1. The Volume of data collected,
  2. The Variety of the sources,
  3. The Velocity,
    <quote>with which the analysis of the data can unfold,</quote>
  4. The Veracity,
    <quote>of the data which could (arguably) be achieved through the analytical process.</quote>
  5. The Value, yup, that’s five.
    … <quote>yet this factor seems rather speculative and is thus best omitted.</quote>
Erudition

The Brussels Effect

  • What goes on in EU goes global,
  • “Europeanization”
  • Law in EU is applied world-wide because corporate operations are universal.
Erudition

Aspects

  • purpose limitation,
  • data minimization,
  • special categories,
  • automated decisions.

References

There are 123 references, across 26 pages of prose, made manifest as footnotes in the legal style. Here, simplified and deduplicated.

Previously filled.

Syllabus for Solon Barocas @ Cornell | INFO 4270: Ethics and Policy in Data Science

INFO 4270 – Ethics and Policy in Data Science
Instructor: Solon Barocas
Venue: Cornell University

Syllabus

Solon Barocas

Readings

A Canon, The Canon

In order of appearance in the syllabus, without the course cadence markers…

  • Danah Boyd and Kate Crawford, Critical Questions for Big Data; In <paywalled>Information, Communication & Society, Volume 15, Issue 5 (A decade in Internet time: the dynamics of the Internet and society); 2012; DOI:10.1080/1369118X.2012.678878</paywalled>
    Subtitle: Provocations for a cultural, technological, and scholarly phenomenon
  • Tal Zarsky, The Trouble with Algorithmic Decisions; In Science, Technology & Human Values, Vol 41, Issue 1, 2016 (2015-10-14); ResearchGate.
    Subtitle: An Analytic Road Map to Examine Efficiency and Fairness in Automated and Opaque Decision Making
  • Cathy O’Neil, Weapons of Math Destruction; Broadway Books; 2016-09-06; 290 pages, ASIN:B019B6VCLO: Kindle: $12, paper: 10+SHT.
  • Frank Pasquale, The Black Box Society: The Secret Algorithms That Control Money and Information; Harvard University Press; 2016-08-29; 320 pages; ASIN:0674970845: Kindle: $10, paper: $13+SHT.
  • Executive Office of the President, President Barack Obama, Big Data: A Report on Algorithmic Systems, Opportunity, and Civil Rights; The White House Office of Science and Technology Policy (OSTP); 2016-05; 29 pages; archives.
  • Lisa Gitelman (editor), “Raw Data” is an Oxymoron; Series: Infrastructures; The MIT Press; 2013-01-25; 192 pages; ASIN:B00HCW7H0A: Kindle: $20, paper: $18+SHT.
    Lisa Gitelman, Virginia Jackson; Introduction (6 pages)
  • Agre, “Surveillance and Capture: Two Models of Privacy”
  • Bowker and Star, Sorting Things Out
  • Auerbach, “The Stupidity of Computers”
  • Moor, “What is Computer Ethics?”
  • Hand, “Deconstructing Statistical Questions”
  • O’Neil, On Being a Data Skeptic
  • Domingos, “A Few Useful Things to Know About Machine Learning”
  • Luca, Kleinberg, and Mullainathan, “Algorithms Need Managers, Too”
  • Friedman and Nissenbaum, “Bias in Computer Systems”
  • Lerman, “Big Data and Its Exclusions”
  • Hand, “Classifier Technology and the Illusion of Progress” [Sections 3 and 4]
  • Pager and Shepherd, “The Sociology of Discrimination: Racial Discrimination in Employment, Housing, Credit, and Consumer Markets”
  • Goodman, “Economic Models of (Algorithmic) Discrimination”
  • Hardt, “How Big Data Is Unfair”
  • Barocas and Selbst, “Big Data’s Disparate Impact” [Parts I and II]
  • Gandy, “It’s Discrimination, Stupid”
  • Dwork and Mulligan, “It’s Not Privacy, and It’s Not Fair”
  • Sandvig, Hamilton, Karahalios, and Langbort, “Auditing Algorithms: Research Methods for Detecting Discrimination on Internet Platforms”
  • Diakopoulos, “Algorithmic Accountability: Journalistic Investigation of Computational Power Structures”
  • Bertrand and Mullainathan, “Are Emily and Greg More Employable than Lakisha and Jamal?”
  • Sweeney, “Discrimination in Online Ad Delivery”
  • Datta, Tschantz, and Datta, “Automated Experiments on Ad Privacy Settings”
  • Dwork, Hardt, Pitassi, Reingold, and Zemel, “Fairness Through Awareness”
  • Feldman, Friedler, Moeller, Scheidegger, and Venkatasubramanian, “Certifying and Removing Disparate Impact”
  • Žliobaitė and Custers, “Using Sensitive Personal Data May Be Necessary for Avoiding Discrimination in Data-Driven Decision Models”
  • Angwin, Larson, Mattu, and Kirchner, “Machine Bias”
  • Kleinberg, Mullainathan, and Raghavan, “Inherent Trade-Offs in the Fair Determination of Risk Scores”
  • Northpointe, COMPAS Risk Scales: Demonstrating Accuracy Equity and Predictive Parity
  • Chouldechova, “Fair Prediction with Disparate Impact”
  • Berk, Heidari, Jabbari, Kearns, and Roth, “Fairness in Criminal Justice Risk Assessments: The State of the Art”
  • Hardt, Price, and Srebro, “Equality of Opportunity in Supervised Learning”
  • Wattenberg, Viégas, and Hardt, “Attacking Discrimination with Smarter Machine Learning”
  • Friedler, Scheidegger, and Venkatasubramanian, “On the (Im)possibility of Fairness”
  • Tene and Polonetsky, “Taming the Golem: Challenges of Ethical Algorithmic Decision Making”
  • Lum and Isaac, “To Predict and Serve?”
  • Joseph, Kearns, Morgenstern, and Roth, “Fairness in Learning: Classic and Contextual Bandits”
  • Barocas, “Data Mining and the Discourse on Discrimination”
  • Grgić-Hlača, Zafar, Gummadi, and Weller, “The Case for Process Fairness in Learning: Feature Selection for Fair Decision Making”
  • Vedder, “KDD: The Challenge to Individualism”
  • Lippert-Rasmussen, “‘We Are All Different’: Statistical Discrimination and the Right to Be Treated as an Individual”
  • Schauer, Profiles, Probabilities, And Stereotypes
  • Caliskan, Bryson, and Narayanan, “Semantics Derived Automatically from Language Corpora Contain Human-like Biases”
  • Zhao, Wang, Yatskar, Ordonez, and Chang, “Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints”
  • Bolukbasi, Chang, Zou, Saligrama, and Kalai, “Man Is to Computer Programmer as Woman Is to Homemaker?”
  • Citron and Pasquale, “The Scored Society: Due Process for Automated Predictions”
  • Ananny and Crawford, “Seeing without Knowing”
  • de Vries, “Privacy, Due Process and the Computational Turn”
  • Zarsky, “Transparent Predictions”
  • Crawford and Schultz, “Big Data and Due Process”
  • Kroll, Huey, Barocas, Felten, Reidenberg, Robinson, and Yu, “Accountable Algorithms”
  • Bornstein, “Is Artificial Intelligence Permanently Inscrutable?”
  • Burrell, “How the Machine ‘Thinks’”
  • Lipton, “The Mythos of Model Interpretability”
  • Doshi-Velez and Kim, “Towards a Rigorous Science of Interpretable Machine Learning”
  • Hall, Phan, and Ambati, “Ideas on Interpreting Machine Learning”
  • Grimmelmann and Westreich, “Incomprehensible Discrimination”
  • Selbst and Barocas, “Regulating Inscrutable Systems”
  • Jones, “The Right to a Human in the Loop”
  • Edwards and Veale, “Slave to the Algorithm? Why a ‘Right to Explanation’ is Probably Not the Remedy You are Looking for”
  • Duhigg, “How Companies Learn Your Secrets”
  • Kosinski, Stillwell, and Graepel, “Private Traits and Attributes Are Predictable from Digital Records of Human Behavior”
  • Barocas and Nissenbaum, “Big Data’s End Run around Procedural Privacy Protections”
  • Chen, Fraiberger, Moakler, and Provost, “Enhancing Transparency and Control when Drawing Data-Driven Inferences about Individuals”
  • Robinson and Yu, Knowing the Score
  • Hurley and Adebayo, “Credit Scoring in the Era of Big Data”
  • Valentino-Devries, Singer-Vine, and Soltani, “Websites Vary Prices, Deals Based on Users’ Information”
  • The Council of Economic Advisers, Big Data and Differential Pricing
  • Hannak, Soeller, Lazer, Mislove, and Wilson, “Measuring Price Discrimination and Steering on E-commerce Web Sites”
  • Kochelek, “Data Mining and Antitrust”
  • Helveston, “Consumer Protection in the Age of Big Data”
  • Kolata, “New Gene Tests Pose a Threat to Insurers”
  • Swedloff, “Risk Classification’s Big Data (R)evolution”
  • Cooper, “Separation, Pooling, and Big Data”
  • Simon, “The Ideological Effects of Actuarial Practices”
  • Tufekci, “Engineering the Public”
  • Calo, “Digital Market Manipulation”
  • Kaptein and Eckles, “Selecting Effective Means to Any End”
  • Pariser, “Beware Online ‘Filter Bubbles’”
  • Gillespie, “The Relevance of Algorithms”
  • Buolamwini, “Algorithms Aren’t Racist. Your Skin Is just too Dark”
  • Hassein, “Against Black Inclusion in Facial Recognition”
  • Agüera y Arcas, Mitchell, and Todorov, “Physiognomy’s New Clothes”
  • Garvie, Bedoya, and Frankle, The Perpetual Line-Up
  • Wu and Zhang, “Automated Inference on Criminality using Face Images”
  • Haggerty, “Methodology as a Knife Fight”
    <snide>A metaphorical usage. Let hyperbole be your guide</snide>

Previously filled.

Code Dependent: Pros and Cons of the Algorithm Age | Pew Research

Code Dependent: Pros and Cons of the Algorithm Age; 2017-02-08; 87 pages; landing.
Teaser: Algorithms are aimed at optimizing everything. They can save lives, make things easier and conquer chaos. Still, experts worry they can also put too much control in the hands of corporations and governments, perpetuate bias, create filter bubbles, cut choices, creativity and serendipity, and could result in greater unemployment.

tl;dr → there be dragons; this is an important area; the future is at stake; the alarum has been sounded; there are seers who can show us the way. In their own words.

Series

Future of the Internet, of Pew Research & Elon University.

Table of Contents

  • Overview
  • Themes illuminating concerns and challenges
  • Key experts’ thinking about the future impacts of algorithms
  • About this canvassing of experts
  • Theme 1: Algorithms will continue to spread everywhere
  • Theme 2: Good things lie ahead
  • Theme 3: Humanity and human judgment are lost when data and predictive modeling become paramount
  • Theme 4: Biases exist in algorithmically-organized systems
  • Theme 5: Algorithmic categorizations deepen divides
  • Theme 6: Unemployment will rise
  • Theme 7: The need grows for algorithmic literacy, transparency and oversight
  • Acknowledgments

Promotion

Code-Dependent: Pros and Cons of the Algorithm Age; Pew Research Center; In Their Blog; 2017-02-08.

Teaser: Algorithms are aimed at optimizing everything. They can save lives, make things easier and conquer chaos. Still, experts worry they can also put too much control in the hands of corporations and governments, perpetuate bias, create filter bubbles, cut choices, creativity and serendipity, and could result in greater unemployment.

Mentions

  • Pew Research Center of the Pew Charitable Trusts
  • Imagining the Internet Center at Elon University
  • <ahem>the Singularity enthusiasts … .</ahem>

Themes

  1. Algorithms will continue to spread everywhere
  2. Good things lie ahead
  3. Humanity and human judgment are lost when data and predictive modeling become paramount
  4. Biases exist in algorithmically-organized systems
  5. Algorithmic categorizations deepen divides
  6. Unemployment will rise
  7. The need grows for algorithmic literacy, transparency and oversight.

Argot

  • <snicker>Artificial Intelligence (AI)</snicker>
  • algocratic governance
  • surveillance capitalism
  • information capitalism
  • topsight
  • black-box nature [of]
  • digital scientism
  • obedience score

Quoted

  • Aneesh Aneesh, Stanford University.
  • Peter Diamandis, CEO, XPrize Foundation.
  • Shoshana Zuboff, Harvard.
  • Jim Warren, activist.
  • Terry Langendoen, expert, U.S. National Science Foundation.
  • Patrick Tucker, technology editor at Defense One.
  • Paul Jones, clinical professor at the University of North Carolina-Chapel Hill and director of ibiblio.org.
  • David Krieger, director of the Institute for Communication & Leadership IKF.
  • Galen Hunt, partner research manager at Microsoft Research NExT.
  • Alf Rehn, professor and chair of management and organization at Åbo Akademi University in Finland.
  • Andrew Nachison, founder at We Media.
  • Luis Lach, president of the Sociedad Mexicana de Computación en la Educación, A.C.
  • Frank Pasquale, professor of law, University of Maryland.
  • Jeff Jarvis, reporter.
  • Cindy Cohn, executive director at the Electronic Frontier Foundation.
  • Bernardo A. Huberman, senior fellow and director of the Mechanisms and Design Lab at HPE Labs, Hewlett Packard Enterprise.
  • Marcel Bullinga, expert.
  • Michael Rogers, principal, Practical Futurist.
  • Brian Christian, Tom Griffiths.
  • David Gelernter.
  • Deloitte Global (anonymous contributors).
  • Barry Chudakov, founder and principal at Sertain Research and StreamFuzion Corp.
  • Stephen Downes, staff, National Research Council of Canada.
  • Bart Knijnenburg, assistant professor in human-centered computing at Clemson University.
  • Justin Reich, executive director at the MIT Teaching Systems Lab.
  • Dudley Irish, tradesman (a coder).
  • Ryan Hayes, owner of Fit to Tweet.
  • Adam Gismondi, a visiting scholar at Boston College.
  • Susan Etlinger, staff, Altimeter Group.
  • Chris Kutarna, fellow, Oxford Martin School.
  • Vinton Cerf, Internet Hall of Fame, vice president and chief internet evangelist at Google.
  • Cory Doctorow, writer, computer science activist-in-residence at MIT Media Lab and co-owner of Boing Boing.
  • Jonathan Grudin, Microsoft.
  • Doc Searls, director, Project VRM, Berkman Center, Harvard University.
  • Marc Rotenberg, executive director of the Electronic Privacy Information Center.
  • Richard Stallman, Internet Hall of Fame, president of the Free Software Foundation.
  • David Clark, Internet Hall of Fame, senior research scientist at MIT.
  • Baratunde Thurston, Director’s Fellow at MIT Media Lab, ex-digital director of The Onion.
  • Anil Dash, pundit.
  • John Markoff, New York Times.
  • Danah Boyd (“danah boyd”), founder, Data & Society, an advocacy group.
  • Henning Schulzrinne, Internet Hall of Fame, professor at Columbia University.
  • Amy Webb, futurist and CEO at the Future Today Institute.
  • Jamais Cascio, distinguished fellow at the Institute for the Future.
  • Mike Liebhold, senior researcher and distinguished fellow at the Institute for the Future.
  • Ben Shneiderman, professor of computer science at the University of Maryland.
  • David Weinberger, senior researcher at the Harvard Berkman Klein Center for Internet & Society.

Referenced

Previously filled.

Networks of Control | Cracked Labs

Wolfie Christl and Sarah Spiekermann; Networks of Control; Facultas, Vienna; 2016; 185 pages; landing.
Teaser: A Report on Corporate Surveillance, Digital Tracking, Big Data & Privacy

Table of Contents

  1. Preface
  2. Introduction
  3. Analyzing Personal Data
    1. Big Data and predicting behavior with statistics and data mining
    2. Predictive analytics based on personal data: selected examples
      1. The “Target” example: predicting pregnancy from purchase behavior
      2. Predicting sensitive personal attributes from Facebook Likes
      3. Judging personality from phone logs and Facebook data
      4. Analyzing anonymous website visitors and their web searches
      5. Recognizing emotions from keyboard typing patterns
      6. Forecasting future movements based on phone data
      7. Predicting romantic relations and job success from Facebook data
    3. De-anonymization and re-identification
  4. Analyzing Personal Data in Marketing, Finance, Insurance and Work
    1. Practical examples of predicting personality from digital records
    2. Credit scoring and personal finance
    3. Employee monitoring, hiring and workforce analytics
    4. Insurance and healthcare
    5. Fraud prevention and risk management
    6. Personalized price discrimination in e-commerce
  5. Recording Personal Data – Devices and Platforms
    1. Smartphones, mobile devices and apps – spies in your pocket?
      1. Data abuse by apps
    2. Car telematics, tracking-based insurance and the Connected Car
    3. Wearables, fitness trackers and health apps – measuring the self
      1. A step aside – gamification, surveillance and influence on behavior
      2. Example: Fitbit’s devices and apps
      3. Transmitting data to third parties
      4. Health data for insurances and corporate wellness
    4. Ubiquitous surveillance in an Internet of Things?
      1. Examples – from body and home to work and public space
  6. Data Brokers and the Business of Personal Data
    1. The marketing data economy and the value of personal data
    2. Thoughts on a ‘Customers’ Lifetime Risk’ – an excursus
    3. From marketing data to credit scoring and fraud detection
    4. Observing, inferring, modeling and scoring people
    5. Data brokers and online data management platforms
    6. Cross-device tracking and linking user profiles with hidden identifiers
    7. Case studies and example companies
      1. Acxiom – the world’s largest commercial database on consumers
      2. Oracle and their consumer data brokers Bluekai and Datalogix
      3. Experian – expanding from credit scoring to consumer data
      4. arvato Bertelsmann – credit scoring and consumer data in Germany
      5. LexisNexis and ID Analytics – scoring, identity, fraud and credit risks
      6. Palantir – data analytics for national security, banks and insurers
      7. Alliant Data and Analytics IQ – payment data and consumer scores
      8. Lotame – an online data management platform (DMP)
      9. Drawbridge – tracking and recognizing people across devices
      10. Flurry, InMobi and Sense Networks – mobile and location data
      11. Adyen, PAY.ON and others – payment and fraud detection
      12. MasterCard – fraud scoring and marketing data
  7. Summary of Findings and Discussion of its Societal Implications
    1. Ubiquitous data collection
    2. A loss of contextual integrity
    3. The transparency issue
    4. Power imbalances
    5. Power imbalances abused: systematic discrimination and sorting
    6. Companies hurt consumers and themselves
    7. Long term effects: the end of dignity?
    8. Final reflection: From voluntary to mandatory surveillance?
  8. Ethical Reflections on Personal Data Markets (by Sarah Spiekermann)
    1. A short Utilitarian reflection on personal data markets
    2. A short deontological reflection on personal data markets
    3. A short virtue ethical reflection on personal data markets
    4. Conclusion on ethical reflections
  9. Recommended Action
    1. Short- and medium term aspects of regulation
    2. Enforcing transparency from outside the “black boxes”
    3. Knowledge, awareness and education on a broad scale
    4. A technical and legal model for a privacy-friendly digital economy
  10. List of tables
  11. List of figures
  12. References

Mentions

yes

Quoted

  • Anna Fielder, Chair of Privacy International
  • Courtney Gabrielson, International Association of Privacy Professionals (IAPP)

References

There are 677 footnotes, which are distinct from the references.
There are 211 references.

Separately filled.

Corporate Surveillance in Everyday Life | Cracked Labs


Wolfie Christl; Corporate Surveillance in Everyday Life. How Companies Collect, Combine, Analyze, Trade, and Use Personal Data on Billions; Cracked Labs, Vienna; 2017-06; 93 pages.

Teaser: <shrill>How thousands of companies monitor, analyze, and influence the lives of billions. Who are the main players in today’s digital tracking? What can they infer from our purchases, phone calls, web searches, and Facebook likes? How do online platforms, tech companies, and data brokers collect, trade, and make use of personal data?</shrill>

Table of Contents

  1. Background and Scope
  2. Introduction
  3. Relevant players within the business of personal data
    1. Businesses in all industries
    2. Media organizations and digital publishers
    3. Telecom companies and Internet Service Providers
    4. Devices and Internet of Things
    5. Financial services and insurance
    6. Public sector and key societal domains
    7. Future developments?
  4. The Risk Data Industry
    1. Rating people in finance, insurance and employment
    2. Credit scoring based on digital behavioral data
    3. Identity verification and fraud prevention
    4. Online identity and fraud scoring in real-time
    5. Investigating consumers based on digital records
  5. The Marketing Data Industry
    1. Sorting and ranking consumers for marketing
    2. The rise of programmatic advertising technology
    3. Connecting offline and online data
    4. Recording and managing behaviors in real-time
    5. Collecting identities and identity resolution
    6. Managing consumers with CRM, CIAM and MDM
  6. Examples of Consumer Data Broker Ecosystems
    1. Acxiom, its services, data providers, and partners
    2. Oracle as a consumer data platform
    3. Examples of data collected by Acxiom and Oracle
  7. Key Developments in Recent Years
    1. Networks of digital tracking and profiling
    2. Large-scale aggregation and linking of identifiers
    3. “Anonymous” recognition
    4. Analyzing, categorizing, rating and ranking people
    5. Real-time monitoring of behavioral data streams
    6. Mass personalization
    7. Testing and experimenting on people
    8. Mission creep – everyday life, risk assessment and marketing
  8. Conclusion
  9. Figures
  10. References

Mentions

Quoted

  • Omer Tene
  • Jules Polonetsky

Promotions

Yes. A work this polished could not be hid for long.

Summary

The web variant is summary material.

  1. Analyzing people
  2. Analyzing people in finance, insurance and healthcare
  3. Large-scale collection and use of consumer data
  4. Data brokers and the business of personal data
  5. Real-time monitoring of behaviors across everyday life
  6. Linking, matching and combining digital profiles
  7. Managing consumers and behaviors, personalization and testing
  8. Dragnet – everyday life, marketing data and risk analytics
  9. Mapping the commercial tracking and profiling landscape
  10. Towards a society of pervasive digital social control?

References

There are 601 footnotes, which are distinct from the references.
There are 102 references.

Previously filled.

Online Privacy and ISPs | Institute for Information Security & Privacy, Georgia Tech

Peter Swire, Justin Hemmings, Alana Kirkland; Online Privacy and ISPs; a whitepaper; Institute for Information Security & Privacy, Georgia Tech; 2016-05; 131 pages.
Teaser: ISP Access to Consumer Data is Limited and Often Less than Access by Others

Authors
  • Peter Swire
    • Associate Director,
      The Institute for Information
      Security & Privacy at Georgia Tech
    • Huang Professor of Law,
      Georgia Tech Scheller College of Business
    • Senior Counsel, Alston & Bird LLP
  • Justin Hemmings
    • Research Associate,
      Georgia Tech Scheller College of Business
    • Policy Analyst
      Alston & Bird LLP
  • Alana Kirkland
    • Associate Attorney, Alston & Bird LLP

tl;dr → ISP < Media; ISPs are not omnipotent; ISPs see less than you think; Consumer visibility is mitigated by allowed usage patterns: cross-ISP, cross-device, VPN, DNS obfuscation, encryption.  Anyway, Facebook has it all and more.

Consumer profiling observation is already occurring by other means anyway.

<quote> In summary, based on a factual analysis of today’s Internet ecosystem in the United States, ISPs have neither comprehensive nor unique access to information about users’ online activity. Rather, the most commercially valuable information about online users, which can be used for targeted advertising and other purposes, is coming from other contexts. Market leaders are combining these contexts for insight into a wide range of activity on each device and across devices. </quote>

<translation> The other guys are already doing it, why stop ISPs? </translation>

ISP <strike>surveillance</strike> Observation of consumers is neither Comprehensive, nor Unique

<quote> The Working Paper addresses two fundamental points. First, ISP access to user data is not comprehensive – technological developments place substantial limits on ISPs’ visibility. Second, ISP access to user data is not unique – other companies often have access to more information and a wider range of user information than ISPs. Policy decisions about possible privacy regulation of ISPs should be made based on an accurate understanding of these facts. </quote>

<view> It’s unargued why comprehensive or unique are bright-line standards of anything at all. </view>

Previously filled.

Mentions

Claims

  • ISPs < Media
    The dumb-pipe, bit-shoving, ISPs see less than media services, who see semantic richness.
  • Cross-device is the new nowadays.
  • Encryption is everywhere.

Definitions

Availability
  • a technical statement
  • contra “use” which is an action by a person
Cross-Device Tracking
Deterministic
Logged-In, Cross-Context Tracking
Probabilistic
Not Logged-In, Cross-Context Tracking
Cross-Device Tracking
  • Frequency Capping
  • Attribution
  • Improved Advertising Targeting
  • Sequenced Advertising
  • Tracking Simultaneity
Limits the use of “data” (facts about consumers)
  • at the point of collection
  • at the point of use
Location of a consumer
  • Coarse contra Precise
  • Current contra Historical

Summary

The document has both a Preface and an Executive Summary, so the journeyperson junior policy wonkmaker can approach the material at whatever level of complexity their time budget and training afford.

Preface

  • Technological Developments Place Substantial Limits on ISPs’ Visibility into Users’ Online Activity:
    1. From a single stationary device to multiple mobile devices and connections.
    2. Pervasive encryption.
    3. Shift in domain name lookup.
  • Non-ISPs Often Have Access to More and a Wider Range of User Information than ISPs:
    1. Non-ISP services have unique insights into user activity.
    2. Non-ISPs dominate in cross-context tracking.
    3. Non-ISPs dominate in cross-device tracking.

Executive Summary

  • Technological Developments Place Substantial Limits on ISPs’ Visibility into Users’ Online Activity:
    1. From a single stationary device to multiple mobile devices and connections.
    2. Pervasive encryption.
    3. Shift in domain name lookup.
  • Non-ISPs Often Have Access to More and a Wider Range of User Information than ISPs:
    1. Non-ISP services have unique insights into user activity.
      • social networks
      • search engines
      • webmail and messaging
      • operating systems
      • mobile apps
      • interest-based advertising
      • browsers
      • Internet video
      • e-commerce.
    2. Non-ISPs dominate in cross-context tracking.
    3. Non-ISPs dominate in cross-device tracking.

Table Of Contents

Online Privacy and ISPs: ISP Access to Consumer Data is Limited and Often Less than Access by Others

Summary of Contents:

  • Preface
  • Executive Summary
    • Appendix 1: Some Key Terms
  • Chapter 1: Limited Visibility of Internet Service Providers Into Users’ Internet Activity
    • Appendix 1: Encryption for Top 50 Web Sites
    • Appendix 2: The Growing Prevalence of HTTPS as Fraction of Internet Traffic
  • Chapter 2: Social Networks
  • Chapter 3: Search Engines
  • Chapter 4: Webmail and Messaging
  • Chapter 5: How Mobile Is Transforming Operating Systems
  • Chapter 6: Interest-Based Advertising (“IBA”) and Tracking
  • Chapter 7: Browsers, Internet Video, and E-commerce
  • Chapter 8: Cross-Context Tracking
    • Appendix 1: Cross-Context Chart Citations
  • Chapter 9: Cross-Device Tracking
  • Chapter 10: Conclusion

Mentions

  • HTTPS
  • Interest-Based Advertising (IBA)
  • Tracking
  • Location
    • Coarse Location
    • Precise Location
  • Natural Language Conversation Robots (a.k.a. ‘bots)
    • Siri, Apple
    • Now, Google
    • Cortana, Microsoft

Argot

Also see page 124 of The Work.

  • Availability → contra Use
  • Big Data → data which is very big.
  • Broadband Internet Access Services → an ISP, but not a dialup service
    as used in the Open Internet Order, of the FCC, 2015-24, Appendix A.
  • Chat bot → <fancy>Personal Digital Assistant</fancy>
  • Cookie
  • CPNI → Customer Proprietary Network Information
    47 U.S.C. § 222. Also, the rules implementing Section 222 are at 47 C.F.R. § 64.2001 et seq.
  • Cross-Context
  • Cross-Device
  • DNS → Domain Name System
  • DPI → Deep Packet Inspection
  • Edge Providers → smart pipes, page stuffing, click-baiting; exemplars: Akamai, CloudFlare, CloudFront, etc.
  • End-to-End
    • Argument
    • Encryption
  • Factual Analysis → this means something different to lawyers contra engineers.
  • FCC → Federal Communications Commission
  • Form
    Form Autofill, a browser feature
  • FTC → Federal Trade Commission
  • FTT → Freedom To Tinker, a venue, an oped
  • GPS → Global Positioning System
  • HTTP → you know.
  • HTTPS → you know.
  • IBA → Interest-Based Advertising
  • IP → Internet Protocol
    • Address
  • IoT → Internet of Thingies Toys Unpatchables
  • IRL → <culture who=”The Youngs”>In Real Life</culture>
  • ISP → Internet Service Provider
  • Last Mile, of an ISP
  • Location
    • Coarse → “city”- “DMA”- or “country”-level
    • Precise → an in-industry definition exists
  • Metadata → indeed.
  • OBA → Online Behavioral Advertising
  • Open Internet Order, of the FCC.
  • OS → <ahem>Operating System</ahem>
  • Party System
    • First Party
    • [Second Party], no one cares.
    • Third Party
    • [Fourth Party]
  • Personal Information → the sacred stuff, the poisonous stuff
  • Personal Digital Assistant → a trade euphemism for NLP + command patterns for IVR; all the 1st-tier shops have one nowadays.
    • Siri → Apple
    • Now → Google
    • Cortana → Microsoft
  • Scanning
  • Section 222, see Title II
  • SSL → you mean TLS
  • Title II, of the Communications Act.
    • Section 222,
  • Tracking
    • (Across-) Cross-Context
    • (Across-) Cross-Device
  • TLS → you mean SSL
  • UGC → User-Generated Content (unsupervised filth; e.g. comment spam)
  • URL → you know.
  • VPN → run one.
  • WiFi → for some cultural reason “wireless” turns into “Wireless Fidelity” and “WiFi”
  • Working Paper → an unreviewed work product.
  • Visibility → bookkeeping by the <strike>surveillor</strike> observer.

Actualities

References

Of course, it’s a legal-style policy whitepaper. Of course there are references; they are among the NN footnotes. In rough order of appearance in the work.

 

Persuasion and the other thing: A critique of big data methodologies in politics | Ethnography Matters

Molly Sauter; Persuasion and the other thing: A critique of big data methodologies in politics; In Ethnography Matters; 2017-05-24.

tl;dr → 3026 words. Big Data (which is so very big) is bad. The sphere is problematized. A problematic which situates the hegemons is synthesized via the dialectic. A mode of resistance is posited.

<ahem>… and by way of brief rebuttal: The Computers and The Establishment that owns & operates The Computers, their work inuring to the mutual benefit of them both, individually and severally, are smarter than all that (c.f. the trivial use of “grep -v”), and also the suggested modality of dissent violates the T&C which was previously freely given and binds & constrains individual future actions; its unilateral repudiation makes the performer at once dishonest, conflicted, and an outlaw who deserves no quarter; not in theory, not in practice, or under the reigning jurisdictional supervision (c.f. 18 U.S.C. Section 1001, as opined).</ahem>

Previously filled.

Mentions

  • Cambridge Analytica
  • Donald Trump
  • Brexit Campaign
  • Facebook
    • “likes”
    • targeted nudges
  • Mother Jones
  • The Guardian
  • SCL Group
  • Apple (Computer) Inc.

Claims

  • There is not enough consent (from the subjects)
  • <quote>Democracy shifts from a form of governance at least theoretically concerned with public debate and persuasion to one focused on private, opaque manipulation and emotional coercion.</quote>

Resistance

The obfuscation schemes, taxonomized in Brunton & Nissenbaum:

  • noisy bots
  • “like-farming,” i.e. spamming.
  • TrackMeNot
    a browser extension which generates abusive search engine queries.
  • AdNauseam
    a browser extension which generates abusive click streams.
  • FaceCloak
    Something about storing data “off Facebook,” yet performing the data “on Facebook.”
  • Bayesian Flooding … sounds fancy; it means creating profile- & page- spam entries on Facebook.

Who

Unless otherwise noted persons are credited as “an activist.”

  • Finn Brunton
    with Helen Nissenbaum
  • Michal Kosinski, a bad guy in the pantheon
    with David Stillwell, Thore Graepel, et al.
  • Helen Nissenbaum
    with Finn Brunton (for symmetry)
  • Kelly Oliver
  • Molly Sauter
  • Zeynep Tufekci
  • Sara Marie Watson

Argot

… sounds fancy, and more than a little dangerous (<quote> cacklingly evil</quote>).  In rough order of appearance.

  • psychographics
  • algorithmic nudging
  • entitlements (<quote> held by advertisers, tech firms, and researchers who deploy big data analytics in support of political campaigns or other political projects </quote>)
  • sense of entitlement
  • subjectivity (something about having agency; being such is good)
    objects (data objects); something about not having agency; being such is bad.
  • obfuscation
  • sabotage
    <quote>sabotaging the efficacy of the methodology in general, to resist attempts to be read, known, and manipulated.</quote>
  • emotional contagion; c.f. Facebook, an “experiment,” 2014
  • nudge (contra shove)
  • algorithmic modeling → “opinions embedded in mathematics” [page 21, O'Neil].
  • otherness
  • knowability
  • digital shadow-selves
  • a paradoxical problem
    wow man, dig it … a paradox, a problem with a paradox, that’s like a paradox².
  • data broker
  • entitlement of inference
    <quote>a certain entitlement of inference</quote>
    <quote>the entitlement of inference on display</quote>
  • influence techniques
    secret or opaque influence techniques
  • consent of the governed
    meaningful consent
  • inferential modeling <strike>collects</strike> synthesizes non-disclosed information
  • opting out
    social media abstinence
  • data doppelganger
  • pervasive surveillance and modeling systems
  • obfuscation
    <quote>creates noise, either at the level of the platform or the individual profile</quote>

Referenced

Trajectory Recovery from Ash: User Privacy Is NOT Preserved in Aggregated Mobility Data | Xu, Tu, Li, Zhang, Fu, Jin

Fengli Xu, Zhen Tu, Yong Li, Pengyu Zhang, Xiaoming Fu, Depeng Jin; Trajectory Recovery From Ash: User Privacy Is NOT Preserved in Aggregated Mobility Data; In Proceedings of the Conference on the World Wide Web (WWW); 2017-02-21 (2017-02-25); 10 pages; arXiv:1702.06270

tl;dr → probabilistic individuation from timestamped aggregated population location records.

Abstract

Human mobility data has been ubiquitously collected through cellular networks and mobile applications, and publicly released for academic research and commercial purposes for the last decade. Since releasing individual’s mobility records usually gives rise to privacy issues, datasets owners tend to only publish aggregated mobility data, such as the number of users covered by a cellular tower at a specific timestamp, which is believed to be sufficient for preserving users’ privacy. However, in this paper, we argue and prove that even publishing aggregated mobility data could lead to privacy breach in individuals’ trajectories. We develop an attack system that is able to exploit the uniqueness and regularity of human mobility to recover individual’s trajectories from the aggregated mobility data without any prior knowledge. By conducting experiments on two real-world datasets collected from both mobile application and cellular network, we reveal that the attack system is able to recover users’ trajectories with accuracy about 73%~91% at the scale of tens of thousands to hundreds of thousands users, which indicates severe privacy leakage in such datasets. Through the investigation on aggregated mobility data, our work recognizes a novel privacy problem in publishing statistic data, which appeals for immediate attentions from both academy and industry.
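
As a rough illustration of why per-timestep aggregates can leak trajectories, the sketch below links anonymous location observations into tracks by solving a minimum-cost assignment between consecutive timesteps. It is a toy under stated assumptions, not the authors' attack system: each aggregate count is assumed to have been expanded into one point per user, movement between timesteps is assumed to be small, and the names `link_step` and `recover_tracks`, the coordinates, and the brute-force search over permutations (standing in for the Hungarian method, which the paper's reference list cites) are all hypothetical.

```python
from itertools import permutations
from math import dist

def link_step(prev_points, next_points):
    """Return the permutation p that minimizes total movement,
    where prev_points[i] is linked to next_points[p[i]].
    Brute force, so only feasible for a handful of users."""
    n = len(prev_points)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        cost = sum(dist(prev_points[i], next_points[perm[i]]) for i in range(n))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm

def recover_tracks(snapshots):
    """snapshots: one list of (x, y) points per timestep, all the same size,
    in arbitrary order within a timestep. Returns one track per user."""
    tracks = [[p] for p in snapshots[0]]
    for next_points in snapshots[1:]:
        last = [t[-1] for t in tracks]
        perm = link_step(last, next_points)
        for i, track in enumerate(tracks):
            track.append(next_points[perm[i]])
    return tracks

# Hypothetical data: three users; the order inside each snapshot is shuffled.
snapshots = [
    [(0, 0), (10, 0), (0, 10)],
    [(10, 1), (1, 10), (1, 0)],
    [(2, 0), (10, 2), (2, 10)],
]
for track in recover_tracks(snapshots):
    print(track)
```

Even though no snapshot carries an identifier, continuity of movement is enough here to stitch the three tracks back together; at realistic scale a polynomial-time assignment solver would replace the permutation search.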

Promotions

References

  1. R. Wang, M. Xue, K. Liu, et al. Data-driven privacy analytics: A wechat case study in location-based social networks. In Wireless Algorithms, Systems, and Applications. Springer, 2015.
  2. Apple’s commitment to your privacy.
  3. V. D. Blondel, M. Esch, C. Chan, et al. Data for development: the D4D challenge on mobile phone data. arXiv:1210.0137, 2012.
  4. G. Acs and C. Castelluccia. A case study: privacy preserving release of spatio-temporal density in Paris. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, 2014.
  5. China Telecom’s big data products.
  6. C. Song, Z. Qu, N. Blumm. Limits of predictability in human mobility. In Science, 2010.
  7. S. Isaacman, R. Becker, R. Cáceres, et al. Ranges of human mobility in Los Angeles and New York. In Proceedings of the IEEE Workshops on Pervasive Computing and Communications (PERCOM). IEEE, 2011.
  8. S. Isaacman, R. Becker, R. Cáceres, et al. Human mobility modeling at metropolitan scales. In Proceedings of the ACM Conference on Mobile Systems (MOBISYS). ACM, 2012.
  9. M. Seshadri, S. Machiraju, A. Sridharan, et al. Mobile call graphs: beyond power-law and lognormal distributions. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, 2008.
  10. Y. Wang, H. Zang, M. Faloutsos. Inferring cellular user demographic information using homophily on call graphs. In Proceedings of the IEEE International Conference on Computer Communications (INFOCOM). IEEE, 2013.
  11. A. Wesolowski, N. Eagle, A. J. Tatem, et al. Quantifying the impact of human mobility on malaria. In Science, 2012.
  12. M. Saravanan, P. Karthikeyan, A. Aarthi. Exploring community structure to understand disease spread and control using mobile call detail records. NetMob D4D Challenge, 2013. Probably there’s a promotional micro-site for this.
  13. R. W. Douglass, D. A. Meyer, M. Ram, et al. High resolution population estimates from telecommunications data. In EPJ Data Science, 2015.
  14. H. Wang, F. Xu, Y. Li, et al. Understanding mobile traffic patterns of large scale cellular towers in urban environment. In Proceedings of the ACM Internet Measurement Conference (IMC). ACM, 2015.
  15. L. Sweeney. k-anonymity: A model for protecting privacy. In International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2002.
  16. Y. de Montjoye, L. Radaelli, V. K. Singh, et al. Unique in the shopping mall: On the reidentifiability of credit card metadata. In Science, 2015.
  17. H. Zang and J. Bolot. Anonymization of location data does not work: A large-scale measurement study. In Proceedings of the ACM Conference on Mobile Communications (Mobicom). ACM, 2011.
  18. M. Gramaglia and M. Fiore. Hiding mobile traffic fingerprints with glove. In Proceedings of the ACM Conference CoNEXT, 2015.
  19. A.-L. Barabasi. The origin of bursts and heavy tails in human dynamics. In Nature, 2005.
  20. A. Machanavajjhala, D. Kifer, J. Gehrke, et al. l-Diversity: Privacy beyond k-Anonymity. In ACM Transactions on Knowledge Discovery from Data (TKDD), 2007.
  21. Y. de Montjoye, C. A. Hidalgo, M. Verleysen, et al. Unique in the crowd: The privacy bounds of human mobility. In Scientific Reports, 2013.
  22. G. B. Dantzig. Linear Programming and Extensions. Princeton University Press, 1998.
  23. H. W. Kuhn. The Hungarian Method for the Assignment Problem. In Naval Research Logistics Quarterly, 1955.
  24. O. Abul, F. Bonchi, M. Nanni. Anonymization of moving objects databases by clustering and perturbation. In Information Systems, 2010.
  25. Pascal Welke, Ionut Andone, Konrad Blaszkiewicz, Alexander Markowetz. Differentiating smartphone users by app usage. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pages 519–523. ACM, 2016.
  26. Lukasz Olejnik, Claude Castelluccia, Artur Janc. Why Johnny Can’t Browse in Peace: On the uniqueness of web browsing history patterns. In Proceedings of the 5th Workshop on Hot Topics in Privacy Enhancing Technologies (HotPETs), 2012.
  27. M. C. Gonzalez, C. A. Hidalgo, A.-L. Barabasi. Understanding individual human mobility patterns. In Nature, 2008.
  28. C. Song, T. Koren, P. Wang, et al. Modelling the scaling properties of human mobility. In Nature Physics, 2010.
  29. Y. Liu, K. P. Gummadi, B. Krishnamurthy, et al. Analyzing Facebook Privacy Settings: User Expectations vs. Reality. In Proceedings of the ACM Internet Measurement Conference (IMC). ACM, 2011.
  30. B. Krishnamurthy and C. E. Wills. Generating a privacy footprint on the Internet. In Proceedings of the ACM Internet Measurement Conference
  31. S. Le Blond, C. Zhang, A. Legout, et al. I know where you are and what you are sharing: exploiting P2P communications to invade users’ privacy. In Proceedings of the ACM Internet Measurement Conference (IMC). ACM, 2011.
  32. S. Liu, I. Foster, S. Savage, et al. Who is .com? Learning to parse WHOIS records. In Proceedings of the ACM Internet Measurement Conference (IMC). ACM, 2015.
  33. H. Kido, Y. Yanagisawa, T. Satoh. Protection of location privacy using dummies for location-based services. In Proceedings of the IEEE International Conference on Data Engineering Workshops (ICDEW). IEEE, 2005.
  34. A. Monreale, G. L. Andrienko, N. V. Andrienko, et al. Movement data anonymity through generalization. In Transactions on Data Privacy, 2010.
  35. K. Sui, Y. Zhao, D. Liu, et al. Your trajectory privacy can be breached even if you walk in groups. In Proceedings of the IEEE/ACM International Workshop on Quality of Service (IWQoS), 2016.
  36. Y. Song, D. Dahlmeier, S. Bressan. Not so unique in the crowd: a simple and effective algorithm for anonymizing location data. In PIR@SIGIR, 2014.
  37. S. Garfinkel. Privacy protection and RFID. In Ubiquitous and Pervasive Commerce. Springer, 2006.
  38. J. Domingo-Ferrer and R. Trujillo-Rasua. Microaggregation- and permutation-based anonymization of movement data. In Information Sciences, 2012.
  39. Cynthia Dwork, Adam Smith, Thomas Steinke, Jonathan Ullman, Salil Vadhan. Robust Traceability From Trace Amounts. In Proceedings of the 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 650–669. IEEE, 2015.

Previously filled.

WebRTC and STUN for intra-LAN exploration & end-user tracking

WebRTC

  • WebRTC, promotional site
  • Availabilities
    all the browsers that matter

    • Android
    • Chrome (Linux, Android, Windows)
    • Firefox
    • Opera
    • Safari (iOS)

STUN

Related

Standards

  • RFC 7350; Datagram Transport Layer Security (DTLS) as Transport for Session Traversal Utilities for NAT (STUN); Petit-Huguenin, Salgueiro; IETF; 2014-08.
  • RFC 7064; URI Scheme for the Session Traversal Utilities for NAT (STUN) Protocol; Nandakumar, Salgueiro, Jones, Petit-Huguenin; IETF; 2013-11.
  • RFC 5928; Traversal Using Relays around NAT (TURN) Resolution Mechanism; Petit-Huguenin; IETF; 2010-08.
  • RFC 5389; Session Traversal Utilities for NAT (STUN); Rosenberg, Mahy, Matthews, Wing; IETF; 2008-10.
    (obsoletes)

    • RFC 3489; STUN – Simple Traversal of User Datagram Protocol (UDP) Through Network Address Translators (NATs); Rosenberg, Weinberger, Huitema, Mahy; 2003-03.

In Jimmy Wales’ Wiki.

Implementation

Tracking

In archaeological order

Leaking


665909 webrtc; WebRTC Tracking; In Bugzilla of Mozilla; 2011-06-21 → 2016-01-11; Closed as INVALID


Some droid using the self-asserted identity token cchen; How to Stop WebRTC Local IP Address Leaks on Google Chrome and Mozilla Firefox While Using Private IPs; In Privacy Online Forums; 2015-01→2015-03.

Mentions

  • Availability
    of the problem (not of WebRTC in general)

    • Chrome of Google
      • Windows
    • Firefox of Mozilla
      • Unclear, perhaps Windows only
    • Internet Explorer of Microsoft
      WebRTC is not available at all.
    • Opera of Opera
      • Unclear
    • Safari of Apple
      WebRTC is not available except through a plugin
    • Unavailable
      • Chrome of Google
        • OS X
        • Android
      • Linux at all
        not clear; not mentioned at all.
  • Blocking
    • Chrome of Google
    • Firefox of Mozilla
      • Production
        • about:config
        • media.peerconnection.enabled set to false (default true)
      • Development
        same

        • Canary
        • Nightly
        • Bowser
    • Opera of Opera
  • API Directory
    • voice calls
    • video chats
    • p2p file sharing

Configuration

  • Chrome
    default is available and active
  • Firefox
    • about:config
    • media.peerconnection.enabled set to true (default true)
  • Opera
    only when configured, with a plugin, to run Google Chrome extensions

Demonstration

webrtc-ips, a STUN & WebRTC test rig

  • diafygi/webrtc-ips
  • via on-page JavaScript, makes latent requests to certain STUN servers.
  • Firefox 34 → Does. Not. Work.
  • Fails with
    Error: RTCPeerConnection constructor passed invalid RTCConfiguration - missing url webrtc-ips:58
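
For orientation, the STUN exchange that such a page triggers is small. The sketch below builds an RFC 5389 Binding Request and decodes the XOR-MAPPED-ADDRESS attribute of a response, using only Python's standard library and no network access. It illustrates the wire format, not the webrtc-ips JavaScript. The function names are hypothetical, and the sample response is constructed by hand with the documentation address 192.0.2.33 rather than captured from a real server.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442      # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020

def build_binding_request(transaction_id=None):
    """20-byte STUN header: type, length (no attributes), cookie, 96-bit transaction id."""
    tid = transaction_id or os.urandom(12)
    return struct.pack("!HHI12s", BINDING_REQUEST, 0, MAGIC_COOKIE, tid)

def parse_xor_mapped_address(message):
    """Return (ip, port) from an IPv4 XOR-MAPPED-ADDRESS attribute, or None."""
    msg_type, length, cookie = struct.unpack("!HHI", message[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", message[offset:offset + 4])
        value = message[offset + 4:offset + 4 + attr_len]
        # value layout: reserved byte, family byte (0x01 = IPv4), x-port, x-address
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            port = xport ^ (MAGIC_COOKIE >> 16)
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, port
        offset += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None

def fake_success_response(ip, port, transaction_id):
    """Hand-built response for this demo; a real one would come from a STUN server."""
    xport = port ^ (MAGIC_COOKIE >> 16)
    xaddr = struct.unpack("!I", socket.inet_aton(ip))[0] ^ MAGIC_COOKIE
    attr = struct.pack("!HHBBHI", XOR_MAPPED_ADDRESS, 8, 0, 0x01, xport, xaddr)
    header = struct.pack("!HHI12s", BINDING_SUCCESS, len(attr), MAGIC_COOKIE, transaction_id)
    return header + attr

request = build_binding_request(b"\x00" * 12)
response = fake_success_response("192.0.2.33", 54321, b"\x00" * 12)
print(len(request), parse_xor_mapped_address(response))
```

The address a STUN server reports is the client's public (server-reflexive) address as seen from outside the NAT; the local-address exposure discussed in these notes comes from the browser's own candidate gathering, which this sketch does not model.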

Argot

  • Private Internet Access (PIA)
  • Real-Time-Communication (RTC)
  • Virtual Private Network (VPN)
  • WebRTC

Previously

In Privacy Online Forums:

Referenced

  • 2013
  • Since WebRTC uses JavaScript requests to get your IP address, users of NoScript or similar services will not leak their IP addresses.

Via: backfill.


Firefox

  • about:config
  • media.peerconnection.enabled; set to false to block (default true)

Web Privacy Census | Altaweel, Good, Hoofnagle

Ibrahim Altaweel, Nathaniel Good, Chris Jay Hoofnagle; Web Privacy Census; In Technology Science; 2015-12-15.

tl;dr → there are lots of (HTML4) cookies; cookies are for tracking; cookies are bad. factoids are exhibited.

Abstract

Most people may believe that online activities are tracked more pervasively now than they were in the past. In 2011, we started surveying the online mechanisms used to track people online (e.g., HTTP cookies, Flash cookies and HTML5 storage). We called this our Web Privacy Census. We repeated the study in 2012. In this paper, we update the study to 2015.

Mentions

  • Universe
    • Quantcast
    • “top 1 million”
  • Attack
    • Firefox 39
    • OpenWPM
  • Client
    • HTML4 Cookies
    • HTML5 Storage
    • Flash
  • Use Cases
    indistinguishable in the census method

    • Analytics
    • Tracking (Trak-N-Targ)
    • Conversion
    • Personalization
    • Security

References

Revisiting the Uniqueness of Simple Demographics in the US Population | Philippe Golle

Philippe Golle; Revisiting the Uniqueness of Simple Demographics in the US Population; In Proceedings of the Workshop on Privacy in the Electronic Society (WPES); 2006-10-30; 4 pages.

Abstract

According to a famous study [10] of the 1990 census data, 87% of the US population can be uniquely identified by gender, ZIP code and full date of birth. This short paper revisits the uniqueness of simple demographics in the US population based on the most recent census data (the 2000 census). We offer a detailed, comprehensive and up-to-date picture of the threat to privacy posed by the disclosure of simple demographic information. Our results generally agree with the findings of [10], although we find that disclosing one’s gender, ZIP code and full date of birth allows for unique identification of fewer individuals (63% of the US population) than reported in [10]. We hope that our study will be a useful reference for privacy researchers who need simple estimates of the comparative threat of disclosing various demographic data.

Simple Demographics Often Identify People Uniquely | Latanya Sweeney

Latanya Sweeney; Simple Demographics Often Identify People Uniquely; Data Privacy Working Paper 3; Carnegie Mellon University; Pittsburgh, PA; 2000; 34 pages.

Abstract

In this document, I report on experiments I conducted using 1990 U.S. Census summary data to determine how many individuals within geographically situated populations had combinations of demographic values that occurred infrequently. It was found that combinations of few characteristics often combine in populations to uniquely or nearly uniquely identify some individuals. Clearly, data released containing such information about these individuals should not be considered anonymous. Yet, health and other person-specific data are publicly available in this form. Here are some surprising results using only three fields of information, even though typical data releases contain many more fields. It was found that 87% (216 million of 248 million) of the population in the United States had reported characteristics that likely made them unique based only on {5-digit ZIP, gender, date of birth}. About half of the U.S. population (132 million of 248 million or 53%) are likely to be uniquely identified by only {place, gender, date of birth}, where place is basically the city, town, or municipality in which the person resides. And even at the county level, {county, gender, date of birth} are likely to uniquely identify 18% of the U.S. population. In general, few characteristics are needed to uniquely identify a person.
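
The quantity reported in the abstract comes down to counting how many people share each combination of quasi-identifier values. A minimal sketch, assuming a hypothetical list of per-person records rather than the Census summary tables the paper worked from; `fraction_unique` and the sample rows are illustrative only, not real data.

```python
from collections import Counter

def fraction_unique(records, fields):
    """Share of records whose combination of values on `fields` occurs exactly once."""
    keys = [tuple(r[f] for f in fields) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)

# Hypothetical sample rows, not real data.
records = [
    {"zip": "15213", "place": "Pittsburgh", "gender": "F", "dob": "1970-01-02"},
    {"zip": "15213", "place": "Pittsburgh", "gender": "F", "dob": "1970-01-02"},
    {"zip": "15213", "place": "Pittsburgh", "gender": "M", "dob": "1970-01-02"},
    {"zip": "15217", "place": "Pittsburgh", "gender": "M", "dob": "1970-01-02"},
    {"zip": "15217", "place": "Pittsburgh", "gender": "F", "dob": "1981-06-30"},
]

print(fraction_unique(records, ["zip", "gender", "dob"]))    # finer geography
print(fraction_unique(records, ["place", "gender", "dob"]))  # coarser geography
```

On these five rows the finer key isolates three people and the coarser key only one, which mirrors the direction of the paper's ZIP versus place versus county comparison, though not its magnitudes.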

Related

(but different)

L. Sweeney; Uniqueness of Simple Demographics in the U.S. Population; Data Privacy Lab White Paper Series LIDAP-WP4; School of Computer Science, Carnegie Mellon University, Pittsburgh, PA; 2000; 34 pages; abstract; catalog.

Via: backfill

RFC 7624 – Confidentiality in the Face of Pervasive Surveillance: A Threat Model and Problem Statement

RFC 7624; Confidentiality in the Face of Pervasive Surveillance: A Threat Model and Problem Statement; R. Barnes, B. Schneier, C. Jennings, T. Hardie, B. Trammell, C. Huitema, D. Borkmann; IETF; 2015-08.

tl;dr
  • state-level actors
    (police- & military-focused)
  • some mention of adtrade tracking
    (is passive, pervasive & persistent but nominally T&C, N&C, etc.)

Abstract

Since the initial revelations of pervasive surveillance in 2013, several classes of attacks on Internet communications have been discovered. In this document, we develop a threat model that describes these attacks on Internet confidentiality. We assume an attacker that is interested in undetected, indiscriminate eavesdropping. The threat model is based on published, verified attacks.

Table of Contents

1. Introduction
2. Terminology
3. An Idealized Passive Pervasive Attacker
3.1. Information Subject to Direct Observation
3.2. Information Useful for Inference
3.3. An Illustration of an Ideal Passive Pervasive Attack
3.3.1. Analysis of IP Headers
3.3.2. Correlation of IP Addresses to User Identities
3.3.3. Monitoring Messaging Clients for IP Address Correlation
3.3.4. Retrieving IP Addresses from Mail Headers
3.3.5. Tracking Address Usage with Web Cookies
3.3.6. Graph-Based Approaches to Address Correlation
3.3.7. Tracking of Link-Layer Identifiers
4. Reported Instances of Large-Scale Attacks
5. Threat Model
5.1. Attacker Capabilities
5.2. Attacker Costs
6. Security Considerations
7. References
7.1. Normative References
7.2. Informative References
IAB Members at the Time of Approval
Acknowledgements
Authors’ Addresses

Mentioned

Concepts

  • Encryption
  • Snowden
  • National Security Agency (NSA)
    • FOXACID
    • PRISM
    • QUANTUM
    • XKEYSCORE

<quote>
3.3.2. Correlation of IP Addresses to User Identities

The correlation of IP addresses with specific users can be done in various ways. For example, tools like reverse DNS lookup can be used to retrieve the DNS names of servers. Since the addresses of servers tend to be quite stable and since servers are relatively less numerous than users, an attacker could easily maintain its own copy of the DNS for well-known or popular servers to accelerate such lookups.

On the other hand, the reverse lookup of IP addresses of users is generally less informative. For example, a lookup of the address currently used by one author’s home network returns a name of the form “c-192-000-002-033.hsd1.wa.comcast.net”. This particular type of reverse DNS lookup generally reveals only coarse-grained location or provider information, equivalent to that available from geolocation databases.

In many jurisdictions, Internet Service Providers (ISPs) are required to provide identification on a case-by-case basis of the “owner” of a specific IP address for law enforcement purposes. This is a reasonably expedient process for targeted investigations, but pervasive surveillance requires something more efficient. This provides an incentive for the attacker to secure the cooperation of the ISP in order to automate this correlation.

Even if the ISP does not cooperate, user identity can often be obtained via inference. POP3 [RFC1939] and IMAP [RFC3501] are used to retrieve mail from mail servers, while a variant of SMTP is used to submit messages through mail servers. IMAP connections originate from the client, and typically start with an authentication exchange in which the client proves its identity by answering a password challenge. The same holds for the SIP protocol [RFC3261] and many instant messaging services operating over the Internet using proprietary protocols.

The username is directly observable if any of these protocols operate in cleartext; the username can then be directly associated with the source address.

3.3.4. Retrieving IP Addresses from Mail Headers

SMTP [RFC5321] requires that each successive SMTP relay adds a “Received” header to the mail headers. The purpose of these headers is to enable audit of mail transmission, and perhaps to distinguish between regular mail and spam. Here is an extract from the headers of a message recently received from the perpass mailing list:

Received: from 192-000-002-044.zone13.example.org (HELO ?192.168.1.100?) (xxx.xxx.xxx.xxx)
    by lvps192-000-002-219.example.net
    with ESMTPSA (DHE-RSA-AES256-SHA encrypted, authenticated); 27 Oct 2013 21:47:14 +0100
Message-ID: <526D7BD2.7070908@example.org>
Date: Sun, 27 Oct 2013 20:47:14 +0000
From: Some One <some.one@example.org>

This is the first “Received” header attached to the message by the first SMTP relay; for privacy reasons, the field values have been anonymized. We learn here that the message was submitted by “Some One” on October 27, from a host behind a NAT (192.168.1.100) [RFC1918] that used the IP address 192.0.2.44. The information remained in the message and is accessible by all recipients of the perpass mailing list, or indeed by any attacker that sees at least one copy of the message.

An attacker that can observe sufficient email traffic can regularly update the mapping between public IP addresses and individual email identities. Even if the SMTP traffic was encrypted on submission and relaying, the attacker can still receive a copy of public mailing lists like perpass.

3.3.5. Tracking Address Usage with Web Cookies

Many web sites only encrypt a small fraction of their transactions. A popular pattern is to use HTTPS for the login information, and then use a “cookie” to associate following cleartext transactions with the user’s identity. Cookies are also used by various advertisement services to quickly identify the users and serve them with “personalized” advertisements. Such cookies are particularly useful if the advertisement services want to keep tracking the user across multiple sessions that may use different IP addresses.

As cookies are sent in cleartext, an attacker can build a database that associates cookies to IP addresses for non-HTTPS traffic. If the IP address is already identified, the cookie can be linked to the user identify. After that, if the same cookie appears on a new IP address, the new IP address can be immediately associated with the predetermined identity.

3.3.6. Graph-Based Approaches to Address Correlation

An attacker can track traffic from an IP address not yet associated with an individual to various public services (e.g., web sites, mail servers, game servers) and exploit patterns in the observed traffic to correlate this address with other addresses that show similar patterns. For example, any two addresses that show connections to the same IMAP or webmail services, the same set of favorite web sites, and game servers at similar times of day may be associated with the same individual. Correlated addresses can then be tied to an individual through one of the techniques above, walking the “network graph” to expand the set of attributable traffic.

3.3.7. Tracking of Link-Layer Identifiers

Moving back down the stack, technologies like Ethernet or Wi-Fi use MAC (Media Access Control) addresses to identify link-level destinations. MAC addresses assigned according to IEEE 802 standards are globally unique identifiers for the device. If the link is publicly accessible, an attacker can eavesdrop and perform tracking. For example, the attacker can track the wireless traffic at publicly accessible Wi-Fi networks. Simple devices can monitor the traffic and reveal which MAC addresses are present. Also, devices do not need to be connected to a network to expose link-layer identifiers. Active service discovery always discloses the MAC address of the user, and sometimes the Service Set Identifiers (SSIDs) of previously visited networks. For instance, certain techniques such as the use of “hidden SSIDs” require the mobile device to broadcast the network identifier together with the device identifier. This combination can further expose the user to inference attacks, as more information can be derived from the combination of MAC address, SSID being probed, time, and current location. For example, a user actively probing for a semi-unique SSID on a flight out of a certain city can imply that the user is no longer at the physical location of the corresponding AP. Given that large-scale databases of the MAC addresses of wireless access points for geolocation purposes have been known to exist for some time, the attacker could easily build a database that maps link-layer identifiers and time with device or user identities, and use it to track the movement of devices and of their owners. On the other hand, if the network does not use some form of Wi-Fi encryption, or if the attacker can access the decrypted traffic, the analysis will also provide the correlation between link-layer identifiers such as MAC addresses and IP addresses. 
Additional monitoring using techniques exposed in the previous sections will reveal the correlation between MAC addresses, IP addresses, and user identity. For instance, similarly to the use of web cookies, MAC addresses provide identity information that can be used to associate a user to different IP addresses.

</quote>
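The cookie-based correlation quoted from section 3.3.5 can be sketched as a single pass over observed cleartext traffic. This is an illustration only: `link_identities` is a hypothetical helper, the observation format is an assumption, and the addresses come from the documentation ranges.

```python
def link_identities(observations, known_ip_identities):
    """Propagate identities from known IPs to cookies, then to new IPs.

    observations: iterable of (ip, cookie) pairs in the order they were seen.
    known_ip_identities: dict mapping an IP address to an established identity.
    """
    ip_to_identity = dict(known_ip_identities)
    cookie_to_identity = {}
    for ip, cookie in observations:
        if ip in ip_to_identity:
            # A cookie seen on an identified address inherits that identity.
            cookie_to_identity.setdefault(cookie, ip_to_identity[ip])
        elif cookie in cookie_to_identity:
            # The same cookie on a new address identifies the new address.
            ip_to_identity[ip] = cookie_to_identity[cookie]
    return ip_to_identity, cookie_to_identity

# Hypothetical observations; none of these values come from the RFC.
observed = [
    ("192.0.2.44", "sid=abc"),
    ("198.51.100.7", "sid=abc"),
    ("203.0.113.9", "sid=zzz"),
]
ips, cookies = link_identities(observed, {"192.0.2.44": "some.one@example.org"})
```

The pass is order-dependent: a cookie first seen on an unidentified address stays unlinked until it later shows up on an identified one, so a real observer would presumably re-run the pass or keep a backlog.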

References

Normative

  • RFC 6973 – Privacy Considerations for Internet Protocols, A. Cooper, H. Tschofenig, B. Aboba, J. Peterson, J. Morris, M. Hansen, R. Smith, DOI 10.17487/RFC6973, 2013-07.

Informative

Mostly newspaper articles (exposés) & techreport whitepapers.

  • RFC 1035 – Domain names – implementation and specification, P. Mockapetris, STD 13, RFC 1035, doi:10.17487/RFC1035, 1987-11.
  • RFC 1918 – Address Allocation for Private Internets, Y. Rekhter, B. Moskowitz, D. Karrenberg, G. de Groot, E. Lear, BCP 5, RFC 1918, doi:10.17487/RFC1918, 1996-02.
  • RFC 1939 – Post Office Protocol – Version 3, J. Myers, M. Rose, STD 53, RFC 1939, doi:10.17487/RFC1939, 1996-05.
  • RFC 3261 – SIP: Session Initiation Protocol, J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, E. Schooler, RFC 3261, doi:10.17487/RFC3261, 2002-06.
  • RFC 3365 – Strong Security Requirements for Internet Engineering Task Force Standard Protocols, J. Schiller, BCP 61, RFC 3365, doi:10.17487/RFC3365, 2002-08.
  • RFC 3501 – INTERNET MESSAGE ACCESS PROTOCOL – VERSION 4rev1, M. Crispin, RFC 3501, doi:10.17487/RFC3501, 2003-03.
  • RFC 4033 – DNS Security Introduction and Requirements, R. Arends, R. Austein, M. Larson, D. Massey, S. Rose, RFC 4033, doi:10.17487/RFC4033, 2005-03.
  • RFC 4303 – IP Encapsulating Security Payload (ESP), S. Kent, RFC 4303, doi:10.17487/RFC4303, 2005-12.
  • RFC 4949 – Internet Security Glossary, Version 2, R. Shirey, FYI 36, RFC 4949, doi:10.17487/RFC4949, 2007-08.
  • RFC 5246 – The Transport Layer Security (TLS) Protocol Version 1.2, T. Dierks, E. Rescorla, RFC 5246, doi:10.17487/RFC5246, 2008-08.
  • RFC 5321 – Simple Mail Transfer Protocol, J. Klensin, RFC 5321, doi:10.17487/RFC5321, 2008-10.
  • RFC 6962 – Certificate Transparency, B. Laurie, A. Langley, E. Kasper, RFC 6962, doi:10.17487/RFC6962, 2013-06.
  • RFC 7011 – Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of Flow Information, B. Claise, B. Trammell, (editors), P. Aitken, STD 77, RFC 7011, doi:10.17487/RFC7011, 2013-09.
  • RFC 7258 – Pervasive Monitoring Is an Attack, S. Farrell, H. Tschofenig, BCP 188, RFC 7258, doi:10.17487/RFC7258, 2014-05.

Via: backfill.

Header Enrichment or ISP Enrichment? Emerging Privacy Threats in Mobile Networks | Vallina-Rodriguez, Sundaresan, Kreibich, Paxson

Narseo Vallina-Rodriguez, Srikanth Sundaresan, Christian Kreibich, Vern Paxson; Header Enrichment or ISP Enrichment? Emerging Privacy Threats in Mobile Networks; In Proceedings of the ACM SIGCOMM Workshop on Hot Topics in Middleboxes and Network Function Virtualization (HotMiddlebox 2015, huh? now you’re just being silly); 2015-08-17; 6 pages; landing.

Abstract

HTTP header enrichment allows mobile operators to annotate HTTP connections via the use of a wide range of request headers. Operators employ proxies to introduce such headers for operational purposes, and—as recently widely publicized—also to assist advertising programs in identifying the subscriber responsible for the originating traffic, with significant consequences for the user’s privacy. In this paper, we use data collected by the Netalyzr network troubleshooting service over 16 months to identify and characterize HTTP header enrichment in modern mobile networks. We present a timeline of HTTP header usage for 299 mobile service providers from 112 countries, observing three main categories:

  1. unique user and device identifiers (e.g., IMEI and IMSI)
  2. headers related to advertising programs, and
  3. headers associated with network operations.

Mentions

  • HTTP header enrichment
  • Netalyzr
    • Netalyzr-for-Android
  • Verizon Precision Market Insights
  • The IETF’s Service Function Chaining (SFC) standards are vague about whether injected headers are good or bad (should be removed).
  • Data
    • Collected: 2013-11 → 2015-03.
    • 112 countries
    • 299 operators
  • CRAWDAD
  • Belief: no M?NO is yet cracking TLS to insert HTTP headers into the encrypted stream.
  • Suggested as an ID-less method of identification: device-unique allocation of the (routable) IPv6 space to identify the device, in addition to routing to it.
  • RFC 7239 – Forwarded HTTP Extension; A. Petersson, M. Nilsson (Opera); IETF; 2014-06.
  • Cessation Timeline
    • 2014-10 → Vodafone (ZA) ceased the practice; nothing to see there, now.
    • 2014-11 → AT&T ceased the practice.
    • 2015-03 → Verizon was not respecting opt-out (as evidenced by not inserting the X-UIDH header) through 2015-03.
  • Continuation
    • Verizon continues the X-UIDH header insertion.
  • The X-Forwarded-For header carries extra freight in T-Mobile (DE)
  • Carrier-Grade NAT (CGN) at 100.64.0.0/10 per RFC 6598 – IANA-Reserved IPv4 Prefix for Shared Address Space (2012-04)
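
Whether a client address sits in the CGN shared space mentioned above can be checked with Python's standard `ipaddress` module. A minimal sketch; the helper name `is_cgn_address` is an assumption, while the prefix is the one from RFC 6598.

```python
import ipaddress

# Shared address space reserved for Carrier-Grade NAT (RFC 6598).
CGN_SHARED_SPACE = ipaddress.ip_network("100.64.0.0/10")

def is_cgn_address(addr: str) -> bool:
    """Return True if `addr` falls inside 100.64.0.0/10."""
    return ipaddress.ip_address(addr) in CGN_SHARED_SPACE

behind_cgn = is_cgn_address("100.64.1.1")
home_nat = is_cgn_address("192.168.1.100")   # RFC 1918 space, not CGN space
just_outside = is_cgn_address("100.128.0.1")
```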

Headers

Table 1 & Table 2; Table 3 (not shown)

HTTP Header          | Operator | Country | Estimated Purpose
x-up-calling-line-id | Vodacom  | ZA      | Phone Number
x-up-nai             |          |         |
x-up-vodacomgw-subid |          |         |
msisdn               | Orange   | JO      | MSISDN
x-nokia-msisdn       | Smart    | PH      |
tm_user-id           | Movistar | ES      | Subscriber ID
x-up-subno           |          |         |
x-up-3gpp-imeisv     | Vodacom  | ZA      | IMEI
lbs-eventtime        | SmarTone | HK      | Timestamp
lbs-zoneid           |          |         | Location
x-acr                | AT&T     | US      | unstated, an identifier
x-amobee-1           | Airtel   | IN      |
x-amobee-2           | Singtel  | SG      |
x-uidh               | Verizon  | US      |
x-vf-acr             | Vodacom  | ZA      |
                     | Vodafone | NL      |
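
The header names in the table above lend themselves to a simple echo test: compare what a client sent with what a cooperating server received. The sketch below is not Netalyzr's implementation; `injected_headers` and `known_identifiers` are hypothetical helpers and the sample values are invented.

```python
# Header names taken from the table above, lower-cased for comparison.
KNOWN_ENRICHMENT_HEADERS = {
    "x-up-calling-line-id", "x-up-nai", "x-up-vodacomgw-subid", "msisdn",
    "x-nokia-msisdn", "tm_user-id", "x-up-subno", "x-up-3gpp-imeisv",
    "lbs-eventtime", "lbs-zoneid", "x-acr", "x-amobee-1", "x-amobee-2",
    "x-uidh", "x-vf-acr",
}

def injected_headers(sent, received):
    """Headers the server saw that the client never sent (names lower-cased)."""
    sent_names = {name.lower() for name in sent}
    return {
        name.lower(): value
        for name, value in received.items()
        if name.lower() not in sent_names
    }

def known_identifiers(injected):
    """Subset of injected header names that appear in the table."""
    return sorted(name for name in injected if name in KNOWN_ENRICHMENT_HEADERS)

# Hypothetical request as sent by the client and as echoed back by a test server.
sent = {"Host": "example.org", "User-Agent": "probe/1.0"}
received = {
    "Host": "example.org",
    "User-Agent": "probe/1.0",
    "Via": "1.1 proxy",
    "X-UIDH": "EXAMPLE-TOKEN",
}
added = injected_headers(sent, received)
flagged = known_identifiers(added)
```

An echo test like this presumably only works over cleartext HTTP, which fits the note above that no operator is believed to be inserting headers into TLS streams.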

Argot

  • Access Point Name (APN)
  • GPRS
  • HTTP
  • IMSI
  • IMEI
  • J2ME
  • Location-Based Services (LBS)
  • Mobile Country Code (MCC)
  • Mobile Network Code (MNC)
  • Mobile Network Operator (MNO)
  • Mobile Virtual Network Operator (MVNO)
  • MSISDN
  • Hong Kong Metro (subway) (MTR)
  • Service Function Chaining (SFC)
  • SIM
  • Transport-Layer Security (TLS)
  • Unique Identifier (UID); contra the specific UUID or GUID
  • Virtual Private Network (VPN)
  • WAP

References

A significant number of newspaper articles, vulgarizations & bloggist opinements.

AT&T’s plan to watch your Web browsing—and what you can do about it | Ars Technica

AT&T’s plan to watch your Web browsing—and what you can do about it; ; In Ars Technica; 2015-03-27.
Teaser: Want to opt out? It could cost up to $744 extra per year.

tl;dr => You have to pay to opt out. Get a VPN.

Mentions

  • the article
    • is discursive
    • lots of backfill from the pseudonymous commentariat at Ars Forums.
    • tl;dr => the fee is there, either pay it or get a VPN.
  • GigaPower, a product name
    Cost

    • startup fee
    • + $99/month for DSL-type service
    • + $29/month for tracking-limitation.
    • ⨉ 30% various taxes & user fees
    • = ~$175-$200/month
  • Relevant Advertising, a product name
  • Alternates
  • Precedents
    • Charter, ad insertion, tracking&targeting; abandoned 2008
    • CMA Communication, ad insertion; reported circa 2013-04.
  • Regulatory
    • Something vague about U.S. wiretap law; but opt-in consent trumps this.
    • Something vague about EU data law; but …
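
The GigaPower cost arithmetic listed above can be worked through directly. A rough sketch, assuming the 30% is applied multiplicatively to the monthly charges and leaving out the startup fee, which the notes do not quantify; `monthly_cost` is a hypothetical helper.

```python
def monthly_cost(base, opt_out_fee, tax_rate):
    """Monthly charge with the tracking opt-out, taxes and fees applied on top."""
    return round((base + opt_out_fee) * (1 + tax_rate), 2)

# $99 service + $29 tracking-limitation fee, then 30% taxes & user fees.
gigapower_estimate = monthly_cost(99, 29, 0.30)
# The opt-out fee alone over a year, before taxes and fees.
opt_out_per_year = 29 * 12
```

The monthly estimate comes out somewhat below the ~$175-$200 range given above, so that range presumably includes the startup fee or other charges not itemized in these notes.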

Quoted

Policies

  • <quote>AT&T’s best pricing may not be available in cities where it doesn’t compete against Google Fiber. In Dallas, where Google Fiber hasn’t arrived, AT&T was charging $120 a month for gigabit service and still requiring the customer to opt in to Internet Preferences.</quote>
  • <quote>AT&T describes Internet Preferences as “opt-in,” but its website advertises the lower price without mentioning the traffic scanning unless you click “See offer details.”</quote> An actuality of the AT&T consumer acquisition screens is exhibited.
  • <quote>AT&T Internet Preferences works independently of your browser’s privacy settings regarding cookies, do-not-track, and private browsing, If you opt-in to AT&T Internet Preferences, AT&T will still be able to collect and use your Web browsing information independent of those settings. Using the IP address assigned to each GigaPower account, AT&T scans for your AT&T Internet Preferences election, AT&T will treat your Internet browsing activity in accordance with your election. If you chose to participate in the AT&T Internet Preferences program, your Internet traffic is routed to AT&T’s Internet Preferences Web browsing and analytics platform.</quote> attributed to AT&T, either to a document or a speaker (unclear).

Promotions

  • Privacy Is Becoming a Premium Service; David Auerbach; In Slate; 2015-03-31.
    Teaser: AT&T wants customers to pay the company not to spy on them. And it’s not an outlier.
    Teaser: AT&T Gigapower: The company wants you to pay it not to sell your data
    Mentions

    • wholly derivative of the Ars Technica piece; somewhat more cogent.

Previously

Original reporting, a month earlier, in archaeological order (derivatives on top, original work below).

  • AT&T’s Offer: Share Your Data for Personalized Ads, or Pay More; Natasha Singer; In The New York Times (NYT); 2015-02-18.
    Mentioned

    • Original reporting is in the WSJ piece, nearby
    • Quoted
      • Gretchen Schultz, press relations, AT&T
      • Jonathan Mayer, activist, Stanford University (still a graduate student?)
  • AT&T Offers Data Privacy – for a Price; Elizabeth Dwoskin, Thomas Gryta; In The Wall Street Journal (WSJ); 2015-02-10.
    Mentioned

    • <quote>AT&T’s new service uses searches terms entered, Web pages visited, and links clicked. The tracking remains in effect even if you clear cookies, use an ad block program, or switch on a browser’s do-not-track settings. The company uses the data it collects to help advertisers target ads on Web pages, email messages or direct mail.</quote>
    • Pricing
      • + $70/month for service
      • + $29/month for “opt out”
      • ✕ 30% user fees & taxes
    • Framing (by AT&T press relations)
      <quote>not as a charge to people who opted out of tracking but as a discount to those who didn’t. “We can offer a lower price to customers participating in AT&T Internet Preferences because advertisers will pay us for the opportunity to deliver relevant advertising and offers tailored to our customer’s interests,”</quote>
    • Quoted
      • Jonathan Mayer, activist, Stanford University
      • Marc Rotenberg, President & Executive Director, Electronic Privacy Information Center (EPIC).
    • Similar
      • AT&T
      • Verizon
    • Concept
      • <quote>The companies stored hidden, undeletable tracking codes on customers’ phones. </quote>
      • <quote>there was no way to opt out of these so-called super-cookies, which let the service providers track them across all devices and monitor device location at all times.</quote>

Referenced

XRay: Increasing the Web’s Transparency with Differential Correlation | Lecuyer, Ducoffe, Lan, Papancea, Petsios, Spahn, Chaintreau, Geambasu

Abstract

Today’s Web services – such as Google, Amazon, and Facebook – leverage user data for varied purposes, including personalizing recommendations, targeting advertisements, and adjusting prices. At present, users have little insight into how their data is being used. Hence, they cannot make informed choices about the services they choose.

To increase transparency, we developed XRay, the first fine-grained, robust, and scalable personal data tracking system for the Web. XRay predicts which data in an arbitrary Web account (such as emails, searches, or viewed products) is being used to target which outputs (such as ads, recommended products, or prices). XRay’s core functions are service agnostic and easy to instantiate for new services, and they can track data within and across services. To make predictions independent of the audited service, XRay relies on the following insight: by comparing outputs from different accounts with similar, but not identical, subsets of data, one can pinpoint targeting through correlation. We show both theoretically, and through experiments on Gmail, Amazon, and YouTube, that XRay achieves high precision and recall by correlating data from a surprisingly small number of extra accounts.
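
The insight stated in the abstract, comparing outputs across accounts that hold overlapping subsets of inputs, can be sketched with a simple presence-difference score. This is not XRay's actual correlation engine; the scoring rule, the threshold and the `predict_targeting` helper are assumptions, and the data is invented.

```python
def predict_targeting(accounts, threshold=0.5):
    """Guess which input, if any, each output is targeted on.

    accounts: list of (inputs, outputs) pairs, each a set of labels.
    Score = P(output | input present) - P(output | input absent).
    """
    all_inputs = set().union(*(inputs for inputs, _ in accounts))
    all_outputs = set().union(*(outputs for _, outputs in accounts))
    predictions = {}
    for out in sorted(all_outputs):
        best_input, best_score = None, threshold
        for inp in sorted(all_inputs):
            with_inp = [o for i, o in accounts if inp in i]
            without = [o for i, o in accounts if inp not in i]
            if not with_inp or not without:
                continue  # no contrast available for this input
            score = (sum(out in o for o in with_inp) / len(with_inp)
                     - sum(out in o for o in without) / len(without))
            if score > best_score:
                best_input, best_score = inp, score
        predictions[out] = best_input  # None means "looks untargeted"
    return predictions

# Hypothetical shadow accounts: (emails held, ads observed).
shadow_accounts = [
    ({"email:flights", "email:loans"}, {"ad:airline", "ad:bank"}),
    ({"email:flights"}, {"ad:airline", "ad:shoes"}),
    ({"email:loans"}, {"ad:bank", "ad:shoes"}),
    ({"email:pets"}, {"ad:shoes"}),
]
targeting = predict_targeting(shadow_accounts)
```

With only four accounts the score is crude; the point is that each extra account with a different subset of inputs adds contrast between "input present" and "input absent".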

Mentioned

  • Evaluated
    • Amazon
    • Gmail
    • YouTube
  • Criteria
    • How accurate are XRay’s inference models?
    • How does XRay scale with input size?
    • Does input matching reduce overlap?
    • How useful is XRay in practice?
  • Scope
    • Aspiration
      • Profile Targeting
      • Contextual Targeting
      • Behavioral Targeting
    • Practical
      • Email targeting
      • Prohibited subject matter targeting.
  • Components
    • Differential Correlation Engine
    • Shadow Account Manager
    • Audited Web Service
    • Browser Plugin
  • Implementation
    • Browser Plugin
    • Ruby
      • service => 3KLOC
      • service => 0.5KLOC

Background

Actualities

References

  1. Adblock Plus – surf the web without annoying ads!
  2. I. Amazon. Amazon taxonomy.
  3. P. Barford, I. Canadi, D. Krushevskaja, Q. Ma, S. Muthukrishnan. AdScape: Harvesting, Analyzing Online Display Ads. In Proceedings of the 23rd International Conference on World Wide Web (WWW). 2014.
  4. B. Beizer. Black-Box Testing. Techniques for Functional Testing of Software, Systems. John Wiley & Sons, May 1995.
  5. D. Boneh, G. Crescenzo, R. Ostrovsky, G. Persiano. Public Key Encryption with Keyword Search. In Proceedings of EUROCRYPT 2004, pages 506–522. Springer Berlin Heidelberg, Berlin, Heidelberg, 2004.
  6. C. Castelluccia, M. A. Kaafar, M. Tran. Betrayed by Your Ads! Reconstructing User Profiles from Targeted Ads. In Proceedings of the 12th International Conference on Privacy Enhancing Technologies (PETS), 2012.
  7. W. Cheng, Q. Zhao, B. Yu, S. Hiroshige. Tainttrace: Efficient Flow Tracing with Dynamic Binary Rewriting. In Proceedings of the 11th IEEE Symposium on Computers, Communications. IEEE Computer Society, 2006.
  8. Chrome Web Store – Collusion, for chrome.
  9. V. Dave, S. Guha, Y. Zhang. Measuring, Fingerprinting Click-Spam in Ad Networks. In Proceedings of ACM SIGCOMM 2012 Conference on Applications, Technologies, Architectures, Protocols for Computer Communication. 2012-09.
  10. N. Diakopoulos. Algorithmic Accountability Reporting: On the Investigation of Black Boxes. Tow Center for Digital Journalism, Columbia University. February, 2014.
  11. R. Dingledine, N. Mathewson, P. Syverson. Tor: The Second-Generation Onion Router. Technical Report, 2004.
  12. W. Enck, P. Gilbert, B. gon Chun, L. P. Cox, J. Jung, P. McDaniel, A. N. Sheth. TaintDroid: An Information-Flow Tracking System for Realtime Privacy Monitoring on Smartphones. In Proceedings of the USENIX Symposium on Operating Systems Design, Implementation (OSDI), 2010.
  13. M. Fredrikson, B. Livshits. RePriv: Re-imagining Content Personalization, In-browser Privacy. In Proceedings of the 2011 IEEE Symposium on Security, Privacy (SP), pages 131–146, 2011.
  14. R. Geambasu, T. Kohno, A. Levy, H. M. Levy. Vanish: Increasing Data Privacy with Self-Destructing Data. In Proceedings of USENIX Security, 2009.
  15. C. Gentry. Fully Homomorphic Encryption Using Ideal Lattices. In Proceedings of the ACM Symposium on Theory of Computing (STOC), 2009.
  16. D. B. Giffin, A. Levy, D. Stefan, D. Terei. Hails: Protecting Data Privacy in Untrusted Web Applications. In Proceedings of the 10th Symposium on Something…, 2012.
  17. I. Google. Adsense categories.
  18. V. Goyal, O. Pandey, A. Sahai, B. Waters. Attribute-based Encryption for Fine-Grained Access Control of Encrypted Data. In Proceedings of the ACM Conference on Computer, Communications Security (CCS), 2006.
  19. Some Cub Reporter. Snapchat’s expired snaps are not deleted, just hidden. In The Guardian. DATE?
  20. S. Guha, B. Cheng, P. Francis. Challenges in Measuring Online Advertising Systems. In Proceedings of the 10th Annual Internet Measurement Conference (IMC). 2010-11.
  21. A. Hannak, P. Sapiezynski, A. M. Kakhki, B. Krishnamurthy, D. Lazer, A. Mislove, C. Wilson. Measuring Personalization of Web Search. In Proceedings of the 22nd International Conference on World Wide Web (WWW). 2013-05.
  22. A. L. Hughes, L. Palen. Twitter Adoption, Use In Mass Convergence, Emergency Events. In International Journal of Emergency Management, 2009.
  23. S. Jeff Gould. Google admits data mining student emails in its free education apps. In Their Blog. safegov.org. 2014-01-31.
  24. Some Cub Reporter. On Orbitz, MAC Users Steered to Pricier Hotels; In The Wall Street Journal (WSJ). DATE?
  25. Some Cub Reporter. Websites vary prices, deals based on users’ information. In The Wall Street Journal (WSJ). DATE?
  26. A. Korolova. Privacy Violations Using Microtargeted Ads: A Case Study. In Proceedings of the International Conference on Data Mining Workshops (ICDMW). 2010. pages 474–482.
  27. B. Krishnamurthy, C. E. Wills. On The Leakage Of Personally Identifiable Information Via Online Social Networks. In Proceedings of the 2nd ACM Workshop on Online Social Networks (WOSN) 2009. pages 7–12.
  28. J. Lanier. Who Owns the Future? Simon, Schuster, 2013.
  29. Lightbeam for Firefox. Mozilla.
  30. B. Liu, A. Sheth, U. Weinsberg, J. Chandrashekar, R. Govindan. AdReveal: Improving Transparency into Online Targeted Advertising. In Proceedings of the Twelfth ACM Workshop on Hot Topics in Networks (HotNets). 2013-11.
  31. Csomething LLP. Declaration of Kyle C. Wong in Support of Google Inc.’s Opposition to Plaintiffs’ Motion for Class Certification.
  32. J. Mikians, L. Gyarmati, V. Erramilli, N. Laoutaris. Detecting Price, Search Discrimination on the Internet. In Proceedings of the 11th ACM Workshop on Hot Topics in Networks (HotNets). pages 79–84.
  33. L. Olejnik, T. Minh-Dung, C. Castelluccia, et al. Selling Off privacy at Auction. In Proceedings of the Network, Distributed System Security Symposium (NDSS), 2013.
  34. R. A. Popa, C. M. S. Redfield, N. Zeldovich, H. Balakrishnan. CryptDB: Protecting Confidentiality with Encrypted Query Processing. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles (SOSP). pages 85–100.
  35. F. Roesner. Sharemenot.
  36. F. Roesner, T. Kohno, D. Wetherall. Detecting, Defending Against Third-Party Tracking on the Web. In Proceedings of the 9th USENIX Conference on Networked Systems Design, Implementation (NSDI). USENIX Association. 2012-04.
  37. A. Sadilek, H. Kautz. Modeling the Impact of Lifestyle on Health at Scale. In Proceedings of the Sixth ACM International Conference on Web Search, Data Mining (WSDM). 2013-02.
  38. Snapchat
  39. Snapchat. How Snaps Are Stored, Deleted.
  40. L. Sweeney. Discrimination in Online Ad Delivery. In Communications of the ACM, 56(5). 2013-04.
  41. V. Toubiana, A. Narayanan, D. Boneh. Adnostic: Privacy Preserving Targeted Advertising. In Proceedings of the Networking and Distributed Systems Symposium (NDSS), 2010.
  42. X. Wang, M. Gerber, D. Brown. Automatic Crime Prediction Using Events Extracted From Twitter Posts. In S. Yang, A. Greenberg, M. Endsley, editors, Social Computing, Behavioral-Cultural Modeling, Prediction, Volume 7227 of Lecture Notes in Computer Science, pages 231–238. Springer Berlin Heidelberg, 2012.
  43. A. Whitten, J. D. Tygar. Why Johnny Can’t Encrypt: A Usability Evaluation of PGP 5.0. In Proceedings of USENIX Security, 1999.
  44. C. E. Wills, C. Tatar. Understanding What They Do with What They Know. In Proceedings of the 12th Annual ACM Workshop on Privacy in the Electronic Society (WPES).
  45. ?. Xing, W. Meng, D. Doozan, N. Feamster, W. Lee, A. C. Snoeren. Exposing Inconsistent Web Search Results with Bobble. In Proceedings of the Passive, Active Measurements Conference, 2014.
  46. Y. Zhu, J. Jung, D. Song, T. Kohno, D. Wetherall. Privacy Scope: A Precise Information Flow Tracking System for Finding Application Leaks. Technical Report UCB/EECS-2009-145, EECS Department, University of California, Berkeley, 2009-10.
  47. P. R. Zimmermann. The Official PGP User’s Guide. 1995.

Via: backfill

The Web never forgets: Persistent tracking mechanisms in the wild | Acar, Eubank, Englehardt, Juarez, Narayanan, Diaz

Gunes Acar, Christian Eubank, Steven Englehardt, Marc Juarez, Arvind Narayanan, Claudia Diaz; The Web never forgets: Persistent tracking mechanisms in the wild; In Proceedings of the Conference on Computer & Communication Security (CCS); 2014-11, draft of 2014-07-24; 16 pages; landing including some data in tabular format.

Abstract

We present the first large-scale studies of three advanced web tracking mechanisms — canvas fingerprinting, evercookies and use of “cookie syncing” in conjunction with evercookies. Canvas fingerprinting, a recently developed form of browser fingerprinting, has not previously been reported in the wild; our results show that over 5% of the top 100,000 websites employ it. We then present the first automated study of evercookies and respawning and the discovery of a new evercookie vector, IndexedDB. Turning to cookie syncing, we present novel techniques for detection and analysing ID flows and we quantify the amplification of privacy-intrusive tracking practices due to cookie syncing.

Our evaluation of the defensive techniques used by privacy-aware users finds that there exist subtle pitfalls — such as failing to clear state on multiple browsers at once — in which a single lapse in judgement can shatter privacy defenses. This suggests that even sophisticated users face great difficulties in evading tracking techniques.
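
The cookie-syncing ID flows mentioned in the abstract can be pictured as one domain's cookie value turning up in a request URL sent to another domain. The sketch below uses exact matching on query parameters only and is not the paper's detection technique; `find_synced_ids`, the minimum ID length and the sample data are assumptions.

```python
from urllib.parse import parse_qsl, urlsplit

def find_synced_ids(cookies, request_urls, min_len=8):
    """Report (owner, receiver, id) when an owner's cookie value is passed to another host.

    cookies: dict mapping a domain to the set of cookie values it has set.
    request_urls: list of URLs requested while loading pages.
    """
    flows = set()
    for url in request_urls:
        parts = urlsplit(url)
        receiver = parts.hostname
        params = {value for _, value in parse_qsl(parts.query)}
        for owner, values in cookies.items():
            if owner == receiver:
                continue  # a domain seeing its own ID is not a sync
            for value in values:
                # Skip short values, which match by accident too easily.
                if len(value) >= min_len and value in params:
                    flows.add((owner, receiver, value))
    return sorted(flows)

# Hypothetical crawl data on reserved example domains.
crawl_cookies = {
    "tracker-a.example": {"a1b2c3d4e5f6"},
    "tracker-b.example": {"zz99yy88xx77"},
}
crawl_requests = [
    "https://tracker-b.example/sync?partner=a&uid=a1b2c3d4e5f6",
    "https://cdn.example/lib.js?v=3",
]
id_flows = find_synced_ids(crawl_cookies, crawl_requests)
```

Once such a flow exists, the receiving domain can presumably join its own browsing log with the owner's, which is the amplification effect the abstract refers to.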

References

  • Bug 757726 – disallow enumeration of navigator.plugins. Mozilla. 2012-05.
  • Manage, disable Local Shared Objects | Flash Player. Adobe. 2014.
  • Doubleclick ad exchange real-time bidding protocol: Cookie matching. Google. 2014-02.
  • Selenium – Web Browser Automation. 2014.
  • G. Acar, M. Juarez, N. Nikiforakis, C. Diaz, S. Gürses, F. Piessens, B. Preneel. FPDetective: Dusting the Web for Fingerprinters. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, pages 1129–1140. ACM, 2013.
  • M. Ayenson, D. J. Wambach, A. Soltani, N. Good, C. J. Hoofnagle. Flash cookies and Privacy II: Now with HTML5 and ETag Respawning. World Wide Web Internet And Web Information Systems, 2011.
  • M. Backes, A. Kate, M. Maffei, K. Pecina. Obliviad: Provably Secure and Practical Online Behavioral Advertising. In Proceedings of the 2012 IEEE Symposium on Security and Privacy (SP), pages 257–271. IEEE, 2012.
  • R. Balebako, P. Leon, R. Shay, B. Ur, Y. Wang, L. Cranor. Measuring the Effectiveness of Privacy Tools for Limiting Behavioral Advertising. In Proceedings of the Web 2.0 Workshop on Security and Privacy, 2012.
  • F. Besson, N. Bielova, T. Jensen. Enforcing Browser Anonymity with Quantitative Information Flow. INRIA; 2014-05; 13 pages. landing
  • M. Bilenko, M. Richardson, J. Y. Tsai. Targeted, Not Tracked: Client-Side Solutions For Privacy-Friendly Behavioral Advertising. In Proceedings of the 11th Privacy Enhancing Technologies Symposium (PETS 2011), 2011.
  • P. E. Black. Ratcliff/Obershelp pattern recognition. 2004-12.
  • K. Brade. gitweb.torproject.org – torbrowser.git/blob – src/current-patches/firefox/0019-add-canvas-image-extraction-prompt.patch. 2012-11.
  • A. Das, N. Borisov, and M. Caesar. Fingerprinting Smart Devices Through Embedded Acoustic Components. arXiv preprint arXiv:1403.3366. 2014-03-13.
  • W. Davis. KISSmetrics Finalizes Supercookies Settlement. 2013.
  • P. Eckersley. How unique is your web browser? In Proceedings of Privacy Enhancing Technologies (PETS), pages 1–18. Springer, 2010.
  • C. Eubank, M. Melara, D. Perez-Botero, A. Narayanan. Shining The Floodlights on Mobile Web Tracking – A Privacy Survey. In Proceedings of the Web 2.0 Conference on Security and Privacy, 2013-05.
  • E. W. Felten. If You’re Going to Track Me, Please Use Cookies. In Freedom to Tinker. 2009.
  • M. Fredrikson, B. Livshits. Repriv: Re-imagining Content Personalization and In-browser Privacy. In Proceedings of the 2011 IEEE Symposium on Security and Privacy (SP), pages 131–146. IEEE, 2011.
  • S. Guha, B. Cheng, P. Francis. Privad: Practical Privacy In Online Advertising. In Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation. pages 169–182. USENIX Association, 2011.
  • S. Kamkar. Evercookie – virtually irrevocable persistent cookies. In His Blog. 2010-09.
  • M. Kerrisk. strace(1) – linux manual page, 2014-05.
  • T. Kohno, A. Broido, and K. C. Claffy. Remote Physical Device Fingerprinting. In Transactions on Dependable and Secure Computing. IEEE. 2(2):93–108, 2005.
  • B. Krishnamurthy, C. Wills. Privacy Diffusion on the Web: A Longitudinal Perspective. In Proceedings of the 18th International Conference on World Wide Web. pages 541–550. ACM, 2009.
  • B. Krishnamurthy, C. E. Wills. On the Leakage Of Personally Identifiable Information Via Online Social Networks. In Proceedings of the 2nd ACM Workshop on Online Social Networks. pages 7–12. ACM, 2009.
  • B. Liu, A. Sheth, U. Weinsberg, J. Chandrashekar, R. Govindan. AdReveal: Improving Transparency Into Online Targeted Advertising. In Proceedings of the Twelfth ACM Workshop on Hot Topics in Networks. page 12. ACM, 2013.
  • J. Mayer. Tracking the Trackers: Self-help Tools. 2011-09.
  • J. R. Mayer, J. C. Mitchell. Third-party web tracking: Policy and technology. In IEEE Symposium on Security and Privacy, pages 413–427. IEEE, 2012.
  • A. M. McDonald, L. F. Cranor. Survey of the Use of Adobe Flash Local Shared Objects to Respawn HTTP Cookies. In Proceedings of ISJLP, 7:639, 2011.
  • K. Mowery, D. Bogenreif, S. Yilek, H. Shacham. Fingerprinting Information in JavaScript Implementations. In Proceedings of W2SP, Volume 2, 2011.
  • K. Mowery, H. Shacham. Pixel Perfect: Fingerprinting Canvas in HTML5. In Proceedings of W2SP, 2012.
  • M. Mulazzani, P. Reschl, M. Huber, M. Leithner, S. Schrittwieser, E. Weippl, F. C. Wien. Fast and Reliable Browser Identification With JavaScript Engine Fingerprinting. In Proceedings of the Web 2.0 Workshop on Security and Privacy (W2SP), Volume 1, 2013.
  • A. Narayanan, J. Mayer, S. Iyengar. Tracking Not Required: Behavioral Targeting, 2012.
  • N. Nikiforakis, L. Invernizzi, A. Kapravelos, S. Van Acker, W. Joosen, C. Kruegel, F. Piessens, G. Vigna. You Are What You Include: Large-scale Evaluation Of Remote JavaScript Inclusions. In Proceedings of the 2012 ACM Conference on Computer and Communications Security, pages 736–747. ACM, 2012.
  • N. Nikiforakis, W. Joosen, B. Livshits. PriVaricator: Deceiving Fingerprinters with Little White Lies. Microsoft. 2014-02.
  • N. Nikiforakis, A. Kapravelos, W. Joosen, C. Kruegel, F. Piessens, G. Vigna. Cookieless Monster: Exploring the Ecosystem of Web-based Device Fingerprinting. In Proceedings of the IEEE Symposium on Security and Privacy, pages 541–555. IEEE, 2013.
  • L. Olejnik, T. Minh-Dung, C. Castelluccia. Selling Off Privacy at Auction. In Proceedings of the 20th Annual Network and Distributed System Security Symposium (NDSS), 2013.
  • C. R. Orr, A. Chauhan, M. Gupta, C. J. Frisz, C. W. Dunn. An Approach for Identifying JavaScript-loaded Advertisements Through Static Program Analysis. In Proceedings of the 2012 ACM Workshop on Privacy in the Electronic Society. pages 1–12. ACM, 2012.
  • M. Perry, E. Clark, S. Murdoch. The design and implementation of the Tor browser [draft]. 2013-03.
  • F. Roesner, T. Kohno, D. Wetherall. Detecting and Defending Against Third-Party Tracking on the Web. In Proceedings of the Symposium on Networking Systems Design and Implementation. USENIX, 2012.
  • A. Soltani, S. Canty, Q. Mayo, L. Thomas, C. J. Hoofnagle. Flash Cookies and Privacy. In Proceedings of the AAAI Spring Symposium: Intelligent Information Privacy Management, 2010.
  • O. Sorensen. Zombie-cookies: Case studies and Mitigation. In Proceedings of the 2013 8th International Conference for Internet Technology and Secured Transactions (ICITST), pages 321–326. IEEE, 2013.
  • A. Taly, J. C. Mitchell, M. S. Miller, J. Nagra, et al. Automated Analysis of Security-Critical JavaScript APIs. In Proceedings of the 2011 IEEE Symposium on Security and Privacy (SP). pages 363–378. IEEE, 2011.
  • J. Temple. Stale Cookies: How Companies Are Tracking You Online Today. In SFGate, a newspaper. 2013.
  • M. Tran, X. Dong, Z. Liang, X. Jiang. Tracking the trackers: Fast and scalable dynamic analysis of web content for privacy violations. In Applied Cryptography and Network Security, pages 418–435. Springer, 2012.
  • M.-D. Tran, G. Acs, C. Castelluccia. Retargeting Without Tracking. arXiv preprint arXiv:1404.4533. 2014-04-17.
  • T. Unger, M. Mulazzani, D. Fruhwirt, M. Huber, S. Schrittwieser, E. Weippl. SHPF: Enhancing HTTP(S) Session Security with Browser Fingerprinting. In Proceedings of the 2013 Eighth International Conference on Availability, Reliability and Security (ARES), pages 255–261. IEEE, 2013.
  • V. Vasilyev. Valve/fingerprintjs. 2012.

Via: backfill

On de-identification, a debate

In order of appearance

The Claim: Cavoukian & Castro

tl;dr => de-anonymization “plays a role”, the risks are small (enough), there are ameliorations & remediations for every concern.

The Rebuttal: Narayanan & Felten

tl;dr => it doesn’t work, the reasoning is flawed; you can de-anonymize, we already showed that. “… is not an option” as the hard line.

References

  • Paul Ohm, Broken promises of privacy: Responding to the surprising failure of anonymization, UCLA L. Rev., 57, 1701 (2009).
  • Ann Cavoukian, Daniel Castro; Big Data and Innovation, Setting the Record Straight: Deidentification Does Work; 2014.
  • Yves-Alexandre de Montjoye et al.; Unique in the Crowd: The privacy bounds of human mobility; Scientific Reports; Volume 3; 2013.
  • Hui Zang, Jean Bolot; Anonymization of location data does not work: A large-scale measurement study; In Proceedings of the 17th Intl. Conf. on Mobile Computing and Networking; 2011; pages 145-156.
  • Philippe Golle, Kurt Partridge; On the anonymity of home/work location pairs; In Pervasive Computing; 2009; pages 390-397.
  • Arvind Narayanan, Vitaly Shmatikov; Robust de-anonymization of large sparse datasets, in Proceedings of the 2008 IEEE Symposium on Security and Privacy; 2008; pages 111-125.
  • Johan Ugander, Brian Karrer, Lars Backstrom, Cameron Marlow; The anatomy of the Facebook social graph; In arXiv; 2011; arXiv:1111.4503.
  • Vijay Pandurangan; On Taxis and Rainbows: Lessons from NYC’s improperly anonymized taxi logs; 2014.
  • Edward Felten (FTC); Does Hashing Make Data “Anonymous”?; In Tech @FTC; 2012.
  • Arvind Narayanan; Lendingclub.com: A De-anonymization Walkthrough; In His Blog entitled 33 Bits of Entropy; 2008.
  • Khaled El Emam et al., De-identification methods for open health data: the case of the Heritage Health Prize claims dataset, Journal of Medical Internet Research; Volume 14, Number 1; 2012.
  • Arvind Narayanan, An Adversarial Analysis of the Reidentifiability of the Heritage Health Prize Dataset; (unpublished?) manuscript; 2011.

The Response: El Emam & Arbuckle

De-Identification: A Critical Debate; Khaled El Emam, Luk Arbuckle; In Future of Privacy Forum; 2014-07-24

  • Khaled El Emam, University of Ottawa, CHEO Research Institute & Privacy Analytics Inc.
  • Luk Arbuckle, CHEO Research Institute, Privacy Analytics Inc.

tl;dr => it does too work, if you tolerate some re-identification around the edges.
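
To make "some re-identification around the edges" concrete, here is a minimal sketch of one common way to quantify it: group records by their quasi-identifiers and measure what fraction fall into groups smaller than a threshold k (the idea behind the k-anonymity method cited in the references below). This is an illustration only, not the procedure used by El Emam & Arbuckle; the field names (zip3, birth_year, sex), the toy records and the function names are all hypothetical.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """For each record, return how many records share its quasi-identifier values."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return [counts[key] for key in keys]


def fraction_below_k(records, quasi_identifiers, k):
    """Fraction of records whose equivalence class has fewer than k members."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    if not sizes:
        return 0.0
    return sum(1 for s in sizes if s < k) / len(sizes)


# Hypothetical toy dataset; the fields are assumptions made for illustration.
records = [
    {"zip3": "981", "birth_year": 1980, "sex": "F"},
    {"zip3": "981", "birth_year": 1980, "sex": "F"},
    {"zip3": "981", "birth_year": 1975, "sex": "M"},
    {"zip3": "940", "birth_year": 1990, "sex": "F"},
    {"zip3": "940", "birth_year": 1990, "sex": "F"},
    {"zip3": "940", "birth_year": 1990, "sex": "F"},
]
QUASI_IDENTIFIERS = ["zip3", "birth_year", "sex"]

# Records in a class of size 1 are unique on these fields, hence "at the edges".
at_risk_k2 = fraction_below_k(records, QUASI_IDENTIFIERS, k=2)
at_risk_k3 = fraction_below_k(records, QUASI_IDENTIFIERS, k=3)
print(f"share of records below k=2: {at_risk_k2:.2f}")
print(f"share of records below k=3: {at_risk_k3:.2f}")
```

The debate, read through this sketch, is over whether a small nonzero share is an acceptable residual risk (the Response) or whether an adversary with auxiliary data makes the choice of quasi-identifiers itself unreliable (the Rebuttal).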

References

  • K. El Emam, E. Jonker, L. Arbuckle, and B. Malin, “A Systematic Review of Re-Identification Attacks on Health Data,” PLoS ONE, vol. 6, no. 12, p. e28071, Dec. 2011.
  • Anna Monreale, Gennady L. Andrienko, Natalia V. Andrienko, Fosca Giannotti, Dino Pedreschi, Salvatore Rinzivillo, and Stefan Wrobel, “Movement Data Anonymity through Generalization,” Transactions on Data Privacy, vol. 3, no. 2, pp. 91–121, 2010.
  • S. C. Wieland, C. A. Cassa, K. D. Mandl, and B. Berger, “Revealing the spatial distribution of a disease while preserving privacy,” Proc. Natl. Acad. Sci. U.S.A., vol. 105, no. 46, pp. 17608–17613, Nov. 2008.
  • K. El Emam and L. Arbuckle, Anonymizing Health Data: Case Studies and Methods to Get You Started. O’Reilly, 2013.
  • L. Willenborg and T. de Waal, Statistical Disclosure Control in Practice. New York: Springer-Verlag, 1996.
  • L. Willenborg and T. de Waal, Elements of Statistical Disclosure Control. New York: Springer-Verlag, 2001.
  • K. El Emam, Guide to the De-Identification of Personal Health Information. CRC Press (Auerbach), 2013.
  • K. El Emam, L. Arbuckle, G. Koru, B. Eze, L. Gaudette, E. Neri, S. Rose, J. Howard, and J. Gluck, “De-identification Methods for Open Health Data: The Case of the Heritage Health Prize Claims Dataset,” Journal of Medical Internet Research, vol. 14, no. 1, p. e33, Feb. 2012.
  • K. El Emam, F. Dankar, R. Issa, E. Jonker, D. Amyot, E. Cogo, J.-P. Corriveau, M. Walker, S. Chowdhury, R. Vaillancourt, T. Roffey, and J. Bottomley, “A Globally Optimal k-Anonymity Method for the De-identification of Health Data,” Journal of the American Medical Informatics Association, vol. 16, no. 5, pp. 670–682, 2009.
  • K. El Emam, E. Jonker, E. Moher, and L. Arbuckle, “A Review of Evidence on Consent Bias in Research,” American Journal of Bioethics, vol. 13, no. 4, pp. 42–44, 2013.

The Economics of Advertising and Privacy | Catherine Tucker

Catherine Tucker (MIT); The Economics of Advertising and Privacy; In Something (draft preprint submitted to Elsevier), 2011-11-19, 10 pages.

Abstract

One of the new realities of advertising is that personal information can be used to ensure that advertising is only shown and designed for a select group of consumers who stand to gain most from this information. However, to gather the data used for targeting requires some degree of privacy intrusion by advertisers. This sets up a tradeoff between the informativeness of advertising and the degree of privacy intrusion. This paper summarizes recent empirical research that illuminates this tradeoff.

Outline

  1. Introduction
  2. Does consumers’ distaste for ‘intrusiveness’ matter empirically?
  3. How can and do firms respond?
  4. Future Directions
  5. References

Mentioned

  • Harms are asserted
    1. price discrimination [Acquisti, Varian 2005][Fudenberg, Villas-Boas, 2006]
    2. annoyance, especially from retargeting [Turow 2009]
    3. privacy regulation entrenches monopolies [Campbell, Goldfarb, Tucker 2009][Evans 2009]
  • <quote>For example, in the diapers example it is unlikely that there would be a direct effect on the price paid by the mother as a result of the exposure to the ad, since diapers are largely bought offline and manufacturers are not easily able to match a cookie on a computer to a real-life offline customer.</quote>
  • “Results” marshalled as “evidence”
    • theoretical results via reasoning (clausal, syllogistic, etc.)
    • theoretical results via simulation
    • experiments in lab settings
    • interviews, self-attestation
    • something about a large database of actual campaigns <quote>data from a randomized field experiment conducted by a US-based non-profit organization (NPO) to optimize its advertising campaigns on Facebook. <snip/> to raise awareness of its work improving education for women in East Africa</quote>
  • Industry responses
    • more accurate retargeting
    • content targeting contra (personalized) retargeting
    • more obtrusiveness; e.g. self-start video, audio, floaters, roadblocks, interstitials
  • Claims
    • retargeting campaigns are less effective than unpersonalized campaigning (because of reactance, etc.)
    • no tradeoff between personalization and ad effectiveness; softened: not necessarily a tradeoff between … citing EU cookie & privacy laws as paradigm.
    • European cookie & data laws imply 65% reduction in the influence banner ads have on purchase intent relative to non-EU web sites promoting the same [Goldfarb, Tucker, 2011]

Prescription

  • more research
  • stop retargeting;
    • else stop retargeting but retain the benefits of retargeting
    • else mitigate reactance (by various techniques)
    • else give the appearance of control over personal data
  • regulation, government regulation; data use restriction
  • self-regulation via privacy policy specifications
  • not only online, but QR codes in print, and addressable TV & Radio (e.g. Sirius, TiVo, Generic DVR) are in scope of regulation too

Argot

  • Coasian solution of Posner
  • disutility [expressed as ...]
  • information theory
    • asymmetric information
  • reactance
  • signalling [theory]
  • social advertising

Definition

  • Coasian solution of Posner => intellectual property law; vest “ownership rights” in data about that owner.
  • Reactance => <quote>‘Reactance’ describes a process where consumers resist something they find coercive by behaving in the opposite way to the one intended, which is in this case not finding the ad appealing</quote>. [White, Zahay, Thorbjornsen, Shavitt 2008][Brehm 1966][Clee, Wicklund 1980][Brehm 1989][Lohr 2010].

References

  • A. Acquisti, S. Spiekermann (2011, May). Do Interruptions Pay off? Effects of Interruptive Ads on Consumers’ Willingness to Pay. Journal of Interactive Marketing.
  • A. Acquisti, H. R. Varian (Summer 2005). Conditioning prices on purchase history. Marketing Science 24 (3), 367–381.
  • B. Anand, R. Shachar (2009, September). Targeted advertising as a signal. Quantitative Marketing and Economics 7 (3), 237–266.
  • A. R. Beresford, D. Kuebler, S. Preibusch (2010, June). Unwillingness to pay for privacy: A field experiment. IZA Discussion Papers 5017, Institute for the Study of Labor (IZA).
  • J. W. Brehm (1966). A theory of psychological reactance. Academic Press, New York.
  • J. W. Brehm (1989). Psychological reactance: Theory and applications. Advances in Consumer Research 16, 72–75. eds. Thomas K. Srull, Provo, UT.
  • G. R. Butters (1977). Equilibrium distributions of sales and advertising prices. The Review of Economic Studies 44 (3), 465–491.
  • J. D. Campbell, A. Goldfarb, C. Tucker (2010). Privacy Regulation and Market Structure. SSRN eLibrary.
  • E. Chamberlin (1933). The Theory of Monopolistic Competition. Harvard University Press, Cambridge, MA.
  • M. A. Clee, R. A. Wicklund (1980). Consumer behavior and psychological reactance. The Journal of Consumer Research 6 (4), pp. 389–405.
  • M. Culnan, P. Armstrong (1999, Jan-Feb). Information privacy concerns, procedural fairness, and interpersonal trust: An empirical investigation. Organization Science 10 (1), 104–115.
  • D. S. Evans (2009). The online advertising industry: Economics, evolution, and privacy. The Journal of Economic Perspectives 23 (3), 37–60.
  • D. Fudenberg, J. M. Villas-Boas (2006). Volume 1: Handbooks in Information Systems, Chapter 7: Behavior Based Price Discrimination and Customer Recognition, pp. 377–435. Emerald Group Publishing.
  • M. Fusilier, W. Hoyer (1980). Variables affecting perceptions of invasion of privacy in a personnel selection situation. Journal of Applied Psychology 65 (5), 623–626.
  • A. Goldfarb, C. Tucker (2011a, May). Online display advertising: Targeting and obtrusiveness.
  • A. Goldfarb, C. Tucker (2011b). Search engine advertising: Channel substitution when pricing ads to context. Management Science 57 (3), 458–470.
  • A. Goldfarb, C. E. Tucker (2011c). Privacy regulation and online advertising. Management Science 57 (1), 57–71.
  • B. Hermalin, M. Katz (2006, September). Privacy, property rights and efficiency: The economics of privacy as secrecy. Quantitative Marketing and Economics 4 (3), 209–239.
  • K. Hui, I. Png (2006). Economics and Information Systems, Handbooks in Information Systems, vol. 1, Chapter 9: The Economics of Privacy. Elsevier.
  • J. Johnson (2009). Targeted advertising and advertising avoidance. Mimeo, Cornell.
  • R. E. Kihlstrom, M. H. Riordan (1984, June). Advertising as a signal. Journal of Political Economy 92 (3), 427–50.
  • A. Lambrecht, C. Tucker (2011). When does retargeting work? Timing information specificity. MSI Working Paper 11-105.
  • S. Lohr (2010, April 30). Privacy concerns limit online ads, study says. New York Times.
  • N. K. Malhotra, S. S. Kim, J. Agarwal (2004). Internet users’ information privacy concerns (IUIPC): The construct, the scale, and a causal model. Information Systems Research 15 (4), 336–355.
  • R. A. Posner (1980). The economics of privacy. Technical report.
  • A. Sherman (2011, October 31). Cable TV tries to catch up with interactive ads. San Francisco Chronicle.
  • S. E. Taylor (1979). Hospital patient behavior: Reactance, helplessness, or control? Journal of Social Issues 35 (1), 156–184.
  • C. Tucker (2011a). Social Advertising. Mimeo, MIT.
  • C. Tucker (2011b). Social Networks, Personalized Advertising, and Privacy Controls. Mimeo, MIT.
  • J. Turow, J. King, C. J. Hoofnagle, A. Bleakley, M. Hennessy (2009). Americans Reject Tailored Advertising and Three Activities that Enable It. Mimeo, Berkeley.
  • T. White, D. Zahay, H. Thorbjornsen, S. Shavitt (2008, March). Getting too personal: Reactance to highly personalized email solicitations. Marketing Letters 19 (1), 39–50.

Via: backfill