Syllabus for Solon Barocas @ Cornell | INFO 4270: Ethics and Policy in Data Science

INFO 4270 – Ethics and Policy in Data Science
Instructor: Solon Barocas
Venue: Cornell University

Syllabus

Solon Barocas

Readings

A Canon, The Canon

In order of appearance in the syllabus, without the course cadence markers…

  • Danah Boyd and Kate Crawford, Critical Questions for Big Data; In <paywalled>Information, Communication & Society, Volume 15, Issue 5 (A decade in Internet time: the dynamics of the Internet and society); 2012; DOI:10.1080/1369118X.2012.678878</paywalled>
    Subtitle: Provocations for a cultural, technological, and scholarly phenomenon
  • Tal Zarsky, The Trouble with Algorithmic Decisions; In Science, Technology & Human Values, Vol 41, Issue 1, 2016 (2015-10-14); ResearchGate.
    Subtitle: An Analytic Road Map to Examine Efficiency and Fairness in Automated and Opaque Decision Making
  • Cathy O’Neil, Weapons of Math Destruction; Broadway Books; 2016-09-06; 290 pages, ASIN:B019B6VCLO: Kindle: $12, paper: $10+SHT.
  • Frank Pasquale, The Black Box Society: The Secret Algorithms That Control Money and Information; Harvard University Press; 2016-08-29; 320 pages; ASIN:0674970845: Kindle: $10, paper: $13+SHT.
  • Executive Office of the President, President Barack Obama, Big Data: A Report on Algorithmic Systems, Opportunity, and Civil Rights; The White House Office of Science and Technology Policy (OSTP); 2016-05; 29 pages; archives.
  • Lisa Gitelman (editor), “Raw Data” is an Oxymoron; Series: Infrastructures; The MIT Press; 2013-01-25; 192 pages; ASIN:B00HCW7H0A: Kindle: $20, paper: $18+SHT.
    Lisa Gitelman, Virginia Jackson; Introduction (6 pages)
  • Agre, “Surveillance and Capture: Two Models of Privacy”
  • Bowker and Star, Sorting Things Out
  • Auerbach, “The Stupidity of Computers”
  • Moor, “What is Computer Ethics?”
  • Hand, “Deconstructing Statistical Questions”
  • O’Neil, On Being a Data Skeptic
  • Domingos, “A Few Useful Things to Know About Machine Learning”
  • Luca, Kleinberg, and Mullainathan, “Algorithms Need Managers, Too”
  • Friedman and Nissenbaum, “Bias in Computer Systems”
  • Lerman, “Big Data and Its Exclusions”
  • Hand, “Classifier Technology and the Illusion of Progress” [Sections 3 and 4]
  • Pager and Shepherd, “The Sociology of Discrimination: Racial Discrimination in Employment, Housing, Credit, and Consumer Markets”
  • Goodman, “Economic Models of (Algorithmic) Discrimination”
  • Hardt, “How Big Data Is Unfair”
  • Barocas and Selbst, “Big Data’s Disparate Impact” [Parts I and II]
  • Gandy, “It’s Discrimination, Stupid”
  • Dwork and Mulligan, “It’s Not Privacy, and It’s Not Fair”
  • Sandvig, Hamilton, Karahalios, and Langbort, “Auditing Algorithms: Research Methods for Detecting Discrimination on Internet Platforms”
  • Diakopoulos, “Algorithmic Accountability: Journalistic Investigation of Computational Power Structures”
  • Bertrand and Mullainathan, “Are Emily and Greg More Employable Than Lakisha and Jamal?”
  • Sweeney, “Discrimination in Online Ad Delivery”
  • Datta, Tschantz, and Datta, “Automated Experiments on Ad Privacy Settings”
  • Dwork, Hardt, Pitassi, Reingold, and Zemel, “Fairness Through Awareness”
  • Feldman, Friedler, Moeller, Scheidegger, and Venkatasubramanian, “Certifying and Removing Disparate Impact”
  • Žliobaitė and Custers, “Using Sensitive Personal Data May Be Necessary for Avoiding Discrimination in Data-Driven Decision Models”
  • Angwin, Larson, Mattu, and Kirchner, “Machine Bias”
  • Kleinberg, Mullainathan, and Raghavan, “Inherent Trade-Offs in the Fair Determination of Risk Scores”
  • Northpointe, COMPAS Risk Scales: Demonstrating Accuracy Equity and Predictive Parity
  • Chouldechova, “Fair Prediction with Disparate Impact”
  • Berk, Heidari, Jabbari, Kearns, and Roth, “Fairness in Criminal Justice Risk Assessments: The State of the Art”
  • Hardt, Price, and Srebro, “Equality of Opportunity in Supervised Learning”
  • Wattenberg, Viégas, and Hardt, “Attacking Discrimination with Smarter Machine Learning”
  • Friedler, Scheidegger, and Venkatasubramanian, “On the (Im)possibility of Fairness”
  • Tene and Polonetsky, “Taming the Golem: Challenges of Ethical Algorithmic Decision Making”
  • Lum and Isaac, “To Predict and Serve?”
  • Joseph, Kearns, Morgenstern, and Roth, “Fairness in Learning: Classic and Contextual Bandits”
  • Barocas, “Data Mining and the Discourse on Discrimination”
  • Grgić-Hlača, Zafar, Gummadi, and Weller, “The Case for Process Fairness in Learning: Feature Selection for Fair Decision Making”
  • Vedder, “KDD: The Challenge to Individualism”
  • Lippert-Rasmussen, “‘We Are All Different’: Statistical Discrimination and the Right to Be Treated as an Individual”
  • Schauer, Profiles, Probabilities, And Stereotypes
  • Caliskan, Bryson, and Narayanan, “Semantics Derived Automatically from Language Corpora Contain Human-like Biases”
  • Zhao, Wang, Yatskar, Ordonez, and Chang, “Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints”
  • Bolukbasi, Chang, Zou, Saligrama, and Kalai, “Man Is to Computer Programmer as Woman Is to Homemaker?”
  • Citron and Pasquale, “The Scored Society: Due Process for Automated Predictions”
  • Ananny and Crawford, “Seeing without Knowing”
  • de Vries, “Privacy, Due Process and the Computational Turn”
  • Zarsky, “Transparent Predictions”
  • Crawford and Schultz, “Big Data and Due Process”
  • Kroll, Huey, Barocas, Felten, Reidenberg, Robinson, and Yu, “Accountable Algorithms”
  • Bornstein, “Is Artificial Intelligence Permanently Inscrutable?”
  • Burrell, “How the Machine ‘Thinks’”
  • Lipton, “The Mythos of Model Interpretability”
  • Doshi-Velez and Kim, “Towards a Rigorous Science of Interpretable Machine Learning”
  • Hall, Phan, and Ambati, “Ideas on Interpreting Machine Learning”
  • Grimmelmann and Westreich, “Incomprehensible Discrimination”
  • Selbst and Barocas, “Regulating Inscrutable Systems”
  • Jones, “The Right to a Human in the Loop”
  • Edwards and Veale, “Slave to the Algorithm? Why a ‘Right to Explanation’ is Probably Not the Remedy You are Looking for”
  • Duhigg, “How Companies Learn Your Secrets”
  • Kosinski, Stillwell, and Graepel, “Private Traits and Attributes Are Predictable from Digital Records of Human Behavior”
  • Barocas and Nissenbaum, “Big Data’s End Run around Procedural Privacy Protections”
  • Chen, Fraiberger, Moakler, and Provost, “Enhancing Transparency and Control when Drawing Data-Driven Inferences about Individuals”
  • Robinson and Yu, Knowing the Score
  • Hurley and Adebayo, “Credit Scoring in the Era of Big Data”
  • Valentino-Devries, Singer-Vine, and Soltani, “Websites Vary Prices, Deals Based on Users’ Information”
  • The Council of Economic Advisers, Big Data and Differential Pricing
  • Hannak, Soeller, Lazer, Mislove, and Wilson, “Measuring Price Discrimination and Steering on E-commerce Web Sites”
  • Kochelek, “Data Mining and Antitrust”
  • Helveston, “Consumer Protection in the Age of Big Data”
  • Kolata, “New Gene Tests Pose a Threat to Insurers”
  • Swedloff, “Risk Classification’s Big Data (R)evolution”
  • Cooper, “Separation, Pooling, and Big Data”
  • Simon, “The Ideological Effects of Actuarial Practices”
  • Tufekci, “Engineering the Public”
  • Calo, “Digital Market Manipulation”
  • Kaptein and Eckles, “Selecting Effective Means to Any End”
  • Pariser, “Beware Online ‘Filter Bubbles’”
  • Gillespie, “The Relevance of Algorithms”
  • Buolamwini, “Algorithms Aren’t Racist. Your Skin Is Just Too Dark”
  • Hassein, “Against Black Inclusion in Facial Recognition”
  • Agüera y Arcas, Mitchell, and Todorov, “Physiognomy’s New Clothes”
  • Garvie, Bedoya, and Frankle, The Perpetual Line-Up
  • Wu and Zhang, “Automated Inference on Criminality using Face Images”
  • Haggerty, “Methodology as a Knife Fight”
    <snide>A metaphorical usage. Let hyperbole be your guide</snide>

Previously filled.

Big Data and Privacy: A Technological Perspective | PCAST

Big Data and Privacy: A Technological Perspective; Executive Office of the President, President’s Council of Advisors on Science and Technology (PCAST); 2014-05-01; 76 pages; landing.

Related

Workshops

  • White House / UC Berkeley School of Information / Berkeley Center for Law and Technology; John Podesta; 2014-04-01; transcript, video.
  • White House / Data & Society Research Institute / NYU Information Law Institute; John Podesta; 2014-03-17; video.
  • White House / MIT; John Podesta; 2014-03-04; transcript, video.

Who

PCAST Big Data and Privacy Working Group.
  • Susan L. Graham, co-chair.
  • William Press, co-chair.
  • S. James Gates, Jr.,
  • Mark Gorenberg,
  • John Holdren,
  • Eric S. Lander,
  • Craig Mundie,
  • Maxine Savitz,
  • Eric Schmidt.
  • Marjory S. Blumenthal, Executive Director of PCAST; coordination & framing.

PCAST

  • John P. Holdren, co-chair, OSTP
  • Eric S. Lander, co-chair, Broad Institute (Harvard & MIT)
  • William Press, co-vice chair, U. Texas
  • Maxine Savitz, co-vice chair, National Academy of Engineering
  • Rosina Bierbaum, U. Michigan
  • Christine Cassel, National Quality Forum
  • Christopher Chyba, Princeton
  • S. James Gates, Jr., U. Maryland
  • Mark Gorenberg, Zetta Venture Partners
  • Susan L. Graham, UCB
  • Shirley Ann Jackson, Rensselaer Polytechnic
  • Richard C. Levin, Yale
  • Chad Mirkin, Northwestern
  • Mario Molina, UCSD
  • Craig Mundie, Microsoft
  • Ed Penhoet, UCB
  • Barbara Schaal, Washington University
  • Eric Schmidt, Google
  • Daniel Schrag, Harvard

Staff

  • Marjory S. Blumenthal
  • Michael Johnson

Recommendations

From the Executive Summary [page xiii], and also from Section 5.2 [page 49]

  • Recommendation 1 [consider uses over collection activities]
    Policy attention should focus more on the actual uses of big data and less on its collection and analysis.
  • Recommendation 2 [no Microsoft lock-in; no national champion]
    Policies and regulation, at all levels of government, should not embed particular technological solutions, but rather should be stated in terms of intended outcomes.
  • Recommendation 3 [fund]
    With coordination and encouragement from [The White House Office of Science and Technology Policy] OSTP, the [Networking and Information Technology Research and Development] NITRD agencies should strengthen U.S. research in privacy‐related technologies and in the relevant areas of social science that inform the successful application of those technologies.
  • Recommendation 4 [talk]
    OSTP, together with the appropriate educational institutions and professional societies, should encourage increased education and training opportunities concerning privacy protection, including career paths for professionals.
  • Recommendation 5 [talk & buy]
    The United States should take the lead both in the international arena and at home by adopting policies that stimulate the use of practical privacy‐protecting technologies that exist today. It can exhibit leadership both by its convening power (for instance, by promoting the creation and adoption of standards) and also by its own procurement practices (such as its own use of privacy‐preserving cloud services).

Table of Contents

  1. Executive Summary
  2. Introduction
    1. Context and outline of this report
    2. Technology has long driven the meaning of privacy
    3. What is different today?
    4. Values, harms, and rights
  3. Examples and Scenarios
    1. Things happening today or very soon
    2. Scenarios of the near future in healthcare and education
      1. Healthcare: personalized medicine
      2. Healthcare: detection of symptoms by mobile devices
      3. Education
    3. Challenges to the home’s special status
    4. Tradeoffs among privacy, security, and convenience
  4. Collection, Analytics, and Supporting Infrastructure
    1. Electronic sources of personal data
      1. “Born digital” data
      2. Data from sensors
    2. Big data analytics
      1. Data mining
      2. Data fusion and information integration
      3. Image and speech recognition
      4. Social‐network analysis
    3. The infrastructure behind big data
      1. Data centers
      2. The cloud
  5. Technologies and Strategies for Privacy Protection
    1. The relationship between cybersecurity and privacy
    2. Cryptography and encryption
      1. Well-established encryption technology
      2. Encryption frontiers
    3. Notice and consent
    4. Other strategies and techniques
      1. Anonymization or de‐identification
      2. Deletion and non‐retention
    5. Robust technologies going forward
      1. A Successor to Notice and Consent
      2. Context and Use
      3. Enforcement and deterrence
      4. Operationalizing the Consumer Privacy Bill of Rights
  6. PCAST Perspectives and Conclusions
    1. Technical feasibility of policy interventions
    2. Recommendations
    3. Final Remarks
  7. Appendix A. Additional Experts Providing Input
  8. Special Acknowledgment

Mentions

  • The President’s Council of Advisors on Science and Technology (PCAST)
  • PCAST Big Data and Privacy Working Group
  • Enabling Event
    • President Barack Obama
    • Remarks, 2014-01-17
    • Counselor John Podesta
  • New Concerns
    • Born digital vs born analog
    • standardized components
    • particular limited purpose vs repurposed, reused.
    • data fusion
    • algorithms
    • inferences
  • Provenance of data, recording and tracing the provenance of data
  • Trusted Data Format (TDF)

Claims

  • The right to forget (right to be forgotten) is unenforceable and infeasible [page 48].
  • Prior redress of prospective harms is a reasonable framework [page 49]
    • Conceptualized as vulnerable groups who are stipulated as harmed a priori or as harmed by virtue of how they are constituted.
  • Government may be forbidden from certain classes of uses, despite their being available in the private sector.
    • Government is allowed some activities and powers
    • Private industry is allowed some activities and powers
    • It is feasible in practice to mix & match
      • government coercion => private privilege => result
      • private privilege => private coercion => result

Consumer Privacy Bill of Rights (CPBR)

Obligations [of service providers, as powerful organizations]

  • Respect for Context => use consistent with collection context.
  • Focused Collection => limited collection.
  • Security => handling techniques
  • Accountability => handling techniques.

Empowerments [of consumers, as individuals]

  • Individual Control => control of collection, control of use.
  • Transparency => of practices [by service providers]
  • Access and Accuracy => right to review & edit [something about proportionality]

Definition of Privacy

The definition is unclear and evolving. Privacy is frequently defined in terms of the harms incurred when it is lost.

Privacy Framework via Harms

The Prosser Harms, quoted from page 6. <quote>

  1. Intrusion upon seclusion. A person who intentionally intrudes, physically or otherwise (now including electronically), upon the solitude or seclusion of another person or her private affairs or concerns, can be subject to liability for the invasion of her privacy, but only if the intrusion would be highly offensive to a reasonable person.
  2. Public disclosure of private facts. Similarly, a person can be sued for publishing private facts about another person, even if those facts are true. Private facts are those about someone’s personal life that have not previously been made public, that are not of legitimate public concern, and that would be offensive to a reasonable person.
  3. “False light” or publicity. Closely related to defamation, this harm results when false facts are widely published about an individual. In some states, false light includes untrue implications, not just untrue facts as such.
  4. Misappropriation of name or likeness. Individuals have a “right of publicity” to control the use of their name or likeness in commercial settings.

</quote>

Adjacencies

<quote>One perspective informed by new technologies and technology‐mediated communication suggests that privacy is about the “continual management of boundaries between different spheres of action and degrees of disclosure within those spheres,” with privacy and one’s public face being balanced in different ways at different times. See: Leysia Palen, Paul Dourish; Unpacking ‘Privacy’ for a Networked World; In Proceedings of CHI 2003, Association for Computing Machinery, 2003-04-05.</quote>, footnote, page 7.

Adjacency Theory

An oppositional framework wherein harms are “adjacent to” benefits:

  • Invasion of private communications
  • Invasion of privacy in a person’s virtual home.
  • Public disclosure of inferred private facts
  • Tracking, stalking and violations of locational privacy.
  • Harm arising from false conclusions about individuals, based on personal profiles from big‐data analytics.
  • Foreclosure of individual autonomy or self‐determination
  • Loss of anonymity and private association.

Mosaic Theory

Obliquely referenced via a quote from Justice Sotomayor.
<quote>“I would ask whether people reasonably expect that their movements will be recorded and aggregated in a manner that enables the Government to ascertain, more or less at will, their political and religious beliefs, sexual habits, and so on.” United States v. Jones (10‐1259), Sotomayor concurrence.</quote>

Yet, not cited, but related (at least):

Definition of Roles [of data processors]

  • data collectors
  • data analyzers
  • data users

The data generators or producers in this roles framework are substantially only customers or consumers (sic).

Definitions

  • Definition of analysis versus use
    • <quote>Analysis, per se, does not directly touch the individual (it is neither collection nor, without additional action, use) and may have no external visibility.
    • By contrast, it is the use of a product of analysis, whether in commerce, by government, by the press, or by individuals, that can cause adverse consequences to individuals.</quote>
  • Big Data => definitions
    • [comprises data with] “high‐volume, high‐velocity and high‐variety information assets that demand cost‐effective, innovative forms of information processing for enhanced insight and decision making,” attributed to Gartner, Inc.
    • “a term describing the storage and analysis of large and/or complex data sets using a series of techniques including, but not limited to, NoSQL, MapReduce, and machine learning,” attributed to “computer scientists” on arXiv.

Quoted

The strong, direct, unequivocal, un-nuanced, provocative language…

<quote>For a variety of reasons, PCAST judges anonymization, data deletion, and distinguishing data from metadata (defined below) to be in this category. The framework of notice and consent is also becoming unworkable as a useful foundation for policy.</quote>

<quote>Anonymization is increasingly easily defeated by the very techniques that are being developed for many legitimate applications of big data. In general, as the size and diversity of available data grows, the likelihood of being able to re‐identify individuals (that is, re‐associate their records with their names) grows substantially. While anonymization may remain somewhat useful as an added safeguard in some situations, approaches that deem it, by itself, a sufficient safeguard need updating. </quote>
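
To make the re-identification mechanism concrete, here is a minimal linkage-attack sketch in Python. Everything in it is hypothetical and illustrative (the records, column names, and the reidentify helper are invented, not drawn from the report); it only shows the familiar Sweeney-style observation that joining an “anonymized” release to an auxiliary dataset on shared quasi-identifiers can restore names.

# Hypothetical illustration of re-identification by linkage.
# An "anonymized" release drops names but keeps quasi-identifiers;
# an auxiliary dataset (say, a voter roll) maps those same
# quasi-identifiers back to names. Joining the two restores identities.

anonymized_release = [
    {"zip": "14850", "birth_date": "1980-02-11", "sex": "F", "diagnosis": "asthma"},
    {"zip": "14853", "birth_date": "1975-07-30", "sex": "M", "diagnosis": "diabetes"},
]

auxiliary_records = [
    {"name": "A. Example", "zip": "14850", "birth_date": "1980-02-11", "sex": "F"},
    {"name": "B. Example", "zip": "14853", "birth_date": "1975-07-30", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

def reidentify(release, auxiliary, keys=QUASI_IDENTIFIERS):
    """Join the release to the auxiliary data on the quasi-identifiers."""
    index = {tuple(rec[k] for k in keys): rec["name"] for rec in auxiliary}
    return [
        {"name": index[tuple(row[k] for k in keys)], **row}
        for row in release
        if tuple(row[k] for k in keys) in index
    ]

for match in reidentify(anonymized_release, auxiliary_records):
    print(match)  # each "anonymous" record now carries a name again

The larger and more varied the auxiliary data, the more often a given combination of quasi-identifiers is unique, which is exactly the growth effect the quoted passage describes.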

<quote>Notice and consent is the practice of requiring individuals to give positive consent to the personal data collection practices of each individual app, program, or web service. Only in some fantasy world do users actually read these notices and understand their implications before clicking to indicate their consent. <snip/>The conceptual problem with notice and consent is that it fundamentally places the burden of privacy protection on the individual. Notice and consent creates a non‐level playing field in the implicit privacy negotiation between provider and user. The provider offers a complex, take‐it‐or‐leave‐it set of terms, while the user, in practice, can allocate only a few seconds to evaluating the offer. This is a kind of market failure. </quote>

<quote>Also rapidly changing are the distinctions between government and the private sector as potential threats to individual privacy. Government is not just a “giant corporation.” It has a monopoly in the use of force; it has no direct competitors who seek market advantage over it and may thus motivate it to correct missteps. Governments have checks and balances, which can contribute to self‐imposed limits on what they may do with people’s information. Companies decide how they will use such information in the context of such factors as competitive advantages and risks, government regulation, and perceived threats and consequences of lawsuits. It is thus appropriate that there are different sets of constraints on the public and private sectors. But government has a set of authorities – particularly in the areas of law enforcement and national security – that place it in a uniquely powerful position, and therefore the restraints placed on its collection and use of data deserve special attention. Indeed, the need for such attention is heightened because of the increasingly blurry line between public and private data. While these differences are real, big data is to some extent a leveler of the differences between government and companies. Both governments and companies have potential access to the same sources of data and the same analytic tools. Current rules may allow government to purchase or otherwise obtain data from the private sector that, in some cases, it could not legally collect itself, or to outsource to the private sector analyses it could not itself legally perform. [emphasis here] The possibility of government exercising, without proper safeguards, its own monopoly powers and also having unfettered access to the private information marketplace is unsettling.</quote>

Referenced

Substantially in order of appearance in the footnotes, without repeats.

Via: backfill, backfill


Snide

And yet, even with all the letters and professional editing and tech-writing staff available to this national- and historical-level enterprise, we still see [Footnote 101, page 31]:

Qi, H. and A. Gani, “Research on mobile cloud computing: Review, trend and perspectives,” Digital Information and Communication Technology and it’s Applications (DICTAP), 2012 Second International Conference on, 2012.

The correct listing is at Springer:

Digital Information and Communication Technology and Its Applications; International Conference, DICTAP 2011, Dijon, France, June 21-23, 2011; Proceedings, Part I; Series: Communications in Computer and Information Science, Vol. 166; Cherifi, Hocine, Zain, Jasni Mohamad, El-Qawasmeh, Eyas (Eds.); 2011, XIV, 806 p.

But:

  • it’s → a contraction of “it is”
  • its → the possessive

Ergo: s/it's/its/g;

Unlikely Outcomes? A Distributed Discussion on The Prospects and Promise of Decentralized Personal Data Architectures | Barocas, Gürses, Narayanan, Toubiana

Solon Barocas, Seda Gürses, Arvind Narayanan, Vincent Toubiana; Unlikely Outcomes? A Distributed Discussion on The Prospects and Promise of Decentralized Personal Data Architectures; Unlike Us Reader; undated? maybe 2012-03; 17 pages.

References

Via backfill