Private traits and attributes are predictable from digital records of human behavior | Kosinski, Stillwell, Graepel

Michal Kosinski, David Stillwell, Thore Graepel; Private traits and attributes are predictable from digital records of human behavior; In Proceedings of the National Academy of Sciences of the United States of America (PNAS); 2013-02-12; 4 pages; landing.

Abstract

We show that easily accessible digital records of behavior, Facebook Likes, can be used to automatically and accurately predict a range of highly sensitive personal attributes including: sexual orientation, ethnicity, religious and political views, personality traits, intelligence, happiness, use of addictive substances, parental separation, age, and gender. The analysis presented is based on a dataset of over 58,000 volunteers who provided their Facebook Likes, detailed demographic profiles, and the results of several psychometric tests. The proposed model uses dimensionality reduction for preprocessing the Likes data, which are then entered into logistic/linear regression to predict individual psychodemographic profiles from Likes. The model correctly discriminates between homosexual and heterosexual men in 88% of cases, African Americans and Caucasian Americans in 95% of cases, and between Democrat and Republican in 85% of cases. For the personality trait “Openness,” prediction accuracy is close to the test–retest accuracy of a standard personality test. We give examples of associations between attributes and Likes and discuss implications for online personalization and privacy.
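
The pipeline named in the abstract (dimensionality reduction, then regression) can be sketched roughly as follows. This is a minimal illustration, not the authors' code. It assumes SVD as the reduction step (SVD is listed under Mentions), synthetic 0/1 Like data, a plain gradient-descent logistic regression, and a single train/test split. The function names, k=5, and the learning settings are hypothetical choices made for the example.

```python
import numpy as np

def fit_svd_basis(likes, k):
    """Return the top-k right singular vectors of a user-by-Like matrix."""
    # Thin SVD: likes = U @ diag(s) @ vt; the rows of vt span the Like space.
    _, _, vt = np.linalg.svd(likes, full_matrices=False)
    return vt[:k].T  # shape: (n_likes, k)

def add_intercept(x):
    """Prepend a column of ones so the model can learn a bias term."""
    return np.hstack([np.ones((x.shape[0], 1)), x])

def fit_logistic(x, y, lr=0.1, steps=2000):
    """Unregularized logistic regression fitted by batch gradient descent."""
    x1 = add_intercept(x)
    w = np.zeros(x1.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x1 @ w)))   # predicted probabilities
        w -= lr * (x1.T @ (p - y)) / len(y)   # gradient of the mean log-loss
    return w

def predict_proba(x, w):
    """Probability of the positive class for each row of x."""
    return 1.0 / (1.0 + np.exp(-(add_intercept(x) @ w)))

# Synthetic stand-in data (assumption): 200 users, 50 Likes, one binary attribute.
rng = np.random.default_rng(0)
n_users, n_likes = 200, 50
y = rng.integers(0, 2, size=n_users)
p_like = np.full((n_users, n_likes), 0.1)
p_like[y == 1, :10] = 0.6     # attribute = 1 users favor the first ten Likes
p_like[y == 0, 10:20] = 0.6   # attribute = 0 users favor the next ten
likes = (rng.random((n_users, n_likes)) < p_like).astype(float)

# Fit the SVD basis and the regression on 150 users, evaluate on the other 50.
basis = fit_svd_basis(likes[:150], k=5)
w = fit_logistic(likes[:150] @ basis, y[:150].astype(float))
predicted = predict_proba(likes[150:] @ basis, w) > 0.5
accuracy = float(np.mean(predicted == (y[150:] == 1)))
print(f"held-out accuracy: {accuracy:.2f}")
```

Projecting held-out users with the basis learned from the training users keeps the evaluation honest: the test rows influence neither the reduction nor the regression.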

Mentions

  • You Are What You Like, promotional site.
  • Singular Value Decomposition (SVD)
  • Pseudo-Inverse of a Matrix
  • Five Factor Model (FFM)
    • Dimensions
      1. Openness to Experience
      2. Conscientiousness
      3. Extraversion
      4. Agreeableness
      5. Emotional Stability
    • Instruments
      • NEO Personality Inventory (NEO-PI-R)
      • NEO Five-Factor Inventory (NEO-FFI)
  • Intelligence
    • Raven’s Standard Progressive Matrices (SPM)
    • Spearman’s Theory of General Ability
  • International Personality Item Pool (IPIP)
  • Satisfaction With Life (SWL)
  • myPersonality Project
  • Receiver-Operating Characteristic (ROC)
  • Area Under [the] Curve (AUC)
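
The ROC and AUC entries above connect to the abstract's "in 88% of cases" phrasing, assuming those percentages are AUC values: AUC equals the probability that a randomly chosen member of the positive class receives a higher score than a randomly chosen member of the negative class. A small sketch of that pairwise definition follows; the function name `auc` and the scores are made up for illustration.

```python
def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical classifier scores for three positive and three negative cases.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]
result = auc(pos, neg)  # 8 of the 9 pairs are ordered correctly
print(round(result, 3))
```

The double loop is quadratic in the sample size; it is written this way to mirror the definition, not for efficiency.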

References

  • Lazer D, et al. (2009) Computational social science. In Science 323(5915):721–723.
  • Koren Y, Bell R, Volinsky C (2009) Matrix factorization techniques for recommender systems. In Computer 42(8):30–37.
  • Chen Y, Pavlov D, Canny JF (2009) Large-scale behavioral targeting. In Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD), pp 209–218.
  • Butler D (2007) Data sharing threatens privacy. In Nature 449(7163):644–645.
  • Narayanan A, Shmatikov V (2008) Robust de-anonymization of large sparse datasets. In Proceedings of the IEEE Symposium on Security and Privacy, pp 111–125.
  • Duhigg C (2012) The Power of Habit: Why We Do What We Do in Life and Business (Random House, New York).
  • Ince HO, Yarali A, Özsel D (2009) Customary killings in Turkey and Turkish modernization. In Middle Eastern Studies 45(4):537–551.
  • Fast LA, Funder DC (2008) Personality as manifest in word use: Correlations with self-report, acquaintance report, and behavior. In Journal of Personality and Social Psychology 94(2):334–346.
  • Costa PT, McCrae RR (1992) Revised NEO Personality Inventory (NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) Manual (Psychological Assessment Resources, Odessa, FL).
  • Gosling SD, Ko SJ, Mannarelli T, Morris ME (2002) A room with a cue: Personality judgments based on offices and bedrooms. In Journal of Personality and Social Psychology 82(3):379–398.
  • Hu J, Zeng H-J, Li H, Niu C, Chen Z (2007) Demographic prediction based on user’s browsing behavior. In Proceedings of the International World Wide Web Conference (WWW), pp 151–160.
  • Murray D, Durrell K (1999) Inferring demographic attributes of anonymous Internet users. In Revised Papers from the International Workshop on Web Usage Analysis and User Profiling, eds Masand BM, Spiliopoulou M (Springer, London), pp 7–20.
  • De Bock K, Van Den Poel D (2010) Predicting website audience demographics for Web advertising targeting using multi-website clickstream data. In Fundamenta Informaticae 98(1):49–70.
  • Goel S, Hofman JM, Sirer MI (2012) Who does what on the Web: Studying Web browsing behavior at scale. In International Conference on Weblogs and Social Media, pp 130–137.
  • Kosinski M, Kohli P, Stillwell DJ, Bachrach Y, Graepel T (2012) Personality and website choice. In Proceedings of the ACM Web Science Conference, pp 251–254.
  • Marcus B, Machilek F, Schütz A (2006) Personality in cyberspace: Personal Web sites as media for personality expressions and impressions. In Journal of Personality and Social Psychology 90(6):1014–1031.
  • Rentfrow PJ, Gosling SD (2003) The do re mi’s of everyday life: The structure and personality correlates of music preferences. In Journal of Personality and Social Psychology 84(6):1236–1256.
  • Quercia D, Lambiotte R, Kosinski M, Stillwell D, Crowcroft J (2012) The personality of popular Facebook users. In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work (CSCW), pp 955–964.
  • Bachrach Y, Kohli P, Graepel T, Stillwell DJ, Kosinski M (2012) Personality and patterns of Facebook usage. In Proceedings of the ACM Web Science Conference, pp 36–44.
  • Quercia D, Kosinski M, Stillwell DJ, Crowcroft J (2011) Our Twitter profiles, our selves: Predicting personality with Twitter. In Proceedings of the 2011 IEEE International Conference on Privacy, Security, Risk, and Trust and IEEE International Conference on Social Computing, pp 180–185.
  • Golbeck J, Robles C, Edmondson M, Turner K (2011) Predicting personality from Twitter. In Proceedings of the IEEE International Conference on Social Computing, pp 149–156.
  • Golbeck J, Robles C, Turner K (2011) Predicting personality with social media. In Proceedings of the Conference on Human Factors in Computing Systems (CHI), pp 253–262.
  • Jernigan C, Mistree BF (2009) Gaydar: Facebook friendships expose sexual orientation. In First Monday 14(10).
  • Golub GH, Kahan W (1965) Calculating the singular values and pseudo-inverse of a matrix. In Journal of the Society for Industrial and Applied Mathematics, Series B: Numerical Analysis 2(2):205–224.
  • Goldberg LR, et al. (2006) The international personality item pool and the future of public-domain personality measures. In Journal of Research in Personality 40(1):84–96.
  • Raven JC (2000) The Raven’s progressive matrices: Change and stability over culture and time. In Cognitive Psychology 41(1):1–48.
  • Diener E, Emmons RA, Larsen RJ, Griffin S (1985) The satisfaction with life scale. In Journal of Personality Assessment 49(1):71–75.
  • Musick K, Meier A (2010) Are both parents always better than one? Parental conflict and young adult well-being. In Social Science Research 39(5):814–830.
  • Schimmack U, Diener E, Oishi S (2002) Life-satisfaction is a momentary judgment and a stable personality characteristic: The use of chronically accessible and stable sources. In Journal of Personality 70(3):345–384.
  • Nass C, Lee KM (2000) Does computer-generated speech manifest personality? An experimental test of similarity-attraction. In Journal of Experimental Psychology 7(3):171–181.

Methodology

  • Costa PT, McCrae RR (1992) Revised NEO Personality Inventory (NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) Manual (Psychological Assessment Resources, Odessa, FL).
  • Goldberg LR, et al. (2006) The international personality item pool and the future of public-domain personality measures. In Journal of Research in Personality 40(1):84–96.
  • Raven JC (2000) The Raven’s progressive matrices: Change and stability over culture and time. In Cognitive Psychology 41(1):1–48.
  • Lubinski D (2004) Introduction to the special section on cognitive abilities: 100 years after Spearman’s (1904) “‘General intelligence,’ objectively determined and measured”. In Journal of Personality and Social Psychology 86(1):96–111.
  • Diener E, Emmons RA, Larsen RJ, Griffin S (1985) The satisfaction with life scale. In Journal of Personality Assessment 49(1):71–75.
  • Golub GH, Kahan W (1965) Calculating the singular values and pseudo-inverse of a matrix. In Journal of the Society for Industrial and Applied Mathematics, Series B: Numerical Analysis 2(2):205–224.

Actualities

Private traits and attributes are predictable from digital records of human behavior | You Are What You Like, My Personality @ Facebook

Michal Kosinski, David Stillwell, Thore Graepel; Private traits and attributes are predictable from digital records of human behavior; In Proceedings of the National Academy of Sciences of the United States of America; Vol. 110, No. 15; 2013-03-05; supplemental info.

Promotions