Recent literature has advanced the use of Wi-Fi signals to “see” people’s motions and locations. This paper asks the following question: Can Wi-Fi “hear” our talks? We present WiHear, which enables Wi-Fi signals to “hear” our talks without deploying any devices. To achieve this, WiHear needs to detect and analyze fine-grained radio reflections from mouth movements. WiHear solves this micro-movement detection problem by introducing the Mouth Motion Profile, which leverages partial multipath effects and wavelet packet transformation. Since Wi-Fi signals do not require line-of-sight, WiHear can “hear” people’s talks anywhere within radio range. Further, WiHear can simultaneously “hear” multiple people’s talks by leveraging MIMO technology. We implement WiHear on both the USRP N210 platform and commercial Wi-Fi infrastructure. Results show that, within our pre-defined vocabulary, WiHear achieves an average detection accuracy of 91% for a single individual speaking no more than 6 words, and up to 74% for no more than 3 people talking simultaneously. Moreover, the detection accuracy can be further improved by deploying multiple receivers at different angles.
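The abstract names wavelet packet transformation as one ingredient of the Mouth Motion Profile. The sketch below shows only that transform. It decomposes a sampled signal, such as a hypothetical series of channel amplitude readings, into wavelet packet bands. It assumes a Haar wavelet and a signal length divisible by 2**depth, because the abstract does not give the wavelet, the depth, or how WiHear turns bands into features. `band_energies` is a hypothetical feature step, not WiHear's method.

```python
import math

def haar_split(signal):
    """One orthonormal Haar step: returns (approximation, detail), each half as long."""
    s = 1 / math.sqrt(2)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail

def wavelet_packet(signal, depth):
    """Full Haar wavelet packet tree down to `depth` (assumed Haar wavelet).

    The plain discrete wavelet transform re-splits only the approximation band.
    A wavelet packet also splits every detail band, so 2**depth leaf bands are
    returned, in natural order rather than frequency order.
    """
    if len(signal) % (2 ** depth) != 0:
        raise ValueError("signal length must be divisible by 2**depth")
    nodes = [list(signal)]
    for _ in range(depth):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_split(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes

def band_energies(leaves):
    """Energy (sum of squared coefficients) per leaf band: a hypothetical feature vector."""
    return [sum(c * c for c in leaf) for leaf in leaves]
```

The Haar step is orthonormal, so the total energy across the leaf bands equals the energy of the input. This gives a convenient sanity check when you plug in real measurements.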
Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases raises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape, and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, we present the new clustering algorithm DBSCAN, which relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape. DBSCAN requires only one input parameter and supports the user in determining an appropriate value for it. We performed an experimental evaluation of the effectiveness and efficiency of DBSCAN using synthetic data and real data of the SEQUOIA 2000 benchmark. The results of our experiments demonstrate that
DBSCAN is significantly more effective in discovering clusters of arbitrary shape than the well-known algorithm CLARANS, and that
DBSCAN outperforms CLARANS by a factor of more than 100 in terms of efficiency.
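Below is a minimal, brute-force sketch of the density-based idea. A point with at least `min_pts` neighbours within distance `eps` is a core point. A cluster grows by expanding through core points. Points that are reachable but not core become border points, and everything else is noise (label -1).

Several details are simplifications:
- Region queries here scan every point, which is O(n²). The paper relies on a spatial index for its efficiency.
- `eps` and `min_pts` are both explicit arguments. The paper's single-parameter claim comes from fixing MinPts heuristically and helping the user choose Eps.
- Euclidean distance is assumed.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: keep expanding
    return labels
```

For example, two tight groups of four points plus one far-away point, run with `eps=1.5` and `min_pts=3`, yield two clusters and one noise label.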
Pietro Della Briotta Parolo, Raj Kumar Pan, Rumi Ghosh, Bernardo A. Huberman, Kimmo Kaski, Santo Fortunato; Attention Decay in Science; preprint (submitted to an Elsevier journal); submitted: 2015-03-09; 12 pages; arXiv:1503.01881.
The exponential growth in the number of scientific papers makes it increasingly difficult for researchers to keep track of all the publications relevant to their work. Consequently, the attention that can be devoted to individual papers, measured by their citation counts, is bound to decay rapidly. In this work we make a thorough study of the life-cycle of papers in different disciplines. Typically, the citation rate of a paper increases up to a few years after its publication, reaches a peak and then decreases rapidly. This decay can be described by an exponential or a power law behavior, as in ultradiffusive processes, with exponential fitting better than power law for the majority of cases. The decay is also becoming faster over the years, signaling that nowadays papers are forgotten more quickly. However, when time is counted in terms of the number of published papers, the rate of decay of citations is fairly independent of the period considered. This indicates that the attention of scholars depends on the number of published items, and not on real time.
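The abstract compares exponential and power-law descriptions of the post-peak citation decay. The sketch below is one simple way to make that comparison. It fits both models by ordinary least squares in log space:
- log c against t for the exponential c(t) = A·exp(−t/τ);
- log c against log t for the power law c(t) = A·t^(−α).

It then compares the residual sums of squares. Log-space OLS is an assumption here; the paper's own fitting procedure may differ. Times must be positive, for example measured from the peak starting at 1.

```python
import math

def linear_fit(xs, ys):
    """Ordinary least squares y = a + b*x; returns (a, b, sum of squared residuals)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse

def compare_decay(times, rates):
    """Fit exponential and power-law decay; returns {model: (parameter, log-space SSE)}.

    For the exponential the parameter is the decay time tau;
    for the power law it is the exponent alpha.
    """
    log_c = [math.log(c) for c in rates]
    _, b_exp, sse_exp = linear_fit(times, log_c)
    _, b_pow, sse_pow = linear_fit([math.log(t) for t in times], log_c)
    return {"exponential": (-1.0 / b_exp, sse_exp), "power_law": (-b_pow, sse_pow)}
```

To mirror the paper's alternative clock, you can pass the cumulative number of published papers as `times` instead of years.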
We analyzed 700 million words, phrases, and topic instances collected from the Facebook messages of 75,000 volunteers, who also took standard personality tests, and found striking variations in language with personality, gender, and age. In our open-vocabulary technique, the data itself drives a comprehensive exploration of language that distinguishes people, finding connections that are not captured with traditional closed-vocabulary word-category analyses. Our analyses shed new light on psychosocial processes, yielding results that are face valid (e.g., subjects living in high elevations talk about the mountains), tie in with other research (e.g., neurotic people disproportionately use the phrase ‘sick of’ and the word ‘depressed’), suggest new hypotheses (e.g., an active life implies emotional stability), and give detailed insights (males use the possessive ‘my’ when mentioning their ‘wife’ or ‘girlfriend’ more often than females use ‘my’ with ‘husband’ or ‘boyfriend’). To date, this represents the largest study, by an order of magnitude, of language and personality.
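Open-vocabulary analysis lets every word or phrase compete, instead of starting from predefined word categories. A heavily simplified sketch of the core step follows. For each user it computes the relative frequency of one term, then correlates that frequency with a trait score across users using Pearson's r.

This is an assumption-laden simplification. It uses single tokens only, so multiword phrases like ‘sick of’ are not handled. It applies no covariate controls for age or gender and no multiple-testing correction. The function names are hypothetical.

```python
import math
from collections import Counter

def relative_frequency(tokens, term):
    """Share of a user's tokens equal to `term` (0.0 for an empty message set)."""
    return Counter(tokens)[term] / len(tokens) if tokens else 0.0

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def term_trait_correlation(users, term):
    """users: list of (tokens, trait_score) pairs; returns r across users."""
    freqs = [relative_frequency(tokens, term) for tokens, _ in users]
    scores = [score for _, score in users]
    return pearson(freqs, scores)
```

Run over the whole vocabulary, the terms with the largest correlations become candidates for an open-vocabulary word cloud.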
We show that easily accessible digital records of behavior, Facebook Likes, can be used to automatically and accurately predict a range of highly sensitive personal attributes, including sexual orientation, ethnicity, religious and political views, personality traits, intelligence, happiness, use of addictive substances, parental separation, age, and gender. The analysis presented is based on a dataset of over 58,000 volunteers who provided their Facebook Likes, detailed demographic profiles, and the results of several psychometric tests. The proposed model uses dimensionality reduction for preprocessing the Likes data, which are then entered into logistic/linear regression to predict individual psychodemographic profiles from Likes. The model correctly discriminates between homosexual and heterosexual men in 88% of cases, between African Americans and Caucasian Americans in 95% of cases, and between Democrats and Republicans in 85% of cases. For the personality trait “Openness,” prediction accuracy is close to the test–retest accuracy of a standard personality test. We give examples of associations between attributes and Likes and discuss implications for online personalization and privacy.
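The pipeline described above has two stages: reduce a user-by-Like matrix to a few components, then feed the component scores into logistic regression for a dichotomous attribute. The reference list includes Golub and Kahan's SVD paper, which suggests SVD is the reduction step.

This stdlib-only sketch swaps in power iteration with deflation to find the top right singular vectors, in place of a library SVD. It then fits logistic regression by batch gradient descent. The component count, learning rate and epochs are illustrative assumptions, and all function names are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_components(x, k, iters=200, seed=0):
    """Top-k right singular vectors of user-by-Like matrix x via power iteration on x^T x."""
    n = len(x[0])
    gram = [[sum(row[i] * row[j] for row in x) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        for _ in range(iters):
            w = [dot(row, v) for row in gram]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                return comps  # no variance left to explain
            v = [t / norm for t in w]
        lam = dot(v, [dot(row, v) for row in gram])  # Rayleigh quotient (eigenvalue)
        comps.append(v)
        # Deflate so the next pass converges to the next-largest component.
        gram = [[gram[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return comps

def project(x, comps):
    """Reduce each user's Like vector to k component scores."""
    return [[dot(row, v) for v in comps] for row in x]

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fit_logistic(z, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    w, b = [0.0] * len(z[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for zi, yi in zip(z, y):
            err = sigmoid(dot(w, zi) + b) - yi
            gw = [g + err * f for g, f in zip(gw, zi)]
            gb += err
        w = [wj - lr * g / len(z) for wj, g in zip(w, gw)]
        b -= lr * gb / len(z)
    return w, b

def predict(z, w, b):
    return [1 if sigmoid(dot(w, zi) + b) >= 0.5 else 0 for zi in z]
```

For a numeric attribute such as age, the second stage would be linear regression instead.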
Gosling SD, Ko SJ, Mannarelli T, Morris ME (2002) A room with a cue: Personality judgments based on offices and bedrooms. Journal of Personality and Social Psychology 82(3):379–398.
Hu J, Zeng H-J, Li H, Niu C, Chen Z (2007) Demographic prediction based on user’s browsing behavior. In Proceedings of the International World Wide Web Conference (WWW), pp 151–160.
Murray D, Durrell K (1999) Inferring demographic attributes of anonymous Internet users. In Revised Papers from the International Workshop on Web Usage Analysis and User Profiling, eds Masand BM, Spiliopoulou M (Springer, London), pp 7–20.
De Bock K, Van den Poel D (2010) Predicting website audience demographics for Web advertising targeting using multi-website clickstream data. Fundamenta Informaticae 98(1):49–70.
Goel S, Hofman JM, Sirer MI (2012) Who does what on the Web: Studying Web browsing behavior at scale. In Proceedings of the International Conference on Weblogs and Social Media (ICWSM), pp 130–137.
Kosinski M, Kohli P, Stillwell DJ, Bachrach Y, Graepel T (2012) Personality and website choice. In Proceedings of the ACM Web Science Conference, pp 251–254.
Marcus B, Machilek F, Schütz A (2006) Personality in cyberspace: Personal Web sites as media for personality expressions and impressions. Journal of Personality and Social Psychology 90(6):1014–1031.
Rentfrow PJ, Gosling SD (2003) The do re mi’s of everyday life: The structure and personality correlates of music preferences. Journal of Personality and Social Psychology 84(6):1236–1256.
Quercia D, Lambiotte R, Kosinski M, Stillwell D, Crowcroft J (2012) The personality of popular Facebook users. In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work (CSCW), pp 955–964.
Bachrach Y, Kohli P, Graepel T, Stillwell DJ, Kosinski M (2012) Personality and patterns of Facebook usage. In Proceedings of the ACM Web Science Conference, pp 36–44.
Quercia D, Kosinski M, Stillwell DJ, Crowcroft J (2011) Our Twitter profiles, our selves: Predicting personality with Twitter. In Proceedings of the 2011 IEEE International Conference on Privacy, Security, Risk, and Trust and IEEE International Conference on Social Computing (PASSAT/SocialCom), pp 180–185.
Golbeck J, Robles C, Edmondson M, Turner K (2011) Predicting personality from Twitter. In Proceedings of the IEEE International Conference on Social Computing, pp 149–156.
Golbeck J, Robles C, Turner K (2011) Predicting personality with social media. In Proceedings of the Conference on Human Factors in Computing Systems (CHI), pp 253–262.
Jernigan C, Mistree BF (2009) Gaydar: Facebook friendships expose sexual orientation. First Monday 14(10).
Goldberg LR, et al. (2006) The international personality item pool and the future of public-domain personality measures. Journal of Research in Personality 40(1):84–96.
Raven JC (2000) The Raven’s progressive matrices: Change and stability over culture and time. Cognitive Psychology 41(1):1–48.
Diener E, Emmons RA, Larsen RJ, Griffin S (1985) The satisfaction with life scale. Journal of Personality Assessment 49(1):71–75.
Musick K, Meier A (2010) Are both parents always better than one? Parental conflict and young adult well-being. Social Science Research 39(5):814–830.
Schimmack U, Diener E, Oishi S (2002) Life-satisfaction is a momentary judgment and a stable personality characteristic: The use of chronically accessible and stable sources. Journal of Personality 70(3):345–384.
Nass C, Lee KM (2000) Does computer-generated speech manifest personality? An experimental test of similarity-attraction. Journal of Experimental Psychology: Applied 7(3):171–181.
Lubinski D (2004) Introduction to the special section on cognitive abilities: 100 years after Spearman’s (1904) “‘General intelligence,’ objectively determined and measured”. Journal of Personality and Social Psychology 86(1):96–111.
Golub GH, Kahan W (1965) Calculating the singular values and pseudo-inverse of a matrix. Journal of the Society for Industrial and Applied Mathematics, Series B: Numerical Analysis 2(2):205–224.
We propose a simple method to extract the community structure of large networks. Our method is a heuristic based on modularity optimization. It is shown to outperform all other known community detection methods in terms of computation time. Moreover, the quality of the detected communities is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2.6 million customers, and then by analyzing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad-hoc modular networks.
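The quality measure here is modularity. For an undirected graph with m edges, it sums over communities c:

Q = Σ_c [ L_c/m − (d_c/2m)² ]

Here L_c is the number of edges inside c and d_c is the total degree of c's nodes. The method (commonly called the Louvain method) alternates two phases:
1. Local moving: greedily move single nodes to a neighbouring community while Q improves.
2. Aggregation: collapse each community into a super-node and repeat.

The sketch below implements only phase 1, naively: it recomputes Q from scratch for every candidate move instead of using an incremental gain formula. Use it to understand the method, not to process large graphs. It assumes a simple, unweighted graph given as an adjacency dict.

```python
from collections import defaultdict

def modularity(adj, community):
    """Modularity Q of a partition of a simple undirected, unweighted graph."""
    m2 = sum(len(nbrs) for nbrs in adj.values())  # 2m: each edge counted from both ends
    internal = defaultdict(float)    # 2 * (number of edges inside each community)
    degree_sum = defaultdict(float)  # total degree of each community
    for u, nbrs in adj.items():
        degree_sum[community[u]] += len(nbrs)
        for v in nbrs:
            if community[u] == community[v]:
                internal[community[u]] += 1
    return sum(internal[c] / m2 - (degree_sum[c] / m2) ** 2 for c in degree_sum)

def local_moving(adj):
    """Phase 1 only: move nodes to neighbouring communities while Q strictly improves."""
    community = {u: u for u in adj}  # start with every node in its own community
    improved = True
    while improved:
        improved = False
        for u in adj:
            current = community[u]
            best_c, best_q = current, modularity(adj, community)
            for c in {community[v] for v in adj[u]}:
                if c == current:
                    continue
                community[u] = c  # tentatively move u
                q = modularity(adj, community)
                community[u] = current  # undo
                if q > best_q + 1e-12:
                    best_c, best_q = c, q
            if best_c != current:
                community[u] = best_c
                improved = True
    return community
```

On two triangles joined by a single edge, local moving recovers the two triangles. That partition has Q = 2·(3/7 − 1/4) = 5/14.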
The collection of driving habits data is gaining momentum as vehicle telematics-based solutions become popular in consumer markets such as auto-insurance and driver assistance services. These solutions rely on driving features such as time of travel, speed, and braking to assess accident risk and driver safety. Given the privacy issues surrounding the geographic tracking of individuals, many solutions explicitly claim that the customer’s GPS coordinates are not recorded. Although revealing driving habits can give us access to a number of innovative products, we believe that the disclosure of this data only offers a false sense of privacy. Using speed and time data from real world driving trips, we show that the destinations of trips may also be determined without having to record GPS coordinates. Based on this, we argue that customer privacy expectations in non-tracking telematics applications need to be reset, and new policies need to be implemented to inform customers of possible risks.
State Farm’s In-Drive
National General Insurance’s Low-Mileage Discount
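The abstract does not spell out how destinations are inferred. As an illustrative first step only, the sketch below shows why "non-tracking" speed and time data still leak location-relevant structure:
- Integrating speed over time with the trapezoid rule gives distance travelled.
- The cumulative distance at each stop forms a signature that could be compared against a road map.

Map matching itself is not shown. The sketch assumes speeds in m/s and timestamps in seconds. `stop_distances` and its `threshold` are hypothetical.

```python
def distance_from_speed(samples):
    """samples: time-sorted (time_s, speed_m_per_s) pairs; distance in metres (trapezoid rule)."""
    total = 0.0
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        total += 0.5 * (v0 + v1) * (t1 - t0)
    return total

def stop_distances(samples, threshold=0.5):
    """Cumulative distance (m) at each moment the vehicle comes to a halt.

    A stop is recorded when speed drops from >= threshold to < threshold,
    so a long stationary period counts once.
    """
    distances = []
    cumulative = 0.0
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        cumulative += 0.5 * (v0 + v1) * (t1 - t0)
        if v1 < threshold <= v0:
            distances.append(cumulative)
    return distances
```

For example, a trip that accelerates to 10 m/s, stops after 200 m, waits, then drives another 100 m yields stops at 200 m and 300 m.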
With the widespread use of email, we now have access to unprecedented amounts of text that we ourselves have written. In this paper, we show how sentiment analysis can be used in tandem with effective visualizations to quantify and track emotions in many types of mail. We create a large word–emotion association lexicon by crowdsourcing, and use it to compare emotions in love letters, hate mail, and suicide notes. We show that there are marked differences between genders in how they use emotion words in workplace email. For example, women use many words from the joy–sadness axis, whereas men prefer terms from the fear–trust axis. Finally, we show visualizations that can help people track emotions in their emails.
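A word–emotion lexicon maps each word to zero or more emotions. Counting those associations over a text is the basic measurement behind comparisons such as love letters versus hate mail. The sketch below uses a tiny hypothetical lexicon as a stand-in for the crowdsourced one. The entries and the simple lowercase tokenizer are illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (word -> emotions); a real crowdsourced lexicon is far larger.
LEXICON = {
    "happy": {"joy", "trust"},
    "love": {"joy"},
    "cry": {"sadness"},
    "afraid": {"fear"},
    "hate": {"anger", "disgust"},
}

def emotion_counts(text, lexicon=LEXICON):
    """Count, per emotion, how many tokens in `text` are associated with it."""
    counts = Counter()
    for word in re.findall(r"[a-z']+", text.lower()):
        for emotion in lexicon.get(word, ()):
            counts[emotion] += 1
    return counts
```

For instance, "I love you and I am happy, never afraid" yields two joy associations, one trust and one fear. Comparing such counts along pairs like joy–sadness or fear–trust gives the kind of axis contrast described above.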
Today we have access to unprecedented amounts of literary texts. However, search still relies heavily on keywords. In this paper, we show how sentiment analysis can be used in tandem with effective visualizations to quantify and track emotions in both individual books and across very large collections. We introduce the concept of emotion word density, and using the Brothers Grimm fairy tales as an example, we show how collections of text can be organized for better search. Using the Google Books Corpus, we show how to determine an entity’s emotion associations from co-occurring words. Finally, we compare emotion words in fairy tales and novels, to show that fairy tales have a much wider range of emotion word densities than novels.
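Emotion word density normalizes emotion-word counts by text length, so texts of very different sizes can be compared. This sketch assumes density is expressed per 10,000 words; that scale is an assumption here, adjustable via the hypothetical `per` parameter. It takes the set of emotion words as input, for example all lexicon words tied to one emotion.

```python
import re

def emotion_word_density(text, emotion_words, per=10_000):
    """Number of tokens found in `emotion_words` per `per` tokens of text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in emotion_words)
    return hits * per / len(tokens)
```

Computing this density separately for each tale or novel, and for each emotion, is one way to compare the range of densities across genres.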
The INFORMS Code of Ethics for Certified Analytics Professionals sets out an analyst’s responsibilities in six sections. Excerpts are below:
All professionals have societal obligations to perform their work in a professional, competent and ethical manner.
Professionals should adhere to all applicable laws, regulations and international covenants.
Employers and Clients
It is the practitioner’s responsibility to assure employers and clients that an analytical approach is suitable to their needs and resources; this includes presenting the capabilities and limitations of analytical methods in addressing their problem.
Analytics professionals should clearly state their qualifications and relevant experience.
It is imperative to fulfill all commitments to employers and clients, to guard any privileged information they provide unless disclosure is required, and to accept full responsibility for your performance.
Where appropriate, present a client or employer with choices among valid alternative approaches that may vary in scope, cost, or precision.
Apply analytical methods and procedures scientifically, without predetermining the outcome.
Resist any pressure from employers and clients to produce a particular “result,” regardless of its validity.
Analytics professionals have a responsibility to promote the effective and efficient use of analytical methods by all members of research teams and to respect the ethical obligations of members of other disciplines.
When possible, share nonproprietary data and methods with others, and participate in peer review, focusing on the assessment of methods rather than individuals.
Respect differing professional opinions while acknowledging the contributions and intellectual property of others.
Those professionals involved in teaching or training students or junior analysts have a responsibility to instill in them an appreciation for the practical value of the concepts and methods they are learning.
Those in leadership and decision-making roles should base decisions about analytics professionals’ hiring, firing, promotion, work assignments, and other professional matters on professional qualifications.
Avoid harassment of, or discrimination against, anyone on professionally irrelevant bases such as race, color, ethnicity, gender, sexual orientation, national origin, age, religion, nationality, or disability.
If a project involves research subjects, including census or survey respondents, an analytics professional will know and adhere to the appropriate rules for the protection of those human subjects.
Be particularly aware of situations involving vulnerable populations that may be subject to special risks and may not be able to protect their own interests.
This responsibility includes protecting the privacy and confidentiality of research subjects and data concerning them.
INFORMS and the Profession
Analytics professionals will strive for relevance in all analyses.
Each study or project should be based on a competent understanding of the subject-matter issues, appropriate analytical methods, and technical criteria to justify both the practical relevance of the study and the data to be used.
Guard against the possibility that a predisposition by investigators or data providers might predetermine the analytical result.
Remain current in constantly changing analytical methodology, as preferred methods from yesterday may be barely acceptable today and totally obsolete tomorrow.
Disclose conflicts of interest, financial and otherwise, and resolve them.
Provide only such expert testimony as you would be willing to have peer reviewed.
Maintain personal responsibility for all work bearing your name; avoid undertaking work or coauthoring publications for which you would not want to acknowledge responsibility.
Certified Analytics Professionals will strive to avoid condoning or appearing to condone careless, incompetent, or unethical practices. Misconduct broadly includes all professional dishonesty, by commission or omission, and, within the realm of professional activities and expression, all harmful disrespect for people, unauthorized or illegal use of their intellectual and physical property, and unjustified detraction from the reputation of others.
Recognize that differences of opinion and honest error do not constitute misconduct; they warrant discussion, but not accusation.
Questionable scientific practices may or may not constitute misconduct, depending on their nature and the definition of misconduct used.
Do not condone retaliation against, or damage to the employability of, those who responsibly call attention to possible scientific error or misconduct.
A data scientist shall not knowingly:
Fail to use scientific methods in performing data science.
Fail to rank the quality of evidence in a reasonable and understandable manner for the client.
Claim weak or uncertain evidence is strong evidence.
Misuse weak or uncertain evidence to communicate a false reality or promote an illusion of understanding.
Fail to rank the quality of data in a reasonable and understandable manner for the client.
Claim bad or uncertain data quality is good data quality.
Misuse bad or uncertain data quality to communicate a false reality or promote an illusion of understanding.
Fail to disclose any and all data science results or engage in cherry-picking.
Fail to attempt to replicate data science results.
Fail to disclose that data science results could not be replicated.
Misuse data science results to communicate a false reality or promote an illusion of understanding.
Fail to disclose failed experiments or disconfirming evidence known to the data scientist to be directly adverse to the position of the client.
Offer evidence that the data scientist knows to be false. If a data scientist questions the quality of data or evidence the data scientist must disclose this to the client. If a data scientist has offered material evidence and the data scientist comes to know of its falsity, the data scientist shall take reasonable remedial measures, including disclosure to the client. A data scientist may disclose and label evidence the data scientist reasonably believes is false.
Bonnie H. Erickson (Toronto); Secret Societies and Social Structure; In Social Forces; Vol. 60, No. 1; pages 188-210; paywalled: JSTOR.
Bonnie H. Erickson (Toronto); The Structure of Ignorance; In CONNECTIONS; Vol. 19, No. 1; 1996; pages 28-38, 11 pages; the keynote address at the Sunbelt International Social Network Conference, Charleston, SC, 1996-02.