We explore the potential of crowd-sourced information on human mobility and activities in an urban population drawn from a significant fraction of smartphones in the Los Angeles basin during February-May 2015. The raw dataset was collected by WeFi, a smartphone app provider. The dataset is noisy, irregular, and lean; however, it is large scale (over a billion events), cheap to collect, and arguably unbiased.
The evolutionary theory of language predicts that a language will tend towards fewer synonyms for a given object. We subject this and related predictions to empirical tests, using data from the eBay Big Data Lab, which gave us access to all records of the words used by eBay vendors in their item titles and by consumers in their searches. We find support for the predictions of the evolutionary theory of language.
Contrary to the assumption that web browsers are designed to support the user, an examination of 900,000 distinct PCs shows that web browsers comprise a complex ecosystem with millions of addons collaborating and competing with each other. It is possible for addons to "sneak in" through third-party installations or to get "kicked out" by their competitors without user involvement. This study examines that ecosystem quantitatively by constructing a large-scale graph with nodes corresponding to users, addons, and words (terms) that describe addon functionality.
In August 2013, we held a panel discussion at the KDD 2013 conference in Chicago on the subject of data science, data scientists, and start-ups. KDD is the premier conference on data science research and practice. The panel discussed the pros and cons of the hot data science start-up scene for top-notch data scientists.