Publications by authors named "Yves-Alexandre de Montjoye"

Information about us, our actions, and our preferences is created at scale through surveys or scientific studies or as a result of our interaction with digital devices such as smartphones and fitness trackers. The ability to safely share and analyze such data is key for scientific and societal progress. Anonymization is considered by scientists and policy-makers as one of the main ways to share data while minimizing privacy risks.

Despite machine learning models being widely used today, the relationship between a model and its training dataset is not well understood. We explore correlation inference attacks: whether and when a model leaks information about the correlations between the input variables of its training dataset. We first propose a model-less attack, where an adversary exploits the spherical parameterization of correlation matrices alone to make an informed guess.
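As a rough illustration of why a spherical parameterization can support an informed guess, the sketch below builds a 3x3 correlation matrix from three angles and derives the interval that two known correlations impose on the third. It is a minimal example of the general parameterization, not the attack described in the paper; the function names `corr_from_angles` and `rho23_range` are hypothetical.

```python
import math

def corr_from_angles(t21, t31, t32):
    """Build a 3x3 correlation matrix C = L L^T from angles in (0, pi).

    Each row of the lower-triangular factor L is a unit vector written in
    spherical coordinates, so C has a unit diagonal and is positive
    semidefinite by construction.
    """
    rows = [
        (1.0, 0.0, 0.0),
        (math.cos(t21), math.sin(t21), 0.0),
        (math.cos(t31),
         math.sin(t31) * math.cos(t32),
         math.sin(t31) * math.sin(t32)),
    ]
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]

def rho23_range(rho12, rho13):
    """Interval of values rho23 can take once rho12 and rho13 are fixed.

    From the parameterization, rho23 = cos(a)cos(b) + sin(a)sin(b)cos(t32)
    with a = acos(rho12) and b = acos(rho13), so letting cos(t32) range
    over [-1, 1] gives the bounds cos(a + b) and cos(a - b).
    """
    a, b = math.acos(rho12), math.acos(rho13)
    return math.cos(a + b), math.cos(a - b)

# Two strong correlations leave little room for the third one.
low, high = rho23_range(0.9, 0.9)
```

In this toy setting, knowing that two variables each correlate at 0.9 with a third already restricts their mutual correlation to roughly [0.62, 1], which is the kind of constraint a guess could exploit.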

Despite proportionality being one of the tenets of data protection laws, we currently lack a robust analytical framework to evaluate the reach of modern data collections and the network effects at play. Here, we propose a graph-theoretic model and notions of node- and edge-observability to quantify the reach of networked data collections. We first prove closed-form expressions for our metrics and quantify the impact of the graph's structure on observability.

Behavioral data, collected from our daily interactions with technology, have driven scientific advances. Yet, the collection and sharing of these data raise legitimate privacy concerns, as individuals can often be reidentified. Current identification attacks, however, require auxiliary information to roughly match the information available in the dataset, limiting their applicability.

Fine-grained records of people's interactions, both offline and online, are collected at large scale. These data contain sensitive information about whom we meet, talk to, and when. We demonstrate here how people's interaction behavior is stable over long periods of time and can be used to identify individuals in anonymous datasets.

Although anonymous data are not considered personal data, recent research has shown how individuals can often be re-identified. Scholars have argued that previous findings apply only to small-scale datasets and that privacy is preserved in large-scale datasets. Using 3 months of location data, we (1) show the risk of re-identification to decrease slowly with dataset size, (2) approximate this decrease with a simple model taking into account three population-wide marginal distributions, and (3) prove that unicity is convex and obtain a linear lower bound.

While rich medical, behavioral, and socio-demographic data are key to modern data-driven research, their collection and use raise legitimate privacy concerns. Anonymizing datasets through de-identification and sampling before sharing them has been the main tool used to address those concerns. We here propose a generative copula-based method that can accurately estimate the likelihood of a specific person to be correctly re-identified, even in a heavily incomplete dataset.

It is often assumed that there is a robust positive symmetrical relationship between happiness and social behavior: Social relationships are viewed as essential to happiness, and happiness is thought to foster social relationships. However, empirical support for this widely held view is surprisingly mixed, and this view does little to clarify which social partner a person will be motivated to interact with when happy. To address these issues, we monitored the happiness and social interactions of more than 30,000 people for a month.

The breadcrumbs we leave behind when using our mobile phones—who somebody calls, for how long, and from where—contain unprecedented insights about us and our societies. Researchers have compared the recent availability of large-scale behavioral datasets, such as the ones generated by mobile phones, to the invention of the microscope, giving rise to the new field of computational social science.

Poverty is one of the most important determinants of adverse health outcomes globally, a major cause of societal instability and one of the largest causes of lost human potential. Traditional approaches to measuring and targeting poverty rely heavily on census data, which in most low- and middle-income countries (LMICs) are unavailable or out-of-date. Alternate measures are needed to complement and update estimates between censuses.

Most theories of motivation have highlighted that human behavior is guided by the hedonic principle, according to which our choices of daily activities aim to minimize negative affect and maximize positive affect. However, it is not clear how to reconcile this idea with the fact that people routinely engage in unpleasant yet necessary activities. To address this issue, we monitored in real time the activities and moods of over 28,000 people across an average of 27 days using a multiplatform smartphone application.

Sánchez et al.'s textbook k-anonymization example does not prove, or even suggest, that location and other big-data data sets can be anonymized and of general use. The synthetic data set that they "successfully anonymize" bears no resemblance to modern high-dimensional data sets on which their methods fail.

Large-scale data sets of human behavior have the potential to fundamentally transform the way we fight diseases, design cities, or perform research. Metadata, however, contain sensitive information. Understanding the privacy of these data sets is key to their broad use and, ultimately, their impact.

The rise of smartphones and web services made possible the large-scale collection of personal metadata. Information about individuals' locations, phone call logs, or web searches is collected and used intensively by organizations and big data researchers. Metadata has, however, yet to realize its full potential.

Complex problem solving in science, engineering, and business has become a highly collaborative endeavor. Teams of scientists or engineers collaborate on projects using their social networks to gather new ideas and feedback. Here we bridge the literature on team performance and information networks by studying teams' problem solving abilities as a function of both their within-team networks and their members' extended networks.

Numerous studies have documented the normal age-related decline of neural structure, function, and cognitive performance. Preliminary evidence suggests that meditation may reduce decline in specific cognitive domains and in brain structure. Here we extended this research by investigating the relation between age and fluid intelligence and resting state brain functional network architecture using graph theory, in middle-aged yoga and meditation practitioners, and matched controls.

While Bentley et al.'s model is very appealing, in this commentary we argue that researchers interested in big data and collective behavior, including the way humans make decisions, must account for the emotional factor. We investigate how daily choice of activities is influenced by emotions.

We study fifteen months of human mobility data for one and a half million individuals and find that human mobility traces are highly unique. In fact, in a dataset where the location of an individual is specified hourly, and with a spatial resolution equal to that given by the carrier's antennas, four spatio-temporal points are enough to uniquely identify 95% of the individuals. We coarsen the data spatially and temporally to find a formula for the uniqueness of human mobility traces given their resolution and the available outside information.
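The notion of uniqueness used here can be sketched on toy data: a trace counts as unique if a handful of its points, drawn at random, match no other trace in the dataset. The code below is a minimal illustration under that assumption; the `unicity` function, the trace format (sets of `(antenna, hour)` tuples), and the sample data are hypothetical and not taken from the study.

```python
import random

def unicity(traces, p, trials=1, seed=0):
    """Estimate the fraction of traces singled out by p of their own points.

    traces maps a user id to a set of (antenna, hour) points. For each user
    we draw p points from their trace and count how many traces in the
    dataset contain all of them; the draw is unique if only one does.
    """
    rng = random.Random(seed)
    unique = 0
    total = 0
    for points in traces.values():
        if len(points) < p:
            continue  # not enough points to draw from this trace
        for _ in range(trials):
            known = set(rng.sample(sorted(points), p))
            matches = sum(1 for other in traces.values() if known <= other)
            unique += matches == 1
            total += 1
    return unique / total if total else 0.0

# Hypothetical toy dataset: three users, three hourly antenna sightings each.
toy_traces = {
    "a": {("A", 8), ("B", 9), ("C", 18)},
    "b": {("A", 8), ("B", 9), ("D", 18)},
    "c": {("A", 8), ("E", 9), ("D", 18)},
}
share_unique = unicity(toy_traces, p=3)
```

Coarsening the data, as the abstract describes, would correspond to mapping antennas and hours to larger bins before calling the function, which makes more traces collide.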

Although widely used in practice, the behavior and accuracy of the popular module identification technique called modularity maximization are not well understood in practical contexts. Here, we present a broad characterization of its performance in such situations. First, we revisit and clarify the resolution limit phenomenon for modularity maximization.
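For readers unfamiliar with the quantity being maximized, the sketch below evaluates the standard modularity score Q of a given partition of an undirected, unweighted graph. It only scores a partition and does not search for the best one; the `modularity` function and the example graph are illustrative assumptions, not code from the paper.

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of (e_c / m) - (d_c / (2m))^2, where m is the
    number of edges, e_c the number of edges inside c, and d_c the total
    degree of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    inside = defaultdict(int)   # e_c: edges with both endpoints in c
    degree = defaultdict(int)   # d_c: summed degree of nodes in c
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Two triangles joined by a single edge, split into the two obvious modules.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "left", 1: "left", 2: "left", 3: "right", 4: "right", 5: "right"}
q = modularity(edges, two_modules)  # 5/14, about 0.357
```

Placing every node in a single community gives Q = 0 on the same graph, so the two-module split scores higher, as expected.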
