Publications by authors named "Jon Kleinberg"

The Friendship Paradox is a simple and powerful statement about node degrees in a graph. However, it only applies to undirected graphs with no edge weights, and the only node characteristic it concerns is degree. Since many social networks are more complex than that, it is useful to generalize this phenomenon, if possible, and a number of papers have proposed different generalizations.
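
The basic degree statement can be checked on a small example. The following is a minimal sketch, not taken from the paper: the function name `friendship_paradox` and the star graph are hypothetical, and "mean friend degree" is assumed here to mean the degree of the node reached by following a uniformly random edge in a uniformly random direction, which equals the sum of squared degrees divided by the sum of degrees.

```python
from statistics import mean


def friendship_paradox(edges):
    """Return (mean degree, mean friend degree) for an undirected, unweighted graph.

    Nodes that appear in no edge are not represented in the edge list
    and are therefore ignored.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    degree = {node: len(nbrs) for node, nbrs in neighbors.items()}

    mean_degree = mean(degree.values())
    # A node of degree d is reached as "a friend" d times, so it
    # contributes d * d to the numerator and d to the denominator.
    total_degree = sum(degree.values())
    mean_friend_degree = sum(d * d for d in degree.values()) / total_degree
    return mean_degree, mean_friend_degree


# Hypothetical star graph: one hub connected to three leaves.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
avg_degree, avg_friend_degree = friendship_paradox(star)
print(avg_degree, avg_friend_degree)
```

In this edge-aggregated form the mean friend degree is never smaller than the mean degree, with equality only when all nodes have the same degree.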

More and more, machine learning is applied to human behavior. Increasingly, these algorithms suffer from a hidden but serious problem. It arises because they often predict one thing while hoping for another.

Contact tracing is a key tool for managing epidemic diseases like HIV, tuberculosis, COVID-19, and monkeypox. Manual investigations by human-contact tracers remain a dominant way in which this is carried out. This process is limited by the number of contact tracers available, who are often overburdened during an outbreak or epidemic.

The Friendship Paradox, the principle that "your friends have more friends than you do," is a combinatorial fact about degrees in a graph. Because many web-based social activities are correlated with a user's degree, this fact has been taken more broadly to suggest the empirical principle that "your friends are also more active than you are." This Generalized Friendship Paradox, the notion that any attribute positively correlated with degree obeys the Friendship Paradox, has been established mathematically in a network-level version that essentially aggregates uniformly over all the edges of a network. Here we show, however, that the natural node-based version of the Generalized Friendship Paradox, which aggregates over nodes rather than edges, may fail, even for degree-attribute correlations approaching 1.

Homophily is the seemingly ubiquitous tendency for people to connect and interact with other individuals who are similar to them. This is a well-documented principle and is fundamental for how society organizes. Although many social interactions occur in groups, homophily has traditionally been measured using a graph model, which only accounts for pairwise interactions involving two individuals.

Computational social science is more than just large repositories of digital data and the computational methods needed to construct and analyse them. It also represents a convergence of different fields with different ways of thinking about and doing science. The goal of this Perspective is to provide some clarity around how these approaches differ from one another and to propose how they might be productively integrated.

Homophily, the tendency of nodes to connect to others of the same type, is a central issue in the study of networks. Here we take a local view of homophily, defining notions of first-order homophily of a node (its individual tendency to link to similar others) and second-order homophily of a node (the aggregate first-order homophily of its neighbors). Through this view, we find a surprising result for homophily values that applies with only minimal assumptions on the graph topology.
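
The two local quantities can be illustrated with a short sketch. The definitions used below are assumptions made for illustration and may differ from the paper's exact normalization: first-order homophily is taken as the fraction of a node's neighbors that share its type, and second-order homophily as the mean first-order homophily of its neighbors. The function name `local_homophily` and the example path graph are hypothetical.

```python
def local_homophily(edges, node_type):
    """Return (first_order, second_order) dicts keyed by node.

    Assumed definitions (illustrative only):
      first_order[v]  = fraction of v's neighbors with the same type as v
      second_order[v] = mean of first_order over v's neighbors
    Every node is assumed to appear in at least one edge.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    first_order = {
        v: sum(node_type[w] == node_type[v] for w in nbrs) / len(nbrs)
        for v, nbrs in neighbors.items()
    }
    second_order = {
        v: sum(first_order[w] for w in nbrs) / len(nbrs)
        for v, nbrs in neighbors.items()
    }
    return first_order, second_order


# Hypothetical path a - b - c - d with two node types.
path = [("a", "b"), ("b", "c"), ("c", "d")]
types = {"a": "red", "b": "red", "c": "blue", "d": "blue"}
first_order, second_order = local_homophily(path, types)
for node in sorted(first_order):
    print(node, first_order[node], second_order[node])
```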

As algorithms are increasingly applied to screen applicants for high-stakes decisions in employment, lending, and other domains, concerns have been raised about the effects of algorithmic monoculture, in which many decision-makers all rely on the same algorithm. This concern invokes analogies to agriculture, where a monocultural system runs the risk of severe harm from unexpected shocks. Here, we show that the dangers of algorithmic monoculture run much deeper, in that monocultural convergence on a single algorithm by a group of decision-making agents, even when the algorithm is more accurate for any one agent in isolation, can reduce the overall quality of the decisions being made by the full collection of agents.

Preventing discrimination requires that we have means of detecting it, and this can be enormously difficult when human beings are making the underlying decisions. As applied today, algorithms can increase the risk of discrimination. But as we argue here, algorithms by their nature require a far greater level of specificity than is usually possible with human decision making, and this specificity makes it possible to probe aspects of the decision in additional ways.

Networks provide a powerful formalism for modeling complex systems by using a model of pairwise interactions. But much of the structure within these systems involves interactions that take place among more than two nodes at once: for example, communication within a group rather than person to person, collaboration among a team rather than a pair of coauthors, or biological interaction between a set of molecules rather than just two. Such higher-order interactions are ubiquitous, but their empirical study has received limited attention, and little is known about possible organizational principles of such structures.

Evaluating whether machines improve on human performance is one of the central questions of machine learning. However, there are many domains where the data is selectively labeled, in the sense that the observed outcomes are themselves a consequence of the existing choices of the human decision-makers. For instance, in the context of judicial bail decisions, we observe the outcome of whether a defendant fails to return for their court appearance only if the human judge decides to release the defendant on bail.

Can machine learning improve human decision making? Bail decisions provide a good test case. Millions of times each year, judges make jail-or-release decisions that hinge on a prediction of what a defendant would do if released. The concreteness of the prediction task combined with the volume of data available makes this a promising machine-learning application.

Methods for ranking the importance of nodes in a network have a rich history in machine learning and across domains that analyze structured data. Recent work has evaluated these methods through the "seed set expansion problem": given a subset of nodes from a community of interest in an underlying graph, can we reliably identify the rest of the community? We start from the observation that the most widely used techniques for this problem, personalized PageRank and heat kernel methods, operate in the space of "landing probabilities" of a random walk rooted at the seed set, ranking nodes according to weighted sums of landing probabilities of different length walks. Both schemes, however, lack an a priori relationship to the seed set objective.
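
The "weighted sum of landing probabilities" view can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the paper's code: the walk starts uniformly at random in the seed set, the sum is truncated after `num_steps` steps, personalized PageRank is represented by geometric weights `(1 - alpha) * alpha**k`, and the heat kernel by Poisson weights `exp(-t) * t**k / k!`. All function names, the parameter values, and the example graph are hypothetical.

```python
import math


def landing_probabilities(neighbors, seeds, num_steps):
    """Return [p_0, ..., p_num_steps]; p_k[v] is the probability that a
    k-step random walk started uniformly in `seeds` is at node v.
    Every node is assumed to have at least one neighbor."""
    current = {v: 0.0 for v in neighbors}
    for s in seeds:
        current[s] += 1.0 / len(seeds)
    history = [current]
    for _ in range(num_steps):
        nxt = {v: 0.0 for v in neighbors}
        for v, mass in current.items():
            share = mass / len(neighbors[v])  # split mass evenly among neighbors
            for w in neighbors[v]:
                nxt[w] += share
        history.append(nxt)
        current = nxt
    return history


def score(history, weights):
    """Rank nodes by a weighted sum of their landing probabilities."""
    return {v: sum(w * p[v] for w, p in zip(weights, history)) for v in history[0]}


def pagerank_weights(alpha, num_steps):
    return [(1 - alpha) * alpha**k for k in range(num_steps + 1)]


def heat_kernel_weights(t, num_steps):
    return [math.exp(-t) * t**k / math.factorial(k) for k in range(num_steps + 1)]


# Hypothetical graph: two triangles (a, b, c) and (d, e, f) joined by edge c - d.
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
history = landing_probabilities(graph, ["a"], num_steps=10)
ppr_scores = score(history, pagerank_weights(0.85, 10))
hk_scores = score(history, heat_kernel_weights(3.0, 10))
print(sorted(ppr_scores, key=ppr_scores.get, reverse=True))
```

Both rankings reuse the same `history`; only the weight sequence changes, which is the sense in which the two methods operate in the same space.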

The growth of the Web has required us to think about the design of information systems in which large-scale computational and social feedback effects are simultaneously at work. At the same time, the data generated by Web-scale systems (recording the ways in which millions of participants create content, link information, form groups and communicate with one another) have made it possible to evaluate long-standing theories of social interaction, and to formulate new theories based on what we observe. These developments have created a new level of interaction between computing and the social sciences, enriching the perspectives of both of these disciplines.

The concept of contagion has steadily expanded from its original grounding in epidemic disease to describe a vast array of processes that spread across networks, notably social phenomena such as fads, political opinions, the adoption of new technologies, and financial decisions. Traditional models of social contagion have been based on physical analogies with biological contagion, in which the probability that an individual is affected by the contagion grows monotonically with the size of his or her "contact neighborhood," that is, the number of affected individuals with whom he or she is in contact. Whereas this contact neighborhood hypothesis has formed the underpinning of essentially all current models, it has been challenging to evaluate it due to the difficulty in obtaining detailed data on individual network neighborhoods during the course of a large-scale contagion process.

A dilemma faced by teachers, and increasingly by designers of educational software, is the trade-off between teaching new material and reviewing what has already been taught. Complicating matters, review is useful only if it is neither too soon nor too late. Moreover, different students need to review at different rates.

It is not uncommon for certain social networks to divide into two opposing camps in response to stress. This happens, for example, in networks of political parties during winner-takes-all elections, in networks of companies competing to establish technical standards, and in networks of nations faced with mounting threats of war. A simple model for these two-sided separations is the dynamical system dX/dt = X^2, where X is a matrix of the friendliness or unfriendliness between pairs of nodes in the network.
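
The dynamical system can be explored numerically. The sketch below is an illustration only, reading the right-hand side as the matrix square of X and using a plain forward-Euler step; the step size, the stopping cap, the function names, and the three-node starting matrix are all assumptions chosen for the example rather than values from the paper.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def euler_step(x, dt):
    """One forward-Euler step of dX/dt = X * X (matrix product)."""
    n = len(x)
    x_squared = matmul(x, x)
    return [[x[i][j] + dt * x_squared[i][j] for j in range(n)] for i in range(n)]


def integrate(x, dt, num_steps, cap=1e6):
    """Iterate Euler steps, stopping early once any entry exceeds `cap`
    in magnitude (entries of this system can grow very quickly)."""
    for _ in range(num_steps):
        if max(abs(value) for row in x for value in row) > cap:
            break
        x = euler_step(x, dt)
    return x


# Hypothetical start: nodes 0 and 1 mildly friendly, both unfriendly to node 2.
x0 = [
    [0.0, 1.0, -1.0],
    [1.0, 0.0, -0.5],
    [-1.0, -0.5, 0.0],
]
x_final = integrate(x0, dt=0.01, num_steps=1000)
signs = [["+" if value > 0 else "-" for value in row] for row in x_final]
for row in signs:
    print(" ".join(row))
```

Forward Euler is the simplest possible integrator and is used here only for readability; an adaptive method would be a more careful choice near the point where entries grow rapidly.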

We investigate the extent to which social ties between people can be inferred from co-occurrence in time and space: Given that two people have been in approximately the same geographic locale at approximately the same time, on multiple occasions, how likely are they to know each other? Furthermore, how does this likelihood depend on the spatial and temporal proximity of the co-occurrences? Such issues arise in data originating in both online and offline domains as well as settings that capture interfaces between online and offline behavior. Here we develop a framework for quantifying the answers to such questions, and we apply this framework to publicly available data from a social media site, finding that even a very small number of co-occurrences can result in a high empirical likelihood of a social tie. We then present probabilistic models showing how such large probabilities can arise from a natural model of proximity and co-occurrence in the presence of social ties.

We model a close-knit community of friends and enemies as a fully connected network with positive and negative signs on its edges. Theories from social psychology suggest that certain sign patterns are more stable than others. This notion of social "balance" allows us to define an energy landscape for such networks.
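
One way to make the energy idea concrete is sketched below. The abstract does not state the energy function, so the definition here is an assumption for illustration: the energy is taken as the negative average, over all triads, of the product of the three edge signs, so that triads with a positive product (commonly treated as balanced) lower the energy. The function name `balance_energy` and the four-person example are hypothetical.

```python
from itertools import combinations


def balance_energy(nodes, sign):
    """Assumed energy: minus the mean over all triads of the product of
    their three edge signs. `sign` maps frozenset({u, v}) to +1 or -1
    and must contain every pair of nodes (fully connected network)."""
    triads = list(combinations(nodes, 3))
    total = 0
    for a, b, c in triads:
        total += (sign[frozenset((a, b))]
                  * sign[frozenset((b, c))]
                  * sign[frozenset((a, c))])
    return -total / len(triads)


# Hypothetical community split into two camps: {ann, bob} versus {cat, dan}.
people = ["ann", "bob", "cat", "dan"]
two_camps = {frozenset(pair): 1 for pair in combinations(people, 2)}
for pair in [("ann", "cat"), ("ann", "dan"), ("bob", "cat"), ("bob", "dan")]:
    two_camps[frozenset(pair)] = -1

energy = balance_energy(people, two_camps)
print(energy)
```

Flipping a single edge sign and recomputing the energy gives a simple way to step through neighboring states of the landscape under this assumed definition.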
