Publications by authors named "David M McCandlish"

Biological sequences do not come at random. Instead, they appear with particular frequencies that reflect properties of the associated system or phenomenon. Knowing how biological sequences are distributed in sequence space is thus a natural first step toward understanding the underlying mechanisms.

View Article and Find Full Text PDF

The effect of replacing the amino acid at a given site in a protein is difficult to predict. Yet, evolutionary comparisons have revealed highly regular patterns of interchangeability between pairs of amino acids, and such patterns have proved enormously useful in a range of applications in bioinformatics, evolutionary inference, and protein design. Here we reconcile these apparently contradictory observations using fitness data from over 350,000 experimental amino acid replacements.

View Article and Find Full Text PDF

Cells possess multiple mitochondrial DNA (mtDNA) copies, which undergo semi-autonomous replication and stochastic inheritance. This enables mutant mtDNA variants to arise and selfishly compete with cooperative (wildtype) mtDNA. Selfish mitochondrial genomes are subject to selection at different levels: they compete against wildtype mtDNA directly within hosts and indirectly through organism-level selection.

View Article and Find Full Text PDF

Quantitative models of sequence-function relationships are ubiquitous in computational biology, e.g., for modeling the DNA binding of transcription factors or the fitness landscapes of proteins.

View Article and Find Full Text PDF

Quantitative models that describe how biological sequences encode functional activities are ubiquitous in modern biology. One important aspect of these models is that they commonly exhibit gauge freedoms, i.e.

View Article and Find Full Text PDF

The standard genetic code defines the rules of translation for nearly every life form on Earth. It also determines the amino acid changes accessible via single-nucleotide mutations, thus influencing protein evolvability-the ability of mutation to bring forth adaptive variation in protein function. One of the most striking features of the standard genetic code is its robustness to mutation, yet it remains an open question whether such robustness facilitates or frustrates protein evolvability.

View Article and Find Full Text PDF

Biological sequences do not come at random. Instead, they appear with particular frequencies that reflect properties of the associated system or phenomenon. Knowing how biological sequences are distributed in sequence space is thus a natural first step toward understanding the underlying mechanisms.

View Article and Find Full Text PDF

Computational methods for assessing the likely impacts of mutations, known as variant effect predictors (VEPs), are widely used in the assessment and interpretation of human genetic variation, as well as in other applications like protein engineering. Many different VEPs have been released to date, and there is tremendous variability in their underlying algorithms and outputs, and in the ways in which the methodologies and predictions are shared. This leads to considerable challenges for end users in knowing which VEPs to use and how to use them.

View Article and Find Full Text PDF

Drugs that target pre-mRNA splicing hold great therapeutic potential, but the quantitative understanding of how these drugs work is limited. Here we introduce mechanistically interpretable quantitative models for the sequence-specific and concentration-dependent behavior of splice-modifying drugs. Using massively parallel splicing assays, RNA-seq experiments, and precision dose-response curves, we obtain quantitative models for two small-molecule drugs, risdiplam and branaplam, developed for treating spinal muscular atrophy.

View Article and Find Full Text PDF

Deep neural networks (DNNs) have greatly advanced the ability to predict genome function from sequence. Interpreting genomic DNNs in terms of biological mechanisms, however, remains difficult. Here we introduce SQUID, a genomic DNN interpretability framework based on surrogate modeling.

View Article and Find Full Text PDF

Epistasis between genes is traditionally studied with mutations that eliminate protein activity, but most natural genetic variation is in cis-regulatory DNA and influences gene expression and function quantitatively. In this study, we used natural and engineered cis-regulatory alleles in a plant stem-cell circuit to systematically evaluate epistatic relationships controlling tomato fruit size. Combining a promoter allelic series with two other loci, we collected over 30,000 phenotypic data points from 46 genotypes to quantify how allele strength transforms epistasis.

View Article and Find Full Text PDF

AbstractThe joint distribution of selection coefficients and mutation rates is a key determinant of the genetic architecture of molecular adaptation. Three different distributions are of immediate interest: (1) the "nominal" distribution of possible changes, prior to mutation or selection; (2) the "de novo" distribution of realized mutations; and (3) the "fixed" distribution of selectively established mutations. Here, we formally characterize the relationships between these joint distributions under the strong-selection/weak-mutation (SSWM) regime.

View Article and Find Full Text PDF

Mutations in a protein active site can lead to dramatic and useful changes in protein activity. The active site, however, is sensitive to mutations due to a high density of molecular interactions, substantially reducing the likelihood of obtaining functional multipoint mutants. We introduce an atomistic and machine-learning-based approach, called high-throughput Functional Libraries (htFuncLib), that designs a sequence space in which mutations form low-energy combinations that mitigate the risk of incompatible interactions.

View Article and Find Full Text PDF

Some protein binding pairs exhibit extreme specificities that functionally insulate them from homologs. Such pairs evolve mostly by accumulating single-point mutations, and mutants are selected if their affinity exceeds the threshold required for function. Thus, homologous and high-specificity binding pairs bring to light an evolutionary conundrum: how does a new specificity evolve while maintaining the required affinity in each intermediate? Until now, a fully functional single-mutation path that connects two orthogonal pairs has only been described where the pairs were mutationally close thus enabling experimental enumeration of all intermediates.

View Article and Find Full Text PDF

Predicting evolutionary outcomes is an important research goal in a diversity of contexts. The focus of evolutionary forecasting is usually on adaptive processes, and efforts to improve prediction typically focus on selection. However, adaptive processes often rely on new mutations, which can be strongly influenced by predictable biases in mutation.

View Article and Find Full Text PDF

Contemporary high-throughput mutagenesis experiments are providing an increasingly detailed view of the complex patterns of genetic interaction that occur between multiple mutations within a single protein or regulatory element. By simultaneously measuring the effects of thousands of combinations of mutations, these experiments have revealed that the genotype-phenotype relationship typically reflects not only genetic interactions between pairs of sites but also higher-order interactions among larger numbers of sites. However, modeling and understanding these higher-order interactions remains challenging.

View Article and Find Full Text PDF

Multiplex assays of variant effect (MAVEs) are a family of methods that includes deep mutational scanning experiments on proteins and massively parallel reporter assays on gene regulatory sequences. Despite their increasing popularity, a general strategy for inferring quantitative models of genotype-phenotype maps from MAVE data is lacking. Here we introduce MAVE-NN, a neural-network-based Python package that implements a broadly applicable information-theoretic framework for learning genotype-phenotype maps-including biophysically interpretable models-from MAVE datasets.

View Article and Find Full Text PDF

Evolutionary adaptation often occurs by the fixation of beneficial mutations. This mode of adaptation can be characterized quantitatively by a spectrum of adaptive substitutions, i.e.

View Article and Find Full Text PDF
Article Synopsis
  • Genomic data can help tailor treatments to individual patients, particularly through identifying mutations that respond to specific therapies.
  • A new resampling method was developed for comparing gene pair selections, which was tested on the ALK variant in melanoma, initially believed to predict sensitivity to ALK inhibitors.
  • The findings show that ALK isn't mutually exclusive with critical melanoma oncogenes and doesn't indicate sufficient cancer growth or responsiveness to treatment, challenging its role as a target for therapy.
View Article and Find Full Text PDF

Density estimation in sequence space is a fundamental problem in machine learning that is also of great importance in computational biology. Due to the discrete nature and large dimensionality of sequence space, how best to estimate such probability distributions from a sample of observed sequences remains unclear. One common strategy for addressing this problem is to estimate the probability distribution using maximum entropy (i.

View Article and Find Full Text PDF

Massively parallel phenotyping assays have provided unprecedented insight into how multiple mutations combine to determine biological function. While such assays can measure phenotypes for thousands to millions of genotypes in a single experiment, in practice these measurements are not exhaustive, so that there is a need for techniques to impute values for genotypes whose phenotypes have not been directly assayed. Here, we present an imputation method based on inferring the least epistatic possible sequence-function relationship compatible with the data.

View Article and Find Full Text PDF

An underexplored question in evolutionary genetics concerns the extent to which mutational bias in the production of genetic variation influences outcomes and pathways of adaptive molecular evolution. In the genomes of at least some vertebrate taxa, an important form of mutation bias involves changes at CpG dinucleotides: if the DNA nucleotide cytosine (C) is immediately 5' to guanine (G) on the same coding strand, then-depending on methylation status-point mutations at both sites occur at an elevated rate relative to mutations at non-CpG sites. Here, we examine experimental data from case studies in which it has been possible to identify the causative substitutions that are responsible for adaptive changes in the functional properties of vertebrate haemoglobin (Hb).

View Article and Find Full Text PDF

Over the last decade, a rich variety of massively parallel assays have revolutionized our understanding of how biological sequences encode quantitative molecular phenotypes. These assays include deep mutational scanning, high-throughput SELEX, and massively parallel reporter assays. Here, we review these experimental methods and how the data they produce can be used to quantitatively model sequence-function relationships.

View Article and Find Full Text PDF