Publications by authors named "Ben Lehner"

Missense variants that change the amino acid sequences of proteins cause one-third of human genetic diseases. Tens of millions of missense variants exist in the current human population, and the vast majority of these have unknown functional consequences. Here we present a large-scale experimental analysis of human missense variants across many different proteins.

We present MoCHI, a tool to fit interpretable models using deep mutational scanning data. MoCHI infers free energy changes, as well as interaction terms (energetic couplings) for specified biophysical models, including from multimodal phenotypic data. When a user-specified model is unavailable, global nonlinearities (epistasis) can be estimated from the data.
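
Why free energies can be the right scale for a model: in a two-state folding model, mutations add their free energy changes (ΔΔG) in energy space, but the measured phenotype (fraction folded) is a sigmoidal function of the total free energy. Phenotypes therefore combine non-additively even when energies combine with no coupling term. The following is a minimal Python sketch under assumed values (ΔG_wt = -2 kcal/mol, two mutations of +1 kcal/mol each, RT ≈ 0.593 kcal/mol at about 25 °C). It is an illustration of the biophysical idea only, not MoCHI's code or API.

```python
import math

RT = 0.593  # kcal/mol at ~25 °C (assumed temperature)

def fraction_folded(dg_fold: float) -> float:
    """Two-state model: fraction folded for folding free energy dg_fold
    (kcal/mol; negative values mean the folded state is favoured)."""
    return 1.0 / (1.0 + math.exp(dg_fold / RT))

# Hypothetical values for illustration only.
dg_wt = -2.0              # wild-type folding free energy
ddg_a, ddg_b = 1.0, 1.0   # destabilising free energy changes of mutations A and B

f_wt = fraction_folded(dg_wt)
f_a = fraction_folded(dg_wt + ddg_a)
f_b = fraction_folded(dg_wt + ddg_b)
f_ab = fraction_folded(dg_wt + ddg_a + ddg_b)  # energies add, no coupling term

# Double-mutant phenotype expected if effects were additive in phenotype space.
f_ab_additive = f_wt + (f_a - f_wt) + (f_b - f_wt)

print(f"WT {f_wt:.3f}  A {f_a:.3f}  B {f_b:.3f}  AB {f_ab:.3f}")
print(f"additive expectation {f_ab_additive:.3f}, "
      f"phenotypic epistasis {f_ab - f_ab_additive:.3f}")
```

The double mutant falls well below the additive expectation, although there is no energetic coupling between A and B; this is the kind of global nonlinearity that must be modelled before interaction terms can be interpreted.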

The encoding and evolution of specificity and affinity in protein-protein interactions are poorly understood. Here, we address this question by quantifying how all mutations in one protein, JUN, alter binding to all other members of a protein family, the 54 human basic leucine zipper transcription factors. We fit a global thermodynamic model to the data and find that most affinity-changing mutations equally affect JUN's affinity to all its interaction partners.
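
A mutation that "equally affects JUN's affinity to all its interaction partners" can be expressed as a single binding free energy change (ΔΔG) added to every partner's wild-type binding energy. The sketch below assumes hypothetical partner labels (P1 to P3), hypothetical wild-type energies, and simple two-state binding occupancy; it is not the model fitted in the study. Because occupancy is nonlinear in energy, the same shared ΔΔG changes the measured binding of strong and weak partners by different amounts.

```python
import math

RT = 0.593  # kcal/mol at ~25 °C (assumed)

def fraction_bound(dg_bind: float) -> float:
    """Two-state occupancy for binding free energy dg_bind (kcal/mol, negative = favourable)."""
    return 1.0 / (1.0 + math.exp(dg_bind / RT))

# Hypothetical wild-type binding free energies of JUN to three partners.
dg_wt = {"P1": -3.0, "P2": -1.0, "P3": 0.5}

def predict(ddg_mut: float) -> dict:
    """Global model: one ΔΔG per mutation, shared across all partners."""
    return {p: fraction_bound(g + ddg_mut) for p, g in dg_wt.items()}

wt = predict(0.0)
mut = predict(1.0)  # hypothetical mutation weakening binding by 1 kcal/mol
for p in dg_wt:
    print(f"{p}: WT {wt[p]:.3f} -> mutant {mut[p]:.3f}")
```

The tightly bound partner (P1) barely changes, while the intermediate partner (P2) loses much of its binding, even though the energetic effect is identical for all three.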

There are more ways to synthesize a 100-amino acid (aa) protein (20^100) than there are atoms in the universe. Only a very small fraction of such a vast sequence space can ever be experimentally or computationally surveyed. Deep neural networks are increasingly being used to navigate high-dimensional sequence spaces.
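
With 20 amino acid choices at each of 100 positions there are 20^100 possible sequences, which can be checked with exact integer arithmetic. The figure of roughly 10^80 atoms in the observable universe is a commonly quoted order-of-magnitude estimate, used here as an assumption.

```python
import math

n_sequences = 20 ** 100     # 20 amino acid choices at each of 100 positions
atoms_estimate = 10 ** 80   # rough, commonly quoted estimate (assumption)

print(len(str(n_sequences)))              # 131 decimal digits
print(round(math.log10(n_sequences), 2))  # 130.1, i.e. about 1.3e130
print(n_sequences > atoms_estimate)       # True
```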

Article Synopsis
  • Premature termination codons (PTCs) are responsible for about 10-20% of inherited diseases and play a significant role in the inactivation of tumor suppressor genes in cancer.
  • Researchers aim to counteract PTC effects by promoting translational readthrough, but the efficiency of existing drug therapies varies across different PTCs.
  • The study quantifies how eight different drugs affect readthrough of approximately 5,800 pathogenic stop codons, leading to predictive models that can help in designing personalized therapies and improving future clinical trials.

Amyloid protein aggregates are pathological hallmarks of more than fifty human diseases, but how soluble proteins nucleate to form amyloids is poorly understood. Here we use combinatorial mutagenesis, a kinetic selection assay, and machine learning to massively perturb the energetics of the nucleation reaction of amyloid beta (Aβ42), the protein that aggregates in Alzheimer's disease. In total, we quantify the nucleation rates of >140,000 variants of Aβ42.

Protein aggregation is a pathological hallmark of more than fifty human diseases and a major problem for biotechnology. Methods have been proposed to predict aggregation from sequence, but these have been trained and evaluated on small and biased experimental datasets. Here we directly address this data shortage by experimentally quantifying the amyloid nucleation of >100,000 protein sequences.

Accurate models describing the relationship between genotype and phenotype are necessary to understand and predict how mutations to biological sequences affect the fitness and evolution of living organisms. The apparent abundance of epistasis (genetic interactions), both between and within genes, complicates this task, and how to build mechanistic models that incorporate epistatic coefficients (genetic interaction terms) remains an open question. The Walsh-Hadamard transform represents a rigorous computational framework for calculating and modeling epistatic interactions at the level of individual genotypic values (known as genetical, biological or physiological epistasis), and can therefore be used to address fundamental questions related to sequence-to-function encodings.
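
For a biallelic landscape of n sites, the transform can be written as ε = V H y, where y lists the phenotypes of all 2^n genotypes, H is the Hadamard matrix built from [[1, 1], [1, -1]] by repeated Kronecker products, and V = diag(1/2, -1) raised to the same Kronecker power rescales each row so that the coefficients are background-averaged effects. This is a minimal NumPy sketch of that common biallelic convention; the extension to landscapes of arbitrary shape is not reproduced, the function name is hypothetical, and the example phenotypes are made up.

```python
from functools import reduce

import numpy as np

H1 = np.array([[1.0, 1.0], [1.0, -1.0]])  # single-site Hadamard matrix
V1 = np.diag([0.5, -1.0])                  # single-site rescaling (background-averaged)

def walsh_hadamard_coefficients(y: np.ndarray) -> np.ndarray:
    """Background-averaged epistatic coefficients for a binary landscape.

    y lists phenotypes in binary genotype order (two sites: 00, 01, 10, 11),
    with the first site as the most significant bit.
    """
    n = int(np.log2(len(y)))
    if n < 1 or 2 ** n != len(y):
        raise ValueError("landscape length must be a power of two >= 2")
    H = reduce(np.kron, [H1] * n)  # 2^n x 2^n Hadamard matrix
    V = reduce(np.kron, [V1] * n)  # diagonal rescaling matrix
    return V @ H @ y

# Hypothetical two-site landscape: genotypes 00, 01, 10, 11.
y = np.array([0.0, 1.0, 1.0, 3.0])
coef = walsh_hadamard_coefficients(y)
print(coef)  # [mean, effect of site 2, effect of site 1, pairwise epistasis]
```

For two sites the last coefficient reduces to y11 - y10 - y01 + y00, the familiar pairwise epistasis; the site coefficients are the mutational effects averaged over both backgrounds of the other site.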

Thousands of human proteins function by binding short linear motifs embedded in intrinsically disordered regions. How affinity and specificity are encoded in these binding domains and the motifs themselves is not well understood. The evolvability of binding specificity (how rapidly and extensively it can change upon mutation) is also largely unexplored, as is the contribution of 'fuzzy' dynamic residues to affinity and specificity in protein-protein interactions.

Computational methods for assessing the likely impacts of mutations, known as variant effect predictors (VEPs), are widely used in the assessment and interpretation of human genetic variation, as well as in other applications like protein engineering. Many different VEPs have been released to date, and there is tremendous variability in their underlying algorithms and outputs, and in the ways in which the methodologies and predictions are shared. This leads to considerable challenges for end users in knowing which VEPs to use and how to use them.

Thousands of proteins have been validated genetically as therapeutic targets for human diseases. However, very few have been successfully targeted, and many are considered 'undruggable'. This is particularly true for proteins that function via protein-protein interactions: direct inhibition of binding interfaces is difficult and requires the identification of allosteric sites.

An important challenge in genetics, evolution and biotechnology is to understand and predict how mutations combine to alter phenotypes, including molecular activities, fitness and disease. In diploids, mutations in a gene can combine on the same chromosome or on different chromosomes as a "heteroallelic combination". However, a direct comparison of the extent, sign, and stability of the genetic interactions between variants within and between alleles is lacking.

Multiplexed assays of variant effects (MAVEs) have made possible the functional assessment of all possible mutations to genes and regulatory sequences. A core pillar of the approach is generation of variant libraries, but current methods are either difficult to scale or not uniform enough to enable MAVEs at the scale of gene families or beyond. We present an improved method called Scalable and Uniform Nicking (SUNi) mutagenesis that combines massive scalability with high uniformity to enable cost-effective MAVEs of gene families and eventually genomes.

Multiplexed assays of variant effects (MAVEs) guide clinical variant interpretation and reveal disease mechanisms. To date, MAVEs have focussed on a single mutation type, amino acid (AA) substitutions, despite the diversity of coding variants that cause disease. Here we use Deep Indel Mutagenesis (DIM) to generate a comprehensive atlas of diverse variant effects for a disease protein, the amyloid beta (Aβ) peptide that aggregates in Alzheimer's disease (AD) and is mutated in familial AD (fAD).

Somatic mutations are an inevitable component of ageing and the most important cause of cancer. The rates and types of somatic mutation vary across individuals, but relatively few inherited influences on mutation processes are known. We perform a gene-based rare variant association study with diverse mutational processes, using human cancer genomes from over 11,000 individuals of European ancestry.

Allosteric communication between distant sites in proteins is central to biological regulation but still poorly characterized, limiting understanding, engineering and drug development. An important reason for this is the lack of methods to comprehensively quantify allostery in diverse proteins. Here we address this shortcoming and present a method that uses deep mutational scanning to globally map allostery.

The classic two-hit model posits that both alleles of a tumor suppressor gene (TSG) must be inactivated to cause cancer. In contrast, for some oncogenes and haploinsufficient TSGs, a single genetic alteration can suffice to increase tumor fitness. Here, by quantifying the interactions between mutations and copy number alterations (CNAs) across 10,000 tumors, we show that many cancer genes actually switch between acting as one-hit or two-hit drivers.

An old and controversial question in biology is whether information perceived by an animal's nervous system can "cross the Weismann barrier" to alter the phenotypes and fitness of its progeny. Here, we show that such intergenerational transmission of sensory information occurs in the model organism C. elegans, with a major effect on fitness.

Plaques of the amyloid beta (Aβ) peptide are a pathological hallmark of Alzheimer's disease (AD), the most common form of dementia. Mutations in Aβ also cause familial forms of AD (fAD). Here, we use deep mutational scanning to quantify the effects of >14,000 mutations on the aggregation of Aβ.

The nonsense-mediated mRNA decay (NMD) pathway degrades some but not all mRNAs bearing premature termination codons (PTCs). Decades of work have elucidated the molecular mechanisms of NMD. More recently, statistical analyses of large genomic datasets have allowed the importance of known and novel 'rules of NMD' to be tested and combined into methods that accurately predict whether PTC-containing mRNAs are degraded or not.

Genetic analyses and systematic mutagenesis have revealed that synonymous, non-synonymous and intronic mutations frequently alter the inclusion levels of alternatively spliced exons, consistent with the concept that altered splicing might be a common mechanism by which mutations cause disease. However, most exons expressed in any cell are highly included in mature mRNAs. Here, by performing deep mutagenesis of highly included exons and by analysing the association between genome sequence variation and exon inclusion across the transcriptome, we report that mutations only very rarely alter the inclusion of highly included exons.

A goal of biology is to predict how mutations combine to alter phenotypes, fitness and disease. It is often assumed that mutations combine additively or with interactions that can be predicted. Here, we show using simulations that, even for the simple example of the lambda phage transcription factor CI repressing a gene, this assumption is incorrect and that perfect measurements of the effects of mutations on a trait and mechanistic understanding can be insufficient to predict what happens when two mutations are combined.
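
A toy model (not the simulation model used in the study) illustrates why precise single-mutant measurements of a trait can be insufficient. Assume repression requires a repressor that is both folded and bound to DNA, so the trait depends on two hidden free energies. Mutation F destabilises folding and mutation B weakens DNA binding by the same hypothetical amount, so both single mutants give exactly the same repression. Combining each with the same second folding mutation X nonetheless gives very different double mutants. All energies below are assumptions.

```python
import math

RT = 0.593  # kcal/mol at ~25 °C (assumed)

def occupancy(dg: float) -> float:
    """Two-state probability of the favourable state for free energy dg (kcal/mol)."""
    return 1.0 / (1.0 + math.exp(dg / RT))

def repression(ddg_fold: float = 0.0, ddg_bind: float = 0.0) -> float:
    """Hypothetical trait: fraction of repressor that is folded AND bound to DNA."""
    dg_fold_wt, dg_bind_wt = -2.0, -2.0  # hypothetical wild-type energies
    return occupancy(dg_fold_wt + ddg_fold) * occupancy(dg_bind_wt + ddg_bind)

single_F = repression(ddg_fold=1.0)                 # folding mutation F
single_B = repression(ddg_bind=1.0)                 # binding mutation B
double_FX = repression(ddg_fold=1.0 + 1.0)          # F plus folding mutation X
double_BX = repression(ddg_fold=1.0, ddg_bind=1.0)  # B plus the same mutation X

print(f"singles: F={single_F:.3f}  B={single_B:.3f}")
print(f"doubles: F+X={double_FX:.3f}  B+X={double_BX:.3f}")
```

Measuring only the trait, F and B are indistinguishable; their different combinatorial behaviour comes from which hidden energy each mutation changes.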

Deep mutational scanning (DMS) enables multiplexed measurement of the effects of thousands of variants of proteins, RNAs, and regulatory elements. Here, we present a customizable pipeline, DiMSum, that represents an end-to-end solution for obtaining variant fitness and error estimates from raw sequencing data. A key innovation of DiMSum is the use of an interpretable error model that captures the main sources of variability arising in DMS workflows, outperforming previous methods.
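
The basic quantity that DMS fitness pipelines report can be sketched as the log ratio of output to input read counts for a variant, normalised to the same ratio for wild type. The example uses hypothetical counts and a simple Poisson count error (variance approximated by the sum of reciprocal counts). This is a deliberately simpler stand-in for DiMSum's error model, which is more elaborate, and the function name is hypothetical.

```python
import math

def fitness(counts_in: int, counts_out: int,
            wt_in: int, wt_out: int) -> tuple[float, float]:
    """Log-ratio fitness relative to wild type, with a Poisson-count error estimate.

    Single replicate only; replicate-level error terms are not modelled here.
    """
    if min(counts_in, counts_out, wt_in, wt_out) <= 0:
        raise ValueError("all counts must be positive")
    f = math.log(counts_out / counts_in) - math.log(wt_out / wt_in)
    sigma = math.sqrt(1 / counts_in + 1 / counts_out + 1 / wt_in + 1 / wt_out)
    return f, sigma

# Hypothetical counts: wild type 1000 -> 1000 reads, variant 100 -> 50 reads.
f, sigma = fitness(100, 50, 1000, 1000)
print(f"fitness = {f:.3f} ± {sigma:.3f}")
```

Low-count variants get large error estimates, which is why an error model matters when ranking thousands of variants.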

Premature termination codons (PTCs) can result in the production of truncated proteins or the degradation of messenger RNAs by nonsense-mediated mRNA decay (NMD). Which of these outcomes occurs can alter the effect of a mutation, with the engagement of NMD being dependent on a series of rules. Here, by applying these rules genome-wide to obtain a resource called NMDetective, we explore the impact of NMD on genetic disease and approaches to therapy.
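
The "rules" that decide whether NMD is engaged are largely position-based. The sketch below encodes a commonly described subset as a rule-based classifier: PTCs in the last exon, or within about 50 nt upstream of the last exon-exon junction, escape NMD, while PTCs close to the start codon or in very long exons tend to be degraded less efficiently. The thresholds (50, 150 and 407 nt), the function name and the example transcript are assumptions for illustration; this is not NMDetective's scoring model.

```python
from bisect import bisect_right
from itertools import accumulate

def classify_ptc(exon_lengths: list[int], ptc_pos: int,
                 junction_nt: int = 50, start_proximal_nt: int = 150,
                 long_exon_nt: int = 407) -> str:
    """Rule-based guess at whether a PTC triggers NMD (hypothetical helper).

    exon_lengths: coding-sequence length of each exon, 5' to 3' (nt); assumes
    the last exon-exon junction lies within the CDS (no spliced 3' UTR).
    ptc_pos: 0-based position of the stop codon's first nucleotide in the CDS.
    Threshold defaults are assumed values, not fitted parameters.
    """
    ends = list(accumulate(exon_lengths))   # cumulative exon end positions
    if not 0 <= ptc_pos < ends[-1]:
        raise ValueError("PTC position outside the coding sequence")
    exon = bisect_right(ends, ptc_pos)       # index of the exon containing the PTC
    if exon == len(ends) - 1:
        return "escape: last exon"
    if ends[-2] - ptc_pos <= junction_nt:    # distance to last exon-exon junction
        return "escape: near last junction"
    if ptc_pos < start_proximal_nt:
        return "reduced: start-proximal"
    if exon_lengths[exon] > long_exon_nt:
        return "reduced: long exon"
    return "NMD"

exons = [200, 300, 250]  # hypothetical three-exon coding sequence
for pos in (50, 300, 470, 600):
    print(pos, classify_ptc(exons, pos))
```

Applying a function like this to every possible PTC position in every transcript is the genome-wide step; a real resource would also weight rules by how strongly each reduces NMD efficiency.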
