Amyotrophic lateral sclerosis (ALS) is a progressive motor neuron disease for which important subtypes are caused by variation in the superoxide dismutase 1 gene (SOD1). Diagnosis based on sequencing can not only be definitive but also indicate specific therapies available for SOD1-associated ALS (SOD1-ALS). Unfortunately, SOD1-ALS diagnosis is limited by the fact that a substantial fraction (currently 26%) of ClinVar SOD1 missense variants are classified as "variants of uncertain significance" (VUS).
Legionella pneumophila uses over 300 translocated effector proteins to rewire host cells during infection and create a replicative niche for intracellular growth. To date, several studies have identified effectors that indirectly and directly regulate the activity of other effectors, providing an additional layer of regulatory complexity. Among these are "metaeffectors," a special class of effectors that regulate the activity of other effectors once inside the host.
Background: Computational variant effect predictors offer a scalable and increasingly reliable means of interpreting human genetic variation, but concerns of circularity and bias have limited previous methods for evaluating and comparing predictors. Population-level cohorts of genotyped and phenotyped participants that have not been used in predictor training can facilitate an unbiased benchmarking of available methods. Using a curated set of human gene-trait associations with a reported rare-variant burden association, we evaluate the correlations of 24 computational variant effect predictors with associated human traits in the UK Biobank and All of Us cohorts.
Motivation: Long-read sequencing technologies, an attractive solution for many applications, often suffer from higher error rates. Alignment of multiple reads can improve base-calling accuracy, but some applications, e.g.
To maintain genome integrity, cells must accurately duplicate their genome and repair DNA lesions when they occur. To uncover genes that suppress DNA damage in human cells, we undertook flow-cytometry-based CRISPR-Cas9 screens that monitored DNA damage. We identified 160 genes whose mutation caused spontaneous DNA damage, a list enriched in essential genes, highlighting the importance of genomic integrity for cellular fitness.
Background: Glucokinase (GCK) regulates insulin secretion to maintain appropriate blood glucose levels. Sequence variants can alter GCK activity to cause hyperinsulinemic hypoglycemia or hyperglycemia associated with GCK-maturity-onset diabetes of the young (GCK-MODY), collectively affecting up to 10 million people worldwide. Patients with GCK-MODY are frequently misdiagnosed and treated unnecessarily.
The impact of millions of individual genetic variants on molecular phenotypes in coding sequences remains unknown. Multiplexed assays of variant effect (MAVEs) are scalable methods to annotate relevant variants, but existing software lacks standardization, requires cumbersome configuration, and does not scale to large targets. We present satmut_utils as a flexible solution for simulation and variant quantification.
Generating reference maps of interactome networks illuminates genetic studies by providing a protein-centric approach to finding new components of existing pathways, complexes, and processes. We apply state-of-the-art methods to identify binary protein-protein interactions (PPIs) for Drosophila melanogaster. Four all-by-all yeast two-hybrid (Y2H) screens of > 10,000 Drosophila proteins result in the 'FlyBi' dataset of 8723 PPIs among 2939 proteins.
Long-read sequencing technologies, an attractive solution for many applications, often suffer from higher error rates. Alignment of multiple reads can improve base-calling accuracy, but some applications, e.g.
Understanding the mechanisms of coronavirus disease 2019 (COVID-19) disease severity to efficiently design therapies for emerging virus variants remains an urgent challenge of the ongoing pandemic. Infection and immune reactions are mediated by direct contacts between viral molecules and the host proteome, and the vast majority of these virus-host contacts (the 'contactome') have not been identified. Here, we present a systematic contactome map of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) with the human host encompassing more than 200 binary virus-host and intraviral protein-protein interactions.
Global insights into cellular organization and genome function require comprehensive understanding of the interactome networks that mediate genotype-phenotype relationships. Here we present a human 'all-by-all' reference interactome map of human binary protein interactions, or 'HuRI'. With approximately 53,000 protein-protein interactions, HuRI has approximately four times as many such interactions as there are high-quality curated interactions from small-scale studies.
Background: For the majority of rare clinical missense variants, pathogenicity status cannot currently be classified. Classical homocystinuria, characterized by elevated homocysteine in plasma and urine, is caused by variants in the cystathionine beta-synthase (CBS) gene, most of which are rare. With early detection, existing therapies are highly effective.
Many traits are complex, depending non-additively on variant combinations. Even in model systems, such as the yeast S. cerevisiae, carrying out the high-order variant-combination testing needed to dissect complex traits remains a daunting challenge.
Summary: The promise of personalized genomic medicine depends on our ability to assess the functional impact of rare sequence variation. Multiplexed assays can experimentally measure the functional impact of missense variants on a massive scale. However, even after such assays, many missense variants remain poorly measured.
Condition-dependent genetic interactions can reveal functional relationships between genes that are not evident under standard culture conditions. State-of-the-art yeast genetic interaction mapping, which relies on robotic manipulation of arrays of double-mutant strains, does not scale readily to multi-condition studies. Here, we describe barcode fusion genetics to map genetic interactions (BFG-GI), by which double-mutant strains generated via "party" mating can also be monitored for growth to detect genetic interactions.
Although we now routinely sequence human genomes, we can confidently identify only a fraction of the sequence variants that have a functional impact. Here, we developed a deep mutational scanning framework that produces exhaustive maps for human missense variants by combining random codon mutagenesis and multiplexed functional variation assays with computational imputation and refinement. We applied this framework to four proteins corresponding to six human genes: UBE2I (encoding SUMO E2 conjugase), SUMO1 (small ubiquitin-like modifier), TPK1 (thiamin pyrophosphokinase), and CALM1/2/3 (three genes encoding the protein calmodulin).
The exponential growth of genomic variants uncovered by next-generation sequencing necessitates efficient and accurate computational analyses to predict their functional effects. A number of computational methods have been developed for the task, but few unbiased comparisons of their performance are available. To fill the gap, the Critical Assessment of Genome Interpretation (CAGI) comprehensively assesses phenotypic predictions on newly collected experimental datasets.
Genetic suppression occurs when the phenotypic defects caused by a mutation in a particular gene are rescued by a mutation in a second gene. To explore the principles of genetic suppression, we examined both literature-curated and unbiased experimental data, involving systematic genetic mapping and whole-genome sequencing, to generate a large-scale suppression network among yeast genes. Most suppression pairs identified novel relationships among functionally related genes, providing new insights into the functional wiring diagram of the cell.
High-throughput binary protein interaction mapping is continuing to extend our understanding of cellular function and disease mechanisms. However, we remain one or two orders of magnitude away from a complete interaction map for humans and other major model organisms. Completion will require screening at substantially larger scales with many complementary assays, requiring further efficiency gains in proteome-scale interaction mapping.
Transcription factor (TF) DNA sequence preferences direct their regulatory activity, but are currently known for only ∼1% of eukaryotic TFs. Broadly sampling DNA-binding domain (DBD) types from multiple eukaryotic clades, we determined DNA sequence preferences for >1,000 TFs encompassing 54 different DBD classes from 131 diverse eukaryotes. We find that closely related DBDs almost always have very similar DNA sequence preferences, enabling inference of motifs for ∼34% of the ∼170,000 known or predicted eukaryotic TFs.
Genomic analyses often involve scanning for potential transcription factor (TF) binding sites using models of the sequence specificity of DNA binding proteins. Many approaches have been developed to model and learn a protein's DNA-binding specificity, but these methods have not been systematically compared. Here we applied 26 such approaches to in vitro protein binding microarray data for 66 mouse TFs belonging to various families.
Background: Cannabis sativa has been cultivated throughout human history as a source of fiber, oil and food, and for its medicinal and intoxicating properties. Selective breeding has produced cannabis plants for specific uses, including high-potency marijuana strains and hemp cultivars for fiber and seed production. The molecular biology underlying cannabinoid biosynthesis and other traits of interest is largely unexplored.
Proc Natl Acad Sci U S A
June 2011
H-NS and Lsr2 are nucleoid-associated proteins from Gram-negative bacteria and Mycobacteria, respectively, that play an important role in the silencing of horizontally acquired foreign DNA that is more AT-rich than the resident genome. Despite the fact that Lsr2 and H-NS proteins are dissimilar in sequence and structure, they serve apparently similar functions and can functionally complement one another. The mechanism by which these xenogeneic silencers selectively target AT-rich DNA has been enigmatic.
C2H2 zinc fingers (C2H2-ZFs) are the most prevalent type of vertebrate DNA-binding domain, and typically appear in tandem arrays (ZFAs), with sequential C2H2-ZFs each contacting three (or more) sequential bases. C2H2-ZFs can be assembled in a modular fashion, providing one explanation for their remarkable evolutionary success. Given a set of modules with defined three-base specificities, modular assembly also presents a way to construct artificial proteins with specific DNA-binding preferences.
Long DNA palindromes are implicated in chromosomal rearrangement, but their roles in the underlying molecular events remain a matter of conjecture. One notion is that palindromes induce DNA breaks after assuming a cruciform structure, the four-way DNA junction providing a target for cleavage by Holliday junction (HJ)-specific enzymes. Though compelling, few components of the "cruciform resolution" proposal are established.