Publications by authors named "Mark DePristo"

Using deep learning to annotate the protein universe.

Maxwell L Bileschi David Belanger Drew H Bryant Theo Sanderson Brandon Carter Mark A DePristo

Nat Biotechnol

June 2022

Understanding the relationship between amino acid sequence and protein function is a long-standing challenge with far-reaching scientific and translational implications. State-of-the-art alignment-based techniques cannot predict function for one-third of microbial protein sequences, hampering our ability to exploit data from diverse organisms. Here, we train deep learning models to accurately predict functional annotations for unaligned amino acid sequences across rigorous benchmark assessments built from the 17,929 families of the protein families database Pfam.

RNA profiles reveal signatures of future health and disease in pregnancy.

Morten Rasmussen Mitsu Reddy Rory Nolan Joan Camunas-Soler Arkady Khodursky Mark A DePristo

Nature

January 2022

Maternal morbidity and mortality continue to rise, and pre-eclampsia is a major driver of this burden. Yet the ability to assess underlying pathophysiology before clinical presentation to enable identification of pregnancies at risk remains elusive. Here we demonstrate the ability of plasma cell-free RNA (cfRNA) to reveal patterns of normal pregnancy progression and determine the risk of developing pre-eclampsia months before clinical presentation.

Challenges of Accuracy in Germline Clinical Sequencing Data.

Ryan Poplin Justin M Zook Mark DePristo

JAMA

July 2021

Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome.

Aaron M Wenger Paul Peluso William J Rowell Pi-Chuan Chang Richard J Hall Mark A DePristo

Nat Biotechnol

October 2019

The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.

GenomeWarp: an alignment-based variant coordinate transformation.

Cory Y McLean Yeongwoo Hwang Ryan Poplin Mark A DePristo

Bioinformatics

November 2019

Summary: Reference genomes are refined to reflect error corrections and other improvements. While this process improves novel data generation and analysis, incorporating data analyzed on an older reference genome assembly requires transforming the coordinates and representations of the data to the new assembly. Multiple tools exist to perform this transformation for coordinate-only data types, but none supports accurate transformation of genome-wide short variation.

CrowdVariant: a crowdsourcing approach to classify copy number variants.

Peyton Greenside Justin Zook Marc Salit Madeleine Cule Ryan Poplin Mark DePristo

Pac Symp Biocomput

August 2019

Copy number variants (CNVs) are an important type of genetic variation that play a causal role in many diseases. The ability to identify high quality CNVs is of substantial clinical relevance. However, CNVs are notoriously difficult to identify accurately from array-based methods and next-generation sequencing (NGS) data, particularly for small (< 10kbp) CNVs.

A guide to deep learning in healthcare.

Andre Esteva Alexandre Robicquet Bharath Ramsundar Volodymyr Kuleshov Mark DePristo

Nat Med

January 2019

Here we present deep-learning techniques for healthcare, centering our discussion on deep learning in computer vision, natural language processing, reinforcement learning, and generalized methods. We describe how these computational techniques can impact a few key areas of medicine and explore how to build end-to-end systems. Our discussion of computer vision focuses largely on medical imaging, and we describe the application of natural language processing to domains such as electronic health record data.

A universal SNP and small-indel variant caller using deep neural networks.

Ryan Poplin Pi-Chuan Chang David Alexander Scott Schwartz Thomas Colthurst Mark A DePristo

Nat Biotechnol

November 2018

Despite rapid advances in sequencing technologies, accurately calling genetic variants present in an individual genome from billions of short, errorful sequence reads remains challenging. Here we show that a deep convolutional neural network can call genetic variation in aligned next-generation sequencing read data by learning statistical relationships between images of read pileups around putative variant and true genotype calls. The approach, called DeepVariant, outperforms existing state-of-the-art tools.

Deep learning of genomic variation and regulatory network data.

Amalio Telenti Christoph Lippert Pi-Chuan Chang Mark DePristo

Hum Mol Genet

May 2018

The human genome is now investigated through high-throughput functional assays, and through the generation of population genomic data. These advances support the identification of functional genetic variants and the prediction of traits (e.g.

Erratum: Sequence data and association statistics from 12,940 type 2 diabetes cases and controls.

Jason Flannick Christian Fuchsberger Anubha Mahajan Tanya M Teslovich Vineeta Agarwala Mark DePristo

Sci Data

January 2018

This corrects the article DOI: 10.1038/sdata.2017.

Evaluating the contribution of rare variants to type 2 diabetes and related traits using pedigrees.

Goo Jun Alisa Manning Marcio Almeida Matthew Zawistowski Andrew R Wood Mark DePristo

Proc Natl Acad Sci U S A

January 2018

A major challenge in evaluating the contribution of rare variants to complex disease is identifying enough copies of the rare alleles to permit informative statistical analysis. To investigate the contribution of rare variants to the risk of type 2 diabetes (T2D) and related traits, we performed deep whole-genome analysis of 1,034 members of 20 large Mexican-American families with high prevalence of T2D. If rare variants of large effect accounted for much of the diabetes risk in these families, our experiment was powered to detect association.

Sequence data and association statistics from 12,940 type 2 diabetes cases and controls.

Jason Flannick Christian Fuchsberger Anubha Mahajan Tanya M Teslovich Vineeta Agarwala Mark DePristo

Sci Data

December 2017

To investigate the genetic basis of type 2 diabetes (T2D) to high resolution, the GoT2D and T2D-GENES consortia catalogued variation from whole-genome sequencing of 2,657 European individuals and exome sequencing of 12,940 individuals of multiple ancestries. Over 27M SNPs, indels, and structural variants were identified, including 99% of low-frequency (minor allele frequency [MAF] 0.1-5%) non-coding variants in the whole-genome sequenced individuals and 99.

A Low-Frequency Inactivating Variant Enriched in the Finnish Population Is Associated With Fasting Insulin Levels and Type 2 Diabetes Risk.

Alisa Manning Heather M Highland Jessica Gasser Xueling Sim Taru Tukiainen Mark DePristo

Diabetes

July 2017

Article Synopsis

Researchers analyzed genetic data from over 39,000 people to find new associations linked to glycemic traits and type 2 diabetes risk.
They discovered a specific variant (p.Pro50Thr) that increases fasting plasma insulin levels by 12%, particularly in individuals of Finnish descent.
This variant is associated with lower insulin sensitivity and a slightly higher risk of developing type 2 diabetes, highlighting its functional impact in glucose regulation.

A framework for the detection of de novo mutations in family-based sequencing data.

Laurent C Francioli Mircea Cretu-Stancu Kiran V Garimella Menachem Fromer Wigard P Kloosterman Mark A DePristo

Eur J Hum Genet

February 2017

Germline mutation detection from human DNA sequence data is challenging due to the rarity of such events relative to the intrinsic error rates of sequencing technologies and the uneven coverage across the genome. We developed PhaseByTransmission (PBT) to identify de novo single nucleotide variants and short insertions and deletions (indels) from sequence data collected in parent-offspring trios. We compute the joint probability of the data given the genotype likelihoods in the individual family members, the known familial relationships and a prior probability for the mutation rate.
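The joint-probability idea in this abstract can be illustrated with a minimal sketch. This is not the PhaseByTransmission implementation: the function names, the biallelic genotype coding (0, 1, or 2 alternate alleles), the independent per-allele mutation model, and the parental prior are all assumptions made for illustration.

```python
from itertools import product

GENOTYPES = (0, 1, 2)  # alternate-allele count: hom-ref, het, hom-alt


def transmit_prob(parent_gt, allele):
    """P(parent with parent_gt alt alleles transmits allele), allele 0=ref, 1=alt."""
    p_alt = parent_gt / 2.0
    return p_alt if allele == 1 else 1.0 - p_alt


def child_given_parents(child_gt, mother_gt, father_gt, mu):
    """P(child genotype | parental genotypes) under Mendelian transmission,
    where each transmitted allele flips independently with probability mu
    (a simplifying assumption)."""
    total = 0.0
    for m_allele, f_allele in product((0, 1), repeat=2):
        p_transmit = (transmit_prob(mother_gt, m_allele)
                      * transmit_prob(father_gt, f_allele))
        if p_transmit == 0.0:
            continue
        for m_final, f_final in product((0, 1), repeat=2):
            if m_final + f_final != child_gt:
                continue
            p_mut = ((mu if m_final != m_allele else 1.0 - mu)
                     * (mu if f_final != f_allele else 1.0 - mu))
            total += p_transmit * p_mut
    return total


def trio_posterior(lik_mother, lik_father, lik_child, parent_prior, mu=1e-8):
    """Posterior over (mother, father, child) genotypes.

    lik_* are per-sample genotype likelihoods P(reads | genotype), indexed by
    alt-allele count; parent_prior is a hypothetical population prior."""
    joint = {}
    for m, f, c in product(GENOTYPES, repeat=3):
        joint[(m, f, c)] = (parent_prior[m] * parent_prior[f]
                            * child_given_parents(c, m, f, mu)
                            * lik_mother[m] * lik_father[f] * lik_child[c])
    norm = sum(joint.values())
    return {k: v / norm for k, v in joint.items()}


def de_novo_probability(posterior):
    """Posterior mass on hom-ref parents with a heterozygous child."""
    return posterior[(0, 0, 1)]
```

Summing the posterior over other configurations (for example, a heterozygous parent whose variant allele was undersampled) shows how the familial constraint competes with sequencing error, which is the core of the trio-aware approach the abstract describes.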

Analysis of protein-coding genetic variation in 60,706 humans.

Monkol Lek Konrad J Karczewski Eric V Minikel Kaitlin E Samocha Eric Banks Mark DePristo

Nature

August 2016

Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence.

The genetic architecture of type 2 diabetes.

Christian Fuchsberger Jason Flannick Tanya M Teslovich Anubha Mahajan Vineeta Agarwala Mark DePristo

Nature

August 2016

The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has been long debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of the heritability of this disease. Here, to test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole-genome sequencing in 2,657 European individuals with and without diabetes, and exome sequencing in 12,940 individuals from five ancestry groups.

Exome sequencing identifies rare LDLR and APOA5 alleles conferring risk for myocardial infarction.

Ron Do Nathan O Stitziel Hong-Hee Won Anders Berg Jørgensen Stefano Duga Mark A DePristo

Nature

February 2015

Myocardial infarction (MI), a leading cause of death around the world, displays a complex pattern of inheritance. When MI occurs early in life, genetic inheritance is a major component to risk. Previously, rare mutations in low-density lipoprotein (LDL) genes have been shown to contribute to MI risk in individual families, whereas common variants at more than 45 loci have been associated with MI risk in the population.

From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline.

Geraldine A Van der Auwera Mauricio O Carneiro Christopher Hartl Ryan Poplin Guillermo Del Angel Mark A DePristo

Curr Protoc Bioinformatics

July 2016

This unit describes how to use BWA and the Genome Analysis Toolkit (GATK) to map genome sequencing data to a reference and produce high-quality variant calls that can be used in downstream analyses. The complete workflow includes the core NGS data processing steps that are necessary to make the raw data suitable for analysis by the GATK, as well as the key methods involved in variant discovery using the GATK.

A framework for the interpretation of de novo mutation in human disease.

Kaitlin E Samocha Elise B Robinson Stephan J Sanders Christine Stevens Aniko Sabo Mark DePristo

Nat Genet

September 2014

Spontaneously arising (de novo) mutations have an important role in medical genetics. For diseases with extensive locus heterogeneity, such as autism spectrum disorders (ASDs), the signal from de novo mutations is distributed across many genes, making it difficult to distinguish disease-relevant mutations from background variation. Here we provide a statistical framework for the analysis of excesses in de novo mutation per gene and gene set by calibrating a model of de novo mutation.
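One way to picture a per-gene excess test is the sketch below. It assumes a Poisson model for the observed count and a known per-gene, per-chromosome mutation rate; the function names and example values are hypothetical and not taken from the article.

```python
import math


def expected_de_novo(per_gene_rate, n_trios):
    """Expected de novo count in one gene across n_trios children.

    Assumes per_gene_rate is per transmitted chromosome per generation,
    so each child contributes two opportunities."""
    return 2.0 * n_trios * per_gene_rate


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    cdf = sum(math.exp(-expected) * expected ** k / math.factorial(k)
              for k in range(observed))
    return max(0.0, 1.0 - cdf)


# Hypothetical example: 3 de novo mutations seen in a gene across 1,000 trios
# when the assumed rate predicts about 0.02.
p_value = poisson_upper_tail(3, expected_de_novo(1e-5, 1000))
```

A small p-value here only flags a gene relative to the assumed rate; in a genome-wide scan it would still need correction for the number of genes tested.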

Human genomic regions with exceptionally high levels of population differentiation identified from 911 whole-genome sequences.

Vincenza Colonna Qasim Ayub Yuan Chen Luca Pagani Pierre Luisi Mark A DePristo

Genome Biol

June 2014

Background: Population differentiation has proved to be effective for identifying loci under geographically localized positive selection, and has the potential to identify loci subject to balancing selection. We have previously investigated the pattern of genetic differentiation among human populations at 36.8 million genomic variants to identify sites in the genome showing high frequency differences.

Loss-of-function mutations in APOC3, triglycerides, and coronary disease.

N Engl J Med

July 2014

Background: Plasma triglyceride levels are heritable and are correlated with the risk of coronary heart disease. Sequencing of the protein-coding regions of the human genome (the exome) has the potential to identify rare mutations that have a large effect on phenotype.

Methods: We sequenced the protein-coding regions of 18,666 genes in each of 3734 participants of European or African ancestry in the Exome Sequencing Project.

A polygenic burden of rare disruptive mutations in schizophrenia.

Shaun M Purcell Jennifer L Moran Menachem Fromer Douglas Ruderfer Nadia Solovieff Mark DePristo

Nature

February 2014

Schizophrenia is a common disease with a complex aetiology, probably involving multiple and heterogeneous genetic factors. Here, by analysing the exome sequences of 2,536 schizophrenia cases and 2,543 controls, we demonstrate a polygenic burden primarily arising from rare (less than 1 in 10,000), disruptive mutations distributed across many genes. Particularly enriched gene sets include the voltage-gated calcium ion channel and the signalling complex formed by the activity-regulated cytoskeleton-associated scaffold protein (ARC) of the postsynaptic density, sets previously implicated by genome-wide association and copy-number variation studies.

Analysis of rare, exonic variation amongst subjects with autism spectrum disorders and population controls.

Li Liu Aniko Sabo Benjamin M Neale Uma Nagaswamy Christine Stevens Mark DePristo

PLoS Genet

April 2013

We report on results from whole-exome sequencing (WES) of 1,039 subjects diagnosed with autism spectrum disorders (ASD) and 870 controls selected from the NIMH repository to be of similar ancestry to cases. The WES data came from two centers using different methods to produce sequence and to call variants from it. Therefore, an initial goal was to ensure the distribution of rare variation was similar for data from different centers.

Rare complete knockouts in humans: population distribution and significant role in autism spectrum disorders.

Elaine T Lim Soumya Raychaudhuri Stephan J Sanders Christine Stevens Aniko Sabo Mark DePristo

Neuron

January 2013

To characterize the role of rare complete human knockouts in autism spectrum disorders (ASDs), we identify genes with homozygous or compound heterozygous loss-of-function (LoF) variants (defined as nonsense and essential splice sites) from exome sequencing of 933 cases and 869 controls. We identify a 2-fold increase in complete knockouts of autosomal genes with low rates of LoF variation (≤ 5% frequency) in cases and estimate a 3% contribution to ASD risk by these events, confirming this observation in an independent set of 563 probands and 4,605 controls. Outside the pseudoautosomal regions on the X chromosome, we similarly observe a significant 1.

An integrated map of genetic variation from 1,092 human genomes.

Nature

November 2012

By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.
