Publications by Benedict Paten | LitMetric

Publications by authors named "Benedict Paten"

GENCODE 2025: reference gene annotation for human and mouse.

Jonathan M Mudge, Sílvia Carbonell-Sala, Mark Diekhans, Jose Gonzalez Martinez, Toby Hunt, Benedict Paten

Nucleic Acids Res

November 2024

GENCODE produces comprehensive reference gene annotation for human and mouse. Entering its twentieth year, the project remains highly active as new technologies and methodologies allow us to catalog the genome at ever-increasing granularity. In particular, long-read transcriptome sequencing enables us to identify large numbers of missing transcripts and to substantially improve existing models, and our long non-coding RNA catalogs have undergone a dramatic expansion and reconfiguration as a result.

GENCODE: massively expanding the lncRNA catalog through capture long-read RNA sequencing.

Gazaldeep Kaur, Tamara Perteghella, Sílvia Carbonell-Sala, Jose Gonzalez-Martinez, Toby Hunt, Benedict Paten

bioRxiv

October 2024

Article Synopsis

- Accurate gene annotations are essential for interpreting how genomes function, and the GENCODE consortium has spent twenty years creating reference annotations for human and mouse genomes, serving as a vital resource for researchers globally.
- Previous annotations of long non-coding RNAs (lncRNAs) were incomplete and poorly organized, which hindered research and prompted GENCODE to launch a comprehensive effort that added nearly 18,000 novel human genes and over 22,000 novel mouse genes, significantly expanding the catalog of transcripts.
- The new annotations not only show evolutionary patterns and link to genetic variants associated with traits but also improve understanding of previously unclear genomic functions, greatly advancing research into both human and mouse genetic diseases.

Efficient indexing and querying of annotations in a pangenome graph.

Adam M Novak, Dickson Chung, Glenn Hickey, Sarah Djebali, Toshiyuki T Yokoyama, Benedict Paten

bioRxiv

October 2024

The current reference genome is the backbone of diverse and rich annotations. Simple text formats, like VCF or BED, have been widely adopted and have facilitated the critical exchange of genomic information. There is a dire need for tools and formats enabling pangenomic annotation to facilitate such enrichment of pangenomic references.

Complex genetic variation in nearly complete human genomes.

Glennis A Logsdon, Peter Ebert, Peter A Audano, Mark Loftus, David Porubsky, Benedict Paten

bioRxiv

September 2024

Article Synopsis

* The assemblies achieve a high level of completeness, closing 92% of previous assembly gaps and fully assembling complex regions, including 1,852 complex structural variants and 1,246 human centromeres.
* The findings lead to significant improvements in genotyping accuracy and enable the detection of over 26,000 structural variants per sample, enhancing the potential for future disease association research.

Highly accurate assembly polishing with DeepPolisher.

Mira Mastoras, Mobin Asri, Lucas Brambrink, Prajna Hebbar, Alexey Kolesnikov, Benedict Paten

bioRxiv

September 2024

Article Synopsis

Accurate genome assemblies are crucial for biological research, but they often contain errors introduced by the sequencing technologies used, necessitating polishing steps to correct these mistakes.
The new model, DeepPolisher, uses PacBio HiFi read alignments and a method called PHARAOH to improve sequences by correctly assigning reads to haplotypes and correcting errors in regions previously thought to be homozygous.
Testing DeepPolisher on 180 assemblies from the Human Pangenome Reference Consortium showed that it reduced assembly errors by an average of 54% and increased the predicted quality value (QV) by 3.4.

Development and extensive sequencing of a broadly-consented Genome in a Bottle matched tumor-normal pair.

Jennifer H McDaniel, Vaidehi Patel, Nathan D Olson, Hua-Jun He, Zhiyong He, Benedict Paten

bioRxiv

October 2024

Article Synopsis

The Genome in a Bottle Consortium (GIAB) is creating matched tumor-normal samples that are explicitly consented for public sharing of genomic data and cell lines, focusing on pancreatic ductal adenocarcinoma (PDAC).
They provide a comprehensive genomic dataset from the first individual, comprising high-depth whole-genome sequencing of tumor and normal DNA from multiple advanced sequencing technologies.
This open-access resource aims to support benchmarks for detecting genetic variants in cancer, fostering innovation in genome measurement and analysis tools.

A phenome-wide association study of methylated GC-rich repeats identifies a GCC repeat expansion in AFF3 associated with intellectual disability.

Bharati Jadhav, Paras Garg, Joke J F A van Vugt, Kristina Ibanez, Delia Gagliardi, Benedict Paten

Nat Genet

November 2024

GC-rich tandem repeat expansions (TREs) are often associated with DNA methylation, gene silencing and folate-sensitive fragile sites, and underlie several congenital and late-onset disorders. Through a combination of DNA-methylation profiling and tandem repeat genotyping, we identified 24 methylated TREs and investigated their effects on human traits using phenome-wide association studies in 168,641 individuals from the UK Biobank, identifying 156 significant TRE-trait associations involving 17 different TREs. Of these, a GCC expansion in the promoter of AFF3 was associated with a 2.

Personalized pangenome references.

Jouni Sirén, Parsa Eskandar, Matteo Tommaso Ungaro, Glenn Hickey, Jordan M Eizenga, Benedict Paten

Nat Methods

November 2024

Pangenomes reduce reference bias by representing genetic diversity better than a single reference sequence. Yet when comparing a sample to a pangenome, variants in the pangenome that are not part of the sample can be misleading, for example, causing false read mappings. These irrelevant variants are generally rarer in terms of allele frequency, and have previously been dealt with by filtering rare variants.

DeepSomatic: Accurate somatic small variant discovery for multiple sequencing technologies.

Jimin Park, Daniel E Cook, Pi-Chuan Chang, Alexey Kolesnikov, Lucas Brambrink, Benedict Paten

bioRxiv

August 2024

Somatic variant detection is an integral part of cancer genomics analysis. While most methods have focused on short-read sequencing, long-read technologies now offer potential advantages in terms of repeat mapping and variant phasing. We present DeepSomatic, a deep learning method for detecting somatic SNVs and insertions and deletions (indels) from both short-read and long-read data, with modes for whole-genome and exome sequencing; it can analyze tumor-normal and tumor-only samples, including FFPE-prepared samples.

Advancing long-read nanopore genome assembly and accurate variant calling for rare disease detection.

Shloka Negi, Sarah L Stenton, Seth I Berger, Brandy McNulty, Ivo Violich, Benedict Paten

medRxiv

August 2024

Article Synopsis

* Long-read sequencing (LRS) offers a promising complement to short-read sequencing (SRS) by providing more comprehensive data, including better long-range mapping and methylation profiling, which can help identify variants not detectable by SRS.
* In a study involving 98 samples, LRS successfully identified additional rare variants in 11 cases, enhancing diagnostic accuracy for rare monogenic diseases and suggesting its future importance in clinical genomics.

Complete sequencing of ape genomes.

DongAhn Yoo, Arang Rhie, Prajna Hebbar, Francesca Antonacci, Glennis A Logsdon, Benedict Paten

bioRxiv

October 2024

Article Synopsis

The study presents detailed genomes of six ape species, achieving high accuracy and complete sequencing of all their chromosomes.
It addresses complex genomic regions, leading to enhanced understanding of evolutionary relationships among these species.
The findings will serve as a crucial resource for future research on human evolution and our closest ape relatives.

Local read haplotagging enables accurate long-read small variant calling.

Alexey Kolesnikov, Daniel Cook, Maria Nattestad, Lucas Brambrink, Brandy McNulty, Benedict Paten

Nat Commun

July 2024

Long-read sequencing technology has enabled variant detection in difficult-to-map regions of the genome and enabled rapid genetic diagnosis in clinical settings. Rapidly evolving third-generation sequencing platforms like Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) are introducing newer platforms and data types. It has been demonstrated that variant calling methods based on deep neural networks can use local haplotyping information with long-reads to improve the genotyping accuracy.

SIMS: A deep-learning label transfer tool for single-cell RNA sequencing analysis.

Jesus Gonzalez-Ferrer, Julian Lehrer, Ash O'Farrell, Benedict Paten, Mircea Teodorescu

Cell Genom

June 2024

Cell atlases serve as vital references for automating cell labeling in new samples, yet existing classification algorithms struggle with accuracy. Here we introduce SIMS (scalable, interpretable machine learning for single cell), a low-code data-efficient pipeline for single-cell RNA classification. We benchmark SIMS against datasets from different tissues and species.

The complete sequence and comparative analysis of ape sex chromosomes.

Kateryna D Makova, Brandon D Pickett, Robert S Harris, Gabrielle A Hartley, Monika Cechova, Benedict Paten

Nature

June 2024

Article Synopsis

Apes have two sex chromosomes: the Y, essential for male reproduction, and the X, necessary for both reproduction and cognition; differences in mating patterns among species affect how these chromosomes function.
Studying these chromosomes is challenging due to their repetitive structures, but researchers created gapless assemblies for five great apes and one lesser ape to explore their evolutionary complexities.
The Y chromosomes are highly variable and undergo significant changes compared to the more stable X chromosomes, and this research can provide insights into human evolution and aid in the conservation of endangered ape species.

Beyond the Human Genome Project: The Age of Complete Human Genome Sequences and Pangenome References.

Dylan J Taylor, Jordan M Eizenga, Qiuhui Li, Arun Das, Katharine M Jenike, Benedict Paten

Annu Rev Genomics Hum Genet

August 2024

Article Synopsis

The Human Genome Project laid the groundwork for genetic research but initially struggled with representing human genetic diversity.
Recent breakthroughs, namely complete gap-free genomes from the Telomere-to-Telomere Consortium and high-quality pangenomes from the Human Pangenome Reference Consortium, have addressed these issues.
These advancements, driven by improved DNA sequencing technology, not only provide clearer genome mapping but also enhance our understanding of genetic diversity, leading to better applications in precision medicine and human biology.

Phased nanopore assembly with Shasta and modular graph phasing with GFAse.

Ryan Lorig-Roach, Melissa Meredith, Jean Monlong, Miten Jain, Hugh E Olsen, Benedict Paten

Genome Res

April 2024

Reference-free genome phasing is vital for understanding allele inheritance and the impact of single-molecule DNA variation on phenotypes. To achieve thorough phasing across homozygous or repetitive regions of the genome, long-read sequencing technologies are often used to perform phased de novo assembly. As a step toward reducing the cost and complexity of this type of analysis, we describe new methods for accurately phasing Oxford Nanopore Technologies (ONT) sequence data with the Shasta genome assembler and a modular tool for extending phasing to the chromosome scale called GFAse.

Severus: accurate detection and characterization of somatic structural variation in tumor genomes using long reads.

Ayse Keskus, Asher Bryant, Tanveer Ahmad, Byunggil Yoo, Sergey Aganezov, Benedict Paten

medRxiv

March 2024

Most current studies rely on short-read sequencing to detect somatic structural variation (SV) in cancer genomes. Long-read sequencing offers the advantage of better mappability and long-range phasing, which results in substantial improvements in germline SV detection. However, current long-read SV detection methods do not generalize well to the analysis of somatic SVs in tumor genomes with complex rearrangements, heterogeneity, and aneuploidy.

A region of suppressed recombination misleads neoavian phylogenomics.

Siavash Mirarab, Iker Rivas-González, Shaohong Feng, Josefin Stiller, Qi Fang, Benedict Paten

Proc Natl Acad Sci U S A

April 2024

Genomes are typically mosaics of regions with different evolutionary histories. When speciation events are closely spaced in time, recombination makes the regions sharing the same history small, and the evolutionary history changes rapidly as we move along the genome. When examining rapid radiations such as the early diversification of Neoaves 66 Mya, typically no consistent history is observed across segments exceeding kilobases of the genome.

Assessing methylation detection for primary human tissue using Nanopore sequencing.

Rylee Genner, Stuart Akeson, Melissa Meredith, Pilar Alvarez Jerez, Laksh Malik, Benedict Paten

bioRxiv

March 2024

DNA methylation most commonly occurs as 5-methylcytosine (5-mC) in the human genome and has been associated with human diseases. Recent developments in single-molecule sequencing technologies (Oxford Nanopore Technologies (ONT) and Pacific Biosciences) have enabled readouts of long, native DNA molecules, including cytosine methylation. ONT recently upgraded their Nanopore sequencing chemistry and kits from R9 to the R10 version, which yielded increased accuracy and sequencing throughput.

Structurally divergent and recurrently mutated regions of primate genomes.

Yafei Mao, William T Harvey, David Porubsky, Katherine M Munson, Kendra Hoekzema, Benedict Paten

Cell

March 2024

Article Synopsis

* We discovered over 1.3 million lineage-specific structural variants (SVs) that impact thousands of protein-coding genes and regulatory elements, revealing significant genomic differences among primates, especially compared to humans.
* Our research identified 1,607 regions with structural variations that are hotspots for gene loss and creation, indicating areas in the genome subject to rapid evolution and natural selection across primate species.

Vocal learning-associated convergent evolution in mammalian proteins and regulatory elements.

Morgan E Wirthlin, Tobias A Schmid, Julie E Elie, Xiaomeng Zhang, Amanda Kowalczyk, Benedict Paten

Science

March 2024

Article Synopsis

* The study investigates the genetic and brain features linked to vocal learning in mammals by comparing data from the Egyptian fruit bat and 215 other placental mammals.
* Researchers found that certain proteins evolve more slowly in vocal learners and identified a specific brain region responsible for vocal motor control in the Egyptian fruit bat.
* Using machine learning, they uncovered 50 regulatory elements that are associated with vocal learning, suggesting that losses in these elements played a role in the evolution of vocal learning in mammals.

Personalized Pangenome References.

Jouni Sirén, Parsa Eskandar, Matteo Tommaso Ungaro, Glenn Hickey, Jordan M Eizenga, Benedict Paten

bioRxiv

December 2023

Pangenomes, by including genetic diversity, should reduce reference bias by better representing the new samples compared against them. Yet when comparing a new sample to a pangenome, variants in the pangenome that are not part of the sample can be misleading, for example, causing false read mappings. These irrelevant variants are generally rarer in terms of allele frequency, and have previously been dealt with using allele frequency filters.

The Complete Sequence and Comparative Analysis of Ape Sex Chromosomes.

Kateryna D Makova, Brandon D Pickett, Robert S Harris, Gabrielle A Hartley, Monika Cechova, Benedict Paten

bioRxiv

December 2023

Article Synopsis

Apes have two main sex chromosomes, X and Y, where Y is crucial for male reproduction and its deletions can lead to infertility, while X is important for both reproduction and brain function.
Recent advancements in genomic techniques helped researchers create complete structures of the X and Y chromosomes for multiple great ape species, allowing them to explore their evolutionary complexities.
Findings indicate that Y chromosomes are highly variable and undergo rapid changes due to unique genetic regions and transposable elements, while X chromosomes are more stable, highlighting differing evolutionary paths among great ape species.

Identification of constrained sequence elements across 239 primate genomes.

Lukas F K Kuderna, Jacob C Ulirsch, Sabrina Rashid, Mohamed Ameen, Laksshman Sundaram, Benedict Paten

Nature

January 2024

Article Synopsis

Noncoding DNA helps scientists understand how genes work and how they relate to human disease.
Researchers studied the genomes of many primates to identify regulatory elements that are important for gene regulation.
They discovered many such regulatory elements in humans that differ from those in other mammals, which can help explain human traits and health issues.

The UCSC Genome Browser database: 2024 update.

Brian J Raney, Galt P Barber, Anna Benet-Pagès, Jonathan Casper, Hiram Clawson, Benedict Paten

Nucleic Acids Res

January 2024

The UCSC Genome Browser (https://genome.ucsc.edu) is a web-based genomic visualization and analysis tool that serves data to over 7,000 distinct users per day worldwide.