Publications by authors named "Benedict Paten"

GENCODE produces comprehensive reference gene annotation for human and mouse. Entering its twentieth year, the project remains highly active as new technologies and methodologies allow us to catalog the genome at ever-increasing granularity. In particular, long-read transcriptome sequencing enables us to identify large numbers of missing transcripts and to substantially improve existing models, and our long non-coding RNA catalogs have undergone a dramatic expansion and reconfiguration as a result.

Article Synopsis
  • Accurate gene annotations are essential for interpreting how genomes function. The GENCODE consortium has spent twenty years creating reference annotations for the human and mouse genomes, which serve as a vital resource for researchers globally.
  • Previous annotations of long non-coding RNAs (lncRNAs) were incomplete and poorly organized, which hindered research. In response, GENCODE launched a comprehensive effort that added nearly 18,000 novel human genes and over 22,000 novel mouse genes, significantly expanding the catalog of transcripts.
  • The new annotations show evolutionary patterns, link to genetic variants associated with traits, and improve understanding of previously unclear genomic functions, advancing research into both human and mouse genetic diseases.

The current reference genome is the backbone of diverse and rich annotations. Simple text formats, like VCF or BED, have been widely adopted and helped the critical exchange of genomic information. There is a dire need for tools and formats enabling pangenomic annotation to facilitate such enrichment of pangenomic references.

Article Synopsis
  • It achieves a high level of completeness, closing 92% of previous assembly gaps and fully assembling complex regions, including 1,852 complex structural variants and 1,246 human centromeres.
  • The findings lead to significant improvements in genotyping accuracy and enable the detection of over 26,000 structural variants per sample, enhancing the potential for future disease association research.
Article Synopsis
  • Accurate genome assemblies are crucial for biological research, but they often have errors due to the technologies used, necessitating polishing steps to correct these mistakes.
  • The new model, DeepPolisher, uses PacBio HiFi read alignments and a method called PHARAOH to improve sequences by accurately addressing haplotypes and correcting errors in areas previously thought to be homozygous.
  • Testing DeepPolisher on 180 assemblies from the Human Pangenome Reference Consortium showed an average 54% reduction in assembly errors, with a predicted Quality Value increase of 3.4.
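
The two figures quoted in the synopsis above (a 54% error reduction and a Quality Value increase of 3.4) agree with each other if the Quality Value is Phred-scaled, that is, QV = -10 * log10(error rate). The synopsis does not define the scale, so that is an assumption here, and the function names below are illustrative only. A minimal sketch of the arithmetic:

```python
import math


def error_fraction_remaining(qv_gain: float) -> float:
    """Fraction of errors left after a QV gain, assuming a Phred-scaled QV."""
    return 10 ** (-qv_gain / 10)


def qv_gain_from_reduction(reduction: float) -> float:
    """QV gain implied by a fractional error reduction (0 <= reduction < 1)."""
    return -10 * math.log10(1 - reduction)


# Figures quoted in the synopsis: QV gain of 3.4, error reduction of 54%.
reduction_from_qv = 1 - error_fraction_remaining(3.4)
qv_from_reduction = qv_gain_from_reduction(0.54)

print(f"QV +3.4 implies about {reduction_from_qv:.1%} fewer errors")
print(f"54% fewer errors implies a QV gain of about {qv_from_reduction:.2f}")
```

Under this assumption, a QV gain of 3.4 leaves a little under half of the original errors, which matches the reported reduction of roughly 54%.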
Article Synopsis
  • The Genome in a Bottle Consortium (GIAB) is creating matched tumor-normal samples that are publicly consented for sharing genomic data and cell lines, focusing on pancreatic ductal adenocarcinoma (PDAC).
  • They provide a comprehensive genomic dataset from the first individual, combining high-depth DNA from tumor and normal cells using advanced whole genome sequencing technologies.
  • This open-access resource aims to help develop benchmarks for detecting genetic variants in cancer, fostering innovation in genome measurement and analysis tools.

GC-rich tandem repeat expansions (TREs) are often associated with DNA methylation, gene silencing and folate-sensitive fragile sites, and underlie several congenital and late-onset disorders. Through a combination of DNA-methylation profiling and tandem repeat genotyping, we identified 24 methylated TREs and investigated their effects on human traits using phenome-wide association studies in 168,641 individuals from the UK Biobank, identifying 156 significant TRE-trait associations involving 17 different TREs. Of these, a GCC expansion in the promoter of AFF3 was associated with a 2.


Pangenomes reduce reference bias by representing genetic diversity better than a single reference sequence. Yet when comparing a sample to a pangenome, variants in the pangenome that are not part of the sample can be misleading, for example, causing false read mappings. These irrelevant variants are generally rarer in terms of allele frequency, and have previously been dealt with by filtering rare variants.


Somatic variant detection is an integral part of cancer genomics analysis. While most methods have focused on short-read sequencing, long-read technologies now offer potential advantages in terms of repeat mapping and variant phasing. We present DeepSomatic, a deep learning method for detecting somatic SNVs and insertions and deletions (indels) from both short-read and long-read data, with modes for whole-genome and exome sequencing, that can run on tumor-normal, tumor-only, and FFPE-prepared samples.

Article Synopsis
  • Long-read sequencing (LRS) offers a promising solution by providing more comprehensive data, including better long-range mapping and methylation profiling, which can help identify variants not detectable by SRS.
  • In a study involving 98 samples, LRS successfully identified additional rare variants in 11 cases, enhancing diagnostic accuracy for rare monogenic diseases and suggesting its future importance in clinical genomics.
Article Synopsis
  • The study presents detailed genomes of six ape species, achieving high accuracy and complete sequencing of all their chromosomes.
  • It addresses complex genomic regions, leading to enhanced understanding of evolutionary relationships among these species.
  • The findings will serve as a crucial resource for future research on human evolution and our closest ape relatives.

Long-read sequencing technology has enabled variant detection in difficult-to-map regions of the genome and rapid genetic diagnosis in clinical settings. Rapidly evolving third-generation sequencing platforms like Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) are introducing newer platforms and data types. It has been demonstrated that variant calling methods based on deep neural networks can use local haplotyping information from long reads to improve genotyping accuracy.


Cell atlases serve as vital references for automating cell labeling in new samples, yet existing classification algorithms struggle with accuracy. Here we introduce SIMS (scalable, interpretable machine learning for single cell), a low-code data-efficient pipeline for single-cell RNA classification. We benchmark SIMS against datasets from different tissues and species.

Article Synopsis
  • Apes have two sex chromosomes: the essential Y chromosome for male reproduction and the X chromosome necessary for both reproduction and cognition, with differences in mating patterns affecting their function.
  • Studying these chromosomes is challenging due to their repetitive structures, but researchers created gapless assemblies for five great apes and one lesser ape to explore their evolutionary complexities.
  • The Y chromosomes are highly variable and undergo significant changes compared to the more stable X chromosomes, and this research can provide insights into human evolution and aid in the conservation of endangered ape species.
Article Synopsis
  • The Human Genome Project laid the groundwork for genetic research but initially struggled with representing human genetic diversity.
  • Recent breakthroughs, namely complete gap-free genomes from the Telomere-to-Telomere Consortium and high-quality pangenomes from the Human Pangenome Reference Consortium, have addressed these issues.
  • These advancements, driven by improved DNA sequencing technology, not only provide clearer genome mapping but also enhance our understanding of genetic diversity, leading to better applications in precision medicine and human biology.

Reference-free genome phasing is vital for understanding allele inheritance and the impact of single-molecule DNA variation on phenotypes. To achieve thorough phasing across homozygous or repetitive regions of the genome, long-read sequencing technologies are often used to perform phased de novo assembly. As a step toward reducing the cost and complexity of this type of analysis, we describe new methods for accurately phasing Oxford Nanopore Technologies (ONT) sequence data with the Shasta genome assembler and a modular tool for extending phasing to the chromosome scale called GFAse.


Most current studies rely on short-read sequencing to detect somatic structural variation (SV) in cancer genomes. Long-read sequencing offers the advantage of better mappability and long-range phasing, which results in substantial improvements in germline SV detection. However, current long-read SV detection methods do not generalize well to the analysis of somatic SVs in tumor genomes with complex rearrangements, heterogeneity, and aneuploidy.


Genomes are typically mosaics of regions with different evolutionary histories. When speciation events are closely spaced in time, recombination makes the regions sharing the same history small, and the evolutionary history changes rapidly as we move along the genome. When examining rapid radiations such as the early diversification of Neoaves 66 Mya, typically no consistent history is observed across segments exceeding kilobases of the genome.


DNA methylation most commonly occurs as 5-methylcytosine (5-mC) in the human genome and has been associated with human diseases. Recent developments in single-molecule sequencing technologies (Oxford Nanopore Technologies (ONT) and Pacific Biosciences) have enabled readouts of long, native DNA molecules, including cytosine methylation. ONT recently upgraded their Nanopore sequencing chemistry and kits from R9 to the R10 version, which yielded increased accuracy and sequencing throughput.

Article Synopsis
  • We discovered over 1.3 million lineage-specific structural variants (SVs) that impact thousands of protein-coding genes and regulatory elements, revealing significant genomic differences among primates, especially compared to humans.
  • Our research identified 1,607 regions with structural variations that are hotspots for gene loss and creation, indicating areas in the genome subject to rapid evolution and natural selection across primate species.
Article Synopsis
  • The study investigates the genetic and brain features linked to vocal learning in mammals by comparing data from the Egyptian fruit bat and 215 other placental mammals.
  • Researchers found that certain proteins evolve more slowly in vocal learners and identified a specific brain region responsible for vocal motor control in the Egyptian fruit bat.
  • Using machine learning, they uncovered 50 regulatory elements that are associated with vocal learning, suggesting that losses in these elements played a role in the evolution of vocal learning in mammals.

Pangenomes, by including genetic diversity, should reduce reference bias because they represent new samples better than a single reference sequence does. Yet when comparing a new sample to a pangenome, variants in the pangenome that are not part of the sample can be misleading, for example, causing false read mappings. These irrelevant variants are generally rarer in terms of allele frequency, and have previously been dealt with using allele frequency filters.
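
To make the baseline concrete, an allele frequency filter of the kind mentioned above can be sketched as follows. This illustrates only the general idea of dropping rare variants; it is not the method from the article, and the `Variant` class, its fields, the example records, and the 0.1 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical record: how many pangenome haplotypes carry the alternate allele."""
    name: str
    alt_haplotypes: int
    total_haplotypes: int

    @property
    def allele_frequency(self) -> float:
        return self.alt_haplotypes / self.total_haplotypes


def filter_by_allele_frequency(variants, min_af):
    """Keep only variants whose allele frequency is at least min_af."""
    return [v for v in variants if v.allele_frequency >= min_af]


# Made-up example data: 90 haplotypes in total.
variants = [
    Variant("common_snp", 40, 90),
    Variant("rare_del", 1, 90),
    Variant("mid_ins", 18, 90),
]

kept = filter_by_allele_frequency(variants, min_af=0.1)
kept_names = [v.name for v in kept]
print(kept_names)
```

A filter like this discards a rare variant even when the sample being analyzed actually carries it, which is one reason a frequency-only cutoff is a blunt instrument.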

Article Synopsis
  • Apes have two main sex chromosomes, X and Y, where Y is crucial for male reproduction and its deletions can lead to infertility, while X is important for both reproduction and brain function.
  • Recent advancements in genomic techniques helped researchers create complete structures of the X and Y chromosomes for multiple great ape species, allowing them to explore their evolutionary complexities.
  • Findings indicate that Y chromosomes are highly variable and undergo rapid changes due to unique genetic regions and transposable elements, while X chromosomes are more stable, highlighting differing evolutionary paths among great ape species.
Article Synopsis
  • Noncoding DNA helps scientists understand how genes work and how they relate to diseases in humans.
  • Researchers studied the DNA of many primates to find specific regulatory parts that are important for gene regulation.
  • They discovered a lot of these regulatory elements in humans that are different from those in other mammals, which can help explain human traits and health issues.

The UCSC Genome Browser (https://genome.ucsc.edu) is a web-based genomic visualization and analysis tool that serves data to over 7,000 distinct users per day worldwide.
