Publications by Aleksey V Zimin | LitMetric

Publications by authors named "Aleksey V Zimin"

PSAURON: a tool for assessing protein annotation across a broad range of species.

Markus J Sommer Aleksey V Zimin Steven L Salzberg

NAR Genom Bioinform

March 2025

Evaluating the accuracy of protein-coding sequences in genome annotations is a challenging problem for which there is no broadly applicable solution. In this manuscript, we introduce PSAURON (Protein Sequence Assessment Using a Reference ORF Network), a novel software tool developed to help assess the quality of protein-coding gene annotations. Utilizing a machine learning model trained on a diverse dataset from over 1000 plant and animal genomes, PSAURON assigns a score to a coding DNA or protein sequence that reflects the likelihood that the sequence is a genuine protein-coding region.

PSAURON: a tool for assessing protein annotation across a broad range of species.

Markus J Sommer Aleksey V Zimin Steven L Salzberg

bioRxiv

May 2024

Evaluating the accuracy of protein-coding sequences in genome annotations is a challenging problem for which there is no broadly applicable solution. In this manuscript, we introduce PSAURON (Protein Sequence Assessment Using a Reference ORF Network), a novel software tool developed to assess the quality of protein-coding gene annotations. Utilizing a machine learning model trained on a diverse dataset from over 1000 plant and animal genomes, PSAURON assigns a score to a coding DNA or protein sequence that reflects the likelihood that the sequence is a genuine protein-coding region.

A genome sequence for the threatened whitebark pine.

David B Neale Aleksey V Zimin Amy Meltzer Akriti Bhattarai Maurice Amee

G3 (Bethesda)

May 2024

Whitebark pine (WBP, Pinus albicaulis) is a white pine of subalpine regions in the Western contiguous United States and Canada. WBP has become critically threatened throughout a significant part of its natural range due to mortality from the introduced fungal pathogen white pine blister rust (WPBR, Cronartium ribicola) and additional threats from mountain pine beetle (Dendroctonus ponderosae), wildfire, and maladaptation due to changing climate. Vast acreages of WBP have suffered nearly complete mortality.

A Genome Sequence for the Threatened Whitebark Pine.

David B Neale Aleksey V Zimin Amy Meltzer Akriti Bhattarai Maurice Amee

bioRxiv

November 2023

Article Synopsis

Whitebark pine (WBP) is threatened by diseases such as white pine blister rust, as well as by pests, wildfires, and climate change, which together have caused severe mortality across its range in the Western US and Canada.
Genomic technologies have been utilized to effectively identify disease-resistant and climate-adapted seed sources for restoring WBP, including advanced sequencing techniques that produced a detailed genome assembly.
The study identified a significant number of candidate genes for disease resistance, particularly focusing on nucleotide-binding leucine-rich-repeat receptors (NLRs), enhancing the ability to understand and improve WBP’s resilience compared to earlier methods.

Impacts of Sex Ratio Meiotic Drive on Genome Structure and Function in a Stalk-Eyed Fly.

Josephine A Reinhardt Richard H Baker Aleksey V Zimin Chloe Ladias Kimberly A Paczolt

Genome Biol Evol

July 2023

Stalk-eyed flies in the genus Teleopsis carry selfish genetic elements that induce sex ratio (SR) meiotic drive and impact the fitness of male and female carriers. Here, we assemble and describe a chromosome-level genome assembly of the stalk-eyed fly, Teleopsis dalmanni, to elucidate patterns of divergence associated with SR. The genome contains tens of thousands of transposable element (TE) insertions and hundreds of transcriptionally and insertionally active TE families.

A draft human pangenome reference.

Wen-Wei Liao Mobin Asri Jana Ebler Daniel Doerr Marina Haukness Aleksey V Zimin

Nature

May 2023

Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels.

JASPER: A fast genome polishing tool that improves accuracy of genome assemblies.

Alina Guo Steven L Salzberg Aleksey V Zimin

PLoS Comput Biol

March 2023

Advances in long-read sequencing technologies have dramatically improved the contiguity and completeness of genome assemblies. Using the latest nanopore-based sequencers, we can generate enough data for the assembly of a human genome from a single flow cell. With the long-read data from these sequencers, we can now routinely produce de novo genome assemblies in which half or more of a genome is contained in megabase-scale contigs.

Chromosome-level genome and the identification of sex chromosomes in Uloborus diversus.

Jeremiah Miller Aleksey V Zimin Andrew Gordus

Gigascience

December 2022

Article Synopsis

The study presents a detailed genome assembly for the orb-weaving spider Uloborus diversus, filling a gap in genomic resources for orb-weaving spider families, whose lineages span more than 200 million years of evolution.
This research provides evidence of an ancient genome duplication in arachnids and highlights complete spidroin gene sequences, which are essential for spider silk structure.
The findings also identify the sex chromosomes and potential sex-determining genes, making this genome a key resource for studying the evolution of orb-weaving and related genetic traits in spiders.

The first gapless, reference-quality, fully annotated genome from a Southern Han Chinese individual.

Kuan-Hao Chao Aleksey V Zimin Mihaela Pertea Steven L Salzberg

G3 (Bethesda)

March 2023

We used long-read DNA sequencing to assemble the genome of a Southern Han Chinese male. We organized the sequence into chromosomes and filled in gaps using the recently completed T2T-CHM13 genome as a guide, yielding a gap-free genome, Han1, containing 3,099,707,698 bases. Using the T2T-CHM13 annotation as a reference, we mapped all genes onto the Han1 genome and identified additional gene copies, generating a total of 60,708 putative genes, of which 20,003 are protein-coding.

Semi-automated assembly of high-quality diploid human reference genomes.

Erich D Jarvis Giulio Formenti Arang Rhie Andrea Guarracino Chentao Yang Aleksey V Zimin

Nature

November 2022

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome.

The evolution of synaptic and cognitive capacity: Insights from the nervous system transcriptome of .

Joshua Orvis Caroline B Albertin Pragya Shrestha Shuangshuang Chen Melanie Zheng Aleksey V Zimin

Proc Natl Acad Sci U S A

July 2022

The gastropod mollusk is an important model for cellular and molecular neurobiological studies, particularly for investigations of molecular mechanisms of learning and memory. We developed an optimized assembly pipeline to generate an improved nervous system transcriptome. This improved transcriptome enabled us to explore the evolution of cognitive capacity at the molecular level.

High-quality genome and methylomes illustrate features underlying evolutionary success of oaks.

Victoria L Sork Shawn J Cokus Sorel T Fitz-Gibbon Aleksey V Zimin Daniela Puiu

Nat Commun

April 2022

Article Synopsis

The genus Quercus began diversifying about 55 million years ago, resulting in around 450 species, including the California oak Quercus lobata, which has a high-quality genome assembly that showcases its evolutionary advantages.
Analysis of the oak's genome revealed a large effective population size despite a historical decline, with extensive gene duplications contributing to its genetic and phenotypic diversity.
Unique patterns of DNA methylation associated with transposable elements indicate the presence of heterochromatin similar to that of grasses, supporting the idea that these genomic features enhance adaptability to environmental change.

The SAMBA tool uses long reads to improve the contiguity of genome assemblies.

Aleksey V Zimin Steven L Salzberg

PLoS Comput Biol

February 2022

Third-generation sequencing technologies can generate very long reads with relatively high error rates. The lengths of the reads, which sometimes exceed one million bases, make them invaluable for resolving complex repeats that cannot be assembled using shorter reads. Many high-quality genome assemblies have already been produced, curated, and annotated using the previous generation of sequencing data, and full re-assembly of these genomes with long reads is not always practical or cost-effective.

Assembled and annotated 26.5 Gbp coast redwood genome: a resource for estimating evolutionary adaptive potential and investigating hexaploid origin.

David B Neale Aleksey V Zimin Sumaira Zaman Alison D Scott Bikash Shrestha

G3 (Bethesda)

January 2022

Sequencing, assembly, and annotation of the 26.5 Gbp hexaploid genome of coast redwood (Sequoia sempervirens) was completed leading toward discovery of genes related to climate adaptation and investigation of the origin of the hexaploid genome. Deep-coverage short-read Illumina sequencing data from haploid tissue from a single seed were combined with long-read Oxford Nanopore Technologies sequencing data from diploid needle tissue to create an initial assembly, which was then scaffolded using proximity ligation data to produce a highly contiguous final assembly, SESE 2.

Metagenomic classification with KrakenUniq on low-memory computers.

Christopher Pockrandt Aleksey V Zimin Steven L Salzberg

J Open Source Softw

December 2022

Kraken and KrakenUniq are widely used tools for classifying metagenomic sequences. A key requirement for these systems is a database containing all k-mers from all genomes that the users want to be able to detect, where k = 31 by default. This database can be very large, easily exceeding 100 gigabytes (GB) and sometimes 400 GB.

A reference-quality, fully annotated genome from a Puerto Rican individual.

Aleksey V Zimin Alaina Shumate Ida Shinder Jakob Heinz Daniela Puiu

Genetics

February 2022

Until 2019, the human genome was available in only one fully annotated version, GRCh38, which was the result of 18 years of continuous improvement and revision. Despite dramatic improvements in sequencing technology, no other annotated reference genome existed until 2019, when the genome of an Ashkenazi individual, Ash1, was released. In this study, we describe the assembly and annotation of a second individual genome, from a Puerto Rican individual whose DNA was collected as part of the Human Pangenome Project.

The American lobster genome reveals insights on longevity, neural, and immune adaptations.

Jennifer M Polinski Aleksey V Zimin K Fraser Clark Andrea B Kohn Norah Sadowski

Sci Adv

June 2021

The American lobster is integral to marine ecosystems and supports an important commercial fishery. This iconic species also serves as a valuable model for deciphering neural networks controlling rhythmic motor patterns and olfaction. Here, we report a high-quality draft assembly of its genome with 25,284 predicted gene models.

A Reference Genome Sequence for Giant Sequoia.

Alison D Scott Aleksey V Zimin Daniela Puiu Rachael Workman Monica Britton

G3 (Bethesda)

November 2020

The giant sequoias of California are massive, long-lived trees that grow along the U.S. Sierra Nevada mountains.

Chromosome-Scale Assembly of the Bread Wheat Genome Reveals Thousands of Additional Gene Copies.

Michael Alonge Alaina Shumate Daniela Puiu Aleksey V Zimin Steven L Salzberg

Genetics

October 2020

Bread wheat is a major food crop and an important plant system for agricultural genetics research. However, due to the complexity and size of its allohexaploid genome, genomic resources are limited compared to other major crops. The IWGSC recently published a reference genome and associated annotation (IWGSC CS v1.

The genome polishing tool POLCA makes fast and accurate corrections in genome assemblies.

Aleksey V Zimin Steven L Salzberg

PLoS Comput Biol

June 2020

The introduction of third-generation DNA sequencing technologies in recent years has allowed scientists to generate dramatically longer sequence reads, which when used in whole-genome sequencing projects have yielded better repeat resolution and far more contiguous genome assemblies. While the promise of better contiguity has held true, the relatively high error rate of long reads, averaging 8-15%, has made it challenging to generate a highly accurate final sequence. Current long-read sequencing technologies display a tendency toward systematic errors, in particular in homopolymer regions, which present additional challenges.

Assembly and annotation of an Ashkenazi human reference genome.

Alaina Shumate Aleksey V Zimin Rachel M Sherman Daniela Puiu Justin M Wagner

Genome Biol

June 2020

Background: Thousands of experiments and studies use the human reference genome as a resource each year. This single reference genome, GRCh38, is a mosaic created from a small number of individuals, representing a very small sample of the human population. There is a need for reference genomes from multiple human populations to avoid potential biases.

High-quality chromosome-scale assembly of the walnut (Juglans regia L.) reference genome.

Annarita Marrano Monica Britton Paulo A Zaini Aleksey V Zimin Rachael E Workman

Gigascience

May 2020

Background: The release of the first reference genome of walnut (Juglans regia L.) enabled many achievements in the characterization of walnut genetic and functional variation. However, it is highly fragmented, preventing the integration of genetic, transcriptomic, and proteomic information to fully elucidate walnut biological processes.

Soybean aphid biotype 1 genome: Insights into the invasive biology and adaptive evolution of a major agricultural pest.

Rosanna Giordano Ravi Kiran Donthu Aleksey V Zimin Irene Consuelo Julca Chavez Toni Gabaldon

Insect Biochem Mol Biol

May 2020

Article Synopsis

The soybean aphid (Aphis glycines) is a significant pest affecting soybeans, an important global crop; researchers sequenced the genome of a North American biotype to understand its biology.
About 20.4% of its proteins are duplicated, many of them related to apoptosis, which suggests adaptations to environmental stressors; some of the duplicated genes are also found in other aphid species.
Population studies reveal that North American aphids are genetically similar to those from China and South Korea, with reduced diversity over time, and specific populations exhibit a greater ability to infest resistant soybean varieties.

Genome assembly and characterization of a complex zfBED-NLR gene-containing disease resistance locus in Carolina Gold Select rice with Nanopore sequencing.

Andrew C Read Matthew J Moscou Aleksey V Zimin Geo Pertea Rachel S Meyer

PLoS Genet

January 2020

Long-read sequencing facilitates assembly of complex genomic regions. In plants, loci containing nucleotide-binding, leucine-rich repeat (NLR) disease resistance genes are an important example of such regions. NLR genes constitute one of the largest gene families in plants and are often clustered, evolving via duplication, contraction, and transposition.

Transcriptome assembly from long-read RNA-seq alignments with StringTie2.

Sam Kovaka Aleksey V Zimin Geo M Pertea Roham Razaghi Steven L Salzberg

Genome Biol

December 2019

RNA sequencing using the latest single-molecule sequencing instruments produces reads that are thousands of nucleotides long. The ability to assemble these long reads can greatly improve the sensitivity of long-read analyses. Here we present StringTie2, a reference-guided transcriptome assembler that works with both short and long reads.