Publications by authors named "Zhigui Bao"

Motivation: Genome graphs are powerful data structures for representing the genetic diversity within populations and can help identify genomic variations that traditional linear references miss, but their complexity and size make genome graph analysis challenging. We sought to develop a genome graph analysis tool that makes these analyses more accessible by addressing the limitations of existing tools. Specifically, we improve scalability and user-friendliness, and we provide many new statistics tailored to variation graphs for graph evaluation, including sample-specific features.

The combination of ultra-long (UL) Oxford Nanopore Technologies (ONT) sequencing reads with long, accurate Pacific Biosciences (PacBio) High Fidelity (HiFi) reads has enabled the completion of a human genome and spurred similar efforts to complete the genomes of many other species. However, this approach for complete, "telomere-to-telomere" genome assembly relies on multiple sequencing platforms, limiting its accessibility. ONT "Duplex" sequencing reads, where both strands of the DNA are read to improve quality, promise high per-base accuracy.

Jujube (Ziziphus jujuba Mill.), a member of the Rhamnaceae family, is gaining increasing prominence as a perennial fruit crop with significant economic and medicinal value. Here, we conduct de novo assembly of four reference-grade genomes, encompassing one wild and three cultivated jujube accessions.

Pangenome graphs can represent all variation between multiple reference genomes, but current approaches to build them exclude complex sequences or are based upon a single reference. In response, we developed the PanGenome Graph Builder, a pipeline for constructing pangenome graphs without bias or exclusion. The PanGenome Graph Builder uses all-to-all alignments to build a variation graph in which we can identify variation, measure conservation, detect recombination events and infer phylogenetic relationships.
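The idea of measuring conservation on a variation graph can be illustrated with a toy example. The following minimal Python sketch is not the PanGenome Graph Builder's implementation; it assumes a graph has already been built and uses hypothetical nodes and sample paths. It counts how many sample paths traverse each node, labeling nodes found on every path "core", nodes on a single path "private" and the rest "shell" (labels chosen here for illustration). Edges and node orientation are ignored for simplicity.

```python
from collections import defaultdict

# Hypothetical toy variation graph: node id -> DNA sequence.
nodes = {1: "ACGT", 2: "G", 3: "T", 4: "CCA", 5: "TTAG"}

# Hypothetical sample paths (haplotypes) as ordered node ids; orientation ignored.
paths = {
    "sampleA": [1, 2, 4, 5],
    "sampleB": [1, 3, 4, 5],
    "sampleC": [1, 2, 5],
}


def node_coverage(paths):
    """Count how many distinct paths traverse each node."""
    cov = defaultdict(int)
    for steps in paths.values():
        for node_id in set(steps):  # count each path at most once per node
            cov[node_id] += 1
    return cov


def classify_nodes(nodes, paths):
    """Label nodes core (all paths), private (one path), shell (in between)."""
    n_paths = len(paths)
    cov = node_coverage(paths)
    labels = {}
    for node_id in nodes:
        c = cov.get(node_id, 0)
        if c == n_paths:
            labels[node_id] = "core"
        elif c == 0:
            labels[node_id] = "uncovered"
        elif c == 1:
            labels[node_id] = "private"
        else:
            labels[node_id] = "shell"
    return labels


def path_sequence(nodes, steps):
    """Spell out a haplotype by concatenating node sequences along its path."""
    return "".join(nodes[i] for i in steps)


if __name__ == "__main__":
    labels = classify_nodes(nodes, paths)
    for node_id in sorted(labels):
        print(node_id, nodes[node_id], labels[node_id])
    for name, steps in paths.items():
        print(name, path_sequence(nodes, steps))
```

In this toy graph, nodes 2 and 3 form a G/T bubble (a single-nucleotide variant), and node 4 is a CCA segment carried by sampleA and sampleB but absent from sampleC, so the per-node path counts directly expose both the variation and its conservation across samples.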

Background: Telomeric repeat arrays at the ends of chromosomes are highly dynamic in composition, but their repetitive nature and technological limitations have made it difficult to assess their true variation in genome diversity surveys.

Results: We have comprehensively characterized the sequence variation immediately adjacent to the canonical telomeric repeat arrays at the very ends of chromosomes in 74 genetically diverse Arabidopsis thaliana accessions. We first describe several types of distinct telomeric repeat units and then identify evolutionary processes, such as local homogenization and higher-order repeat formation, that shape the diversity of chromosome ends.

Assembly of complete genomes can reveal functional genetic elements missing from draft sequences. Here we present the near-complete telomere-to-telomere and contiguous genome of the cotton species Gossypium raimondii. Our assembly identified gaps and misoriented or misassembled regions in previous assemblies and produced 13 centromeres, with 25 chromosomal ends having telomeres.

The combination of ultra-long Oxford Nanopore (ONT) sequencing reads with long, accurate PacBio HiFi reads has enabled the completion of a human genome and spurred similar efforts to complete the genomes of many other species. However, this approach for complete, "telomere-to-telomere" genome assembly relies on multiple sequencing platforms, limiting its accessibility. ONT "Duplex" sequencing reads, where both strands of the DNA are read to improve quality, promise high per-base accuracy.

Hybrid potato breeding will transform the crop from a clonally propagated tetraploid to a seed-reproducing diploid. Historical accumulation of deleterious mutations in potato genomes has hindered the development of elite inbred lines and hybrids. Utilizing a whole-genome phylogeny of 92 species from the Solanaceae and its sister clade, we employ an evolutionary strategy to identify deleterious mutations.

Pangenome graphs can represent all variation between multiple reference genomes, but current approaches to build them exclude complex sequences or are based upon a single reference. In response, we developed the PanGenome Graph Builder (PGGB), a pipeline for constructing pangenome graphs without bias or exclusion. PGGB uses all-to-all alignments to build a variation graph in which we can identify variation, measure conservation, detect recombination events, and infer phylogenetic relationships.

Teosinte, the wild ancestor of maize (Zea mays subsp. mays), has three times the seed protein content of most modern inbreds and hybrids, but the mechanisms that are responsible for this trait are unknown. Here we use trio binning to create a contiguous haplotype DNA sequence of a teosinte (Zea mays subsp.

Chinese sorghum (S. bicolor) has been a critical ingredient for brewing famous distilled liquors since the Yuan Dynasty (749–652 years BP). An incomplete understanding of its population genetics and domestication history limits its broader application; in particular, the lack of genetic knowledge underlying liquor-brewing properties makes it difficult to establish scientific standards for sorghum breeding.

Article Synopsis
  • Potatoes are the most popular non-cereal food crop, but their complex genomes make genetic analysis difficult, as most varieties are autotetraploids with diverse genetic makeups.
  • Researchers used advanced sequencing techniques to create an in-depth, chromosome-scale genome assembly for a specific potato cultivar called Cooperation-88 (C88), revealing significant genetic and expression variations within the tetraploid genome.
  • The study also discovered unique evolutionary patterns and genetic features in the C88 genome, which could help improve breeding strategies for potatoes by understanding how different genetic variants contribute to traits like hybrid vigor (heterosis).

Potato (Solanum tuberosum L.) is the world's most important non-cereal food crop, and the vast majority of commercially grown cultivars are highly heterozygous tetraploids. Advances in diploid hybrid breeding based on true seeds have the potential to revolutionize future potato breeding and production.

Missing heritability in genome-wide association studies is a major problem in genetic analyses of complex biological traits. The solution to this problem is to identify all causal genetic variants and to measure their individual contributions. Here we report a graph pangenome of tomato constructed by precisely cataloguing more than 19 million variants from 838 genomes, including 32 new reference-level genome assemblies.

Ramie is an important fibre-producing crop in China; however, the genetic basis of its agronomic traits remains poorly understood. We produced a comprehensive map of genomic variation in ramie based on resequencing of 301 landraces and cultivars. Genetic analysis produced 129 signals significantly associated with six fibre yield-related traits, and several genes were identified as candidate genes for respective traits.

Ramie (Boehmeria nivea) is an economically important natural fiber-producing crop that has been cultivated for thousands of years in China; however, the evolution of this crop remains largely unknown. Here, we report a ramie domestication analysis based on genome assembly and resequencing of cultivated and wild accessions. Two chromosome-level genomes representing wild and cultivated ramie were assembled de novo.

Background: Castor bean (Ricinus communis L.) is an important oil crop, which belongs to the Euphorbiaceae family. The seed oil of castor bean is currently the only commercial source of ricinoleic acid that can be used for producing about 2000 industrial products.

Sarcophaga peregrina is considered to be of great ecological, medical and forensic significance, and has unusual biological characteristics such as an ovoviviparous reproductive pattern and adaptation to feeding on carrion. The availability of a high-quality genome will help to further reveal the mechanisms underlying these characteristics. Here we present a de novo-assembled genome at chromosome scale for S.