Background: Genome sequencing and assembly are essential for revealing the secrets of life hidden in genomes. Because most genomes contain repeats, current assemblers collate sequencing data into a set of assembled sequences, called contigs, rather than a complete genome. Toward completing a genome, optical mapping is powerful for determining the relative order of contigs on the genome, a process called scaffolding. However, filling in the nucleotide sequences that connect neighboring contigs requires further effort. Nagarajan et al. recently proposed a software module, FINISH, that closes the gaps between contigs with other contig sequences after the contigs have been scaffolded using an optical map. Its results, however, are not yet satisfactory.

Results: To increase the accuracy of contig connections, we develop OMACC, which carefully takes into account the length information in optical maps. Specifically, it rescales the optical map and applies a length constraint to select the correct contig sequences for gap closure. In addition, it uses an advanced graph search algorithm to help estimate the number of repeat copies within the gaps between contigs. On both simulated and real datasets, OMACC achieves a false gap-closing rate below 10%, roughly one-third of the ~27% false rate of FINISH, while maintaining similar sensitivity.
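The abstract does not spell out OMACC's algorithms, so the following is only a rough illustration of how rescaling and a length constraint could work together, under assumptions that are not taken from the paper: (1) the optical map is rescaled by the median ratio of sequence length to map-aligned length over contigs already placed on the map; (2) candidate gap fillers are walks through a contig graph between the two flanking contigs, kept only if their interior length falls within a tolerance of the rescaled gap length, and repeated visits to the same contig are read as repeat copies. The function names `estimate_scale` and `candidate_fills`, the 10% tolerance, and the visit cap are hypothetical, and a plain depth-first search stands in for the paper's "advanced graph search algorithm".

```python
from statistics import median


def estimate_scale(placed):
    """Hypothetical rescaling step: median ratio of contig sequence length
    (bp) to the length of the optical-map region it aligns to."""
    return median(seq_len / map_len for seq_len, map_len in placed)


def candidate_fills(graph, lengths, left, right, gap_bp, tol=0.1, max_visits=5):
    """Enumerate walks left -> ... -> right through a contig graph whose
    interior contig lengths sum to within gap_bp * (1 +/- tol).

    A contig may be visited up to max_visits times, so the number of
    visits to a repeat contig serves as an estimate of its copy number.
    """
    lo, hi = gap_bp * (1 - tol), gap_bp * (1 + tol)
    results = []

    def dfs(node, path, total, visits):
        if total > hi:
            return  # prune: this walk is already longer than the gap allows
        for nxt in graph.get(node, []):
            if nxt == right:
                # Length constraint: accept only walks that fit the gap.
                if lo <= total <= hi:
                    results.append(list(path))
                continue
            if visits.get(nxt, 0) >= max_visits:
                continue  # cap how many times a single contig is reused
            visits[nxt] = visits.get(nxt, 0) + 1
            path.append(nxt)
            dfs(nxt, path, total + lengths[nxt], visits)
            path.pop()
            visits[nxt] -= 1

    dfs(left, [], 0, {})
    return results


if __name__ == "__main__":
    # Toy data: three contigs placed on the map give a scale factor of 0.9.
    scale = estimate_scale([(10000, 10000), (4500, 5000), (9000, 10000)])
    # A flanks the gap on the left, B on the right; R is a 1,000 bp
    # tandem repeat (self-loop) and U is a 1,500 bp unique contig.
    graph = {"A": ["U", "R"], "U": ["B"], "R": ["R", "B"]}
    lengths = {"U": 1500, "R": 1000}
    gap_bp = 3300 * scale  # gap of 3,300 map units, rescaled to ~2,970 bp
    print(candidate_fills(graph, lengths, "A", "B", gap_bp))
    # prints [['R', 'R', 'R']]: three repeat copies; U alone is too short
```

In this toy case the length constraint does two jobs at once: it rejects the unique contig U as a filler, and it selects three copies of R over two or four, which is the kind of repeat copy-number estimate the abstract describes.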

Conclusion: As optical mapping is becoming popular and repeats remain the bottleneck of assembly, OMACC should benefit various downstream biological studies by accurately connecting contigs into a more complete genome.

Availability: http://140.116.235.124/~tliu/omacc.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4029551
DOI: http://dx.doi.org/10.1186/1752-0509-7-S6-S7


Similar Publications

A nearly telomere-to-telomere diploid genome assembly of Firmiana kwangsiensis, a threatened species in China.

Sci Data

December 2024

Yunnan Key Laboratory for Integrative Conservation of Plant Species with Extremely Small Populations, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, China.

Firmiana kwangsiensis is a tree species of high ornamental value. The species is critically endangered in the wild, and is listed as a first-class national protected wild plant in China, and a Plant Species with Extremely Small Populations in need of urgent protection. We have assembled a chromosome-scale, haplotype-resolved genome for F.


Background: Mongolian cattle, a unique breed indigenous to China, represent valuable genetic resources and serve as important sources of meat and milk. However, there is a lack of high-quality genomes in cattle, which limits biological research and breeding improvement.

Findings: In this study, we conducted whole-genome sequencing on a Mongolian bull.

Article Synopsis
  • The study focuses on improving callus induction methods to derive a doubled haploid (DH) callus line from poplar anthers, addressing gaps in the previously sequenced genomes.
  • Using long-read sequencing, researchers assembled a nearly complete genome of 412.13 Mb with only seven gaps, substantially improving the reference genome for poplars, and annotated 34,953 protein-coding genes.
  • This new telomere-to-telomere (T2T) genome assembly enhances understanding of poplar genetics and evolutionary studies, particularly through the identification of centromeric regions and their high-order repeats.

Telomere-to-telomere assembly by preserving contained reads.

Genome Res

November 2024

Department of Computational and Data Sciences, Indian Institute of Science, Bangalore 560012, India

Automated telomere-to-telomere (T2T) de novo assembly of diploid and polyploid genomes remains a formidable task. A string graph is a commonly used assembly graph representation in the assembly algorithms. The string graph formulation employs graph simplification heuristics, which drastically reduce the count of vertices and edges.


Chromosome-level genome assembly of the cashmere goat.

Sci Data

October 2024

College of Animal Science, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia Autonomous Region, 010018, China.

Article Synopsis
  • The study focuses on assembling the nearly complete genome of a cashmere goat, a valuable source of cashmere, meat, and milk, which has not yet been characterized.
  • Using advanced sequencing technologies, the assembled genome measures 2.76 Gb with high quality, containing over 22,480 protein-coding genes and minimal gaps.
  • This research provides a significant reference resource for understanding the genetics of cashmere goats and improving breeding for desirable traits in economically important goat breeds.
