Background: Genome sequencing and assembly are essential for revealing the secrets of life hidden in genomes. Because most genomes contain repeats, current programs collate sequencing data into a set of assembled sequences, called contigs, rather than a complete genome. As a step toward completing a genome, optical mapping is a powerful way to determine the relative order of contigs on the genome, a process called scaffolding. However, connecting neighboring contigs with nucleotide sequences requires further effort. Nagarajan et al. recently proposed a software module, FINISH, which closes the gaps between contigs using other contig sequences after the contigs have been scaffolded with an optical map. The results, however, are not yet satisfactory.
Results: To increase the accuracy of contig connections, we develop OMACC, which carefully takes into account the length information in optical maps. Specifically, it rescales the optical map and applies a length constraint to select the correct contig sequences for gap closure. In addition, it uses an advanced graph search algorithm to help estimate the number of repeat copies within gaps between contigs. On both simulated and real datasets, OMACC achieves a false gap-closing rate below 10%, compared with ~27% for FINISH (roughly a threefold reduction), while maintaining similar sensitivity.
Conclusion: As optical mapping is becoming popular and repeats remain the bottleneck of assembly, OMACC should benefit various downstream biological studies by accurately connecting contigs into a more complete genome.
Availability: http://140.116.235.124/~tliu/omacc.
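The rescaling and length-constraint idea mentioned in the Results can be illustrated with a minimal sketch. This is not OMACC's implementation: the function names, the least-squares scale estimate, and the 10% tolerance are all assumptions chosen for illustration. It only shows how a gap length measured on an optical map might be rescaled and then used to filter candidate contig sequences by length.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A contig sequence that could fill a gap (hypothetical structure)."""
    name: str
    length_bp: int


def estimate_scale(pairs):
    """Estimate a factor converting optical-map lengths to sequence lengths.

    `pairs` holds (contig_length_bp, map_length_bp) for contigs already
    placed on the map. Assumption: a single least-squares factor through
    the origin is adequate; a real tool may calibrate differently.
    """
    numerator = sum(contig * mapped for contig, mapped in pairs)
    denominator = sum(mapped * mapped for _, mapped in pairs)
    return numerator / denominator


def select_candidates(candidates, gap_bp, tolerance=0.1):
    """Keep candidates whose length is within `tolerance` of the gap length.

    The 10% default is an illustrative value, not one taken from the paper.
    """
    return [c for c in candidates
            if abs(c.length_bp - gap_bp) <= tolerance * gap_bp]


if __name__ == "__main__":
    # Two placed contigs whose map lengths run about 5% long.
    scale = estimate_scale([(10_000, 10_500), (20_000, 21_000)])
    gap_bp = 5_250 * scale  # rescaled length of a gap measured on the map
    pool = [Candidate("c1", 4_800), Candidate("c2", 7_000), Candidate("c3", 5_400)]
    for c in select_candidates(pool, gap_bp):
        print(c.name, c.length_bp)
```

In this toy setting, a candidate that is far longer than the rescaled gap is rejected even if its ends overlap the flanking contigs, which is the kind of false closure a length constraint is meant to prevent.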
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4029551 | PMC |
http://dx.doi.org/10.1186/1752-0509-7-S6-S7 | DOI Listing |
Sci Data
December 2024
Yunnan Key Laboratory for Integrative Conservation of Plant Species with Extremely Small Populations, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, China.
Firmiana kwangsiensis is a tree species of high ornamental value. The species is critically endangered in the wild, and is listed as a first-class national protected wild plant in China, and a Plant Species with Extremely Small Populations in need of urgent protection. We have assembled a chromosome-scale, haplotype-resolved genome for F.
Gigascience
January 2024
Grassland & Cattle Investment Co., Ltd., R&D Center, Hohhot 010000, Inner Mongolia.
Background: Mongolian cattle, a unique breed indigenous to China, represent valuable genetic resources and serve as important sources of meat and milk. However, there is a lack of high-quality genomes in cattle, which limits biological research and breeding improvement.
Findings: In this study, we conducted whole-genome sequencing on a Mongolian bull.
For Res (Fayettev)
May 2024
State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin 150040, China.
Genome Res
November 2024
Department of Computational and Data Sciences, Indian Institute of Science, Bangalore 560012, India
Automated telomere-to-telomere (T2T) de novo assembly of diploid and polyploid genomes remains a formidable task. A string graph is a commonly used assembly graph representation in the assembly algorithms. The string graph formulation employs graph simplification heuristics, which drastically reduce the count of vertices and edges.
Sci Data
October 2024
College of Animal Science, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia Autonomous Region, 010018, China.