Reference genomes guide our interpretation of DNA sequence data. However, conventional linear references represent only one version of each locus, ignoring variation in the population. Poor representation of an individual's genome sequence impacts read mapping and introduces bias. Variation graphs are bidirected DNA sequence graphs that compactly represent genetic variation across a population, including large-scale structural variation such as inversions and duplications. Previous graph genome software implementations have been limited by scalability or topological constraints. Here we present vg, a toolkit of computational methods for creating, manipulating, and using these structures as references at the scale of the human genome. vg provides an efficient approach to mapping reads onto arbitrary variation graphs using generalized compressed suffix arrays, improving accuracy over alignment to a linear reference and effectively removing reference bias. These capabilities make using variation graphs as references for DNA sequencing practical at gigabase scale, or at the topological complexity of de novo assemblies.
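To make the idea of a bidirected sequence graph concrete, the following is a minimal toy sketch in Python. It is not vg's data model or API: the class `VariationGraph`, its methods, and the example sequences are all hypothetical, chosen only to show how a SNP and an inversion can share one graph and how a path spells a haplotype.

```python
# Toy bidirected variation graph. Hypothetical sketch, NOT vg's API or data model.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


class VariationGraph:
    def __init__(self):
        self.nodes = {}     # node id -> sequence on the forward strand
        self.edges = set()  # pairs of oriented nodes: ((id, is_reverse), (id, is_reverse))

    def add_node(self, node_id, seq):
        self.nodes[node_id] = seq

    def add_edge(self, src, dst):
        # An edge read on the opposite strand runs from dst flipped to src flipped,
        # so both orientations are stored.
        self.edges.add((src, dst))
        self.edges.add(((dst[0], not dst[1]), (src[0], not src[1])))

    def spell(self, path):
        """Concatenate node sequences along a path of (id, is_reverse) steps."""
        for a, b in zip(path, path[1:]):
            if (a, b) not in self.edges:
                raise ValueError(f"no edge between {a} and {b}")
        return "".join(
            reverse_complement(self.nodes[i]) if rev else self.nodes[i]
            for i, rev in path
        )


FWD, REV = False, True
g = VariationGraph()
for node_id, seq in [(1, "GAT"), (2, "A"), (3, "C"), (4, "TACG"), (5, "GG")]:
    g.add_node(node_id, seq)

# SNP bubble: node 1 is followed by either node 2 (A) or node 3 (C).
g.add_edge((1, FWD), (2, FWD))
g.add_edge((1, FWD), (3, FWD))
g.add_edge((2, FWD), (4, FWD))
g.add_edge((3, FWD), (4, FWD))
g.add_edge((4, FWD), (5, FWD))
# Inversion: node 4 can also be entered and left in reverse orientation.
g.add_edge((3, FWD), (4, REV))
g.add_edge((4, REV), (5, FWD))

ref_path = [(1, FWD), (2, FWD), (4, FWD), (5, FWD)]
snp_path = [(1, FWD), (3, FWD), (4, FWD), (5, FWD)]
inv_path = [(1, FWD), (3, FWD), (4, REV), (5, FWD)]
ref_path_rc = [(5, REV), (4, REV), (2, REV), (1, REV)]

print(g.spell(ref_path))     # reference haplotype
print(g.spell(snp_path))     # haplotype carrying the SNP
print(g.spell(inv_path))     # haplotype carrying the SNP and the inversion
print(g.spell(ref_path_rc))  # reference haplotype read on the opposite strand
```

In this sketch all three haplotypes share nodes 1 and 5, so the common sequence is stored once, and the inversion needs only two extra edges rather than a second copy of node 4.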
Link | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6126949 | PMC |
http://dx.doi.org/10.1038/nbt.4227 | DOI |