Despite only 8% of cattle being found in Europe, European breeds dominate current genetic resources. This adversely impacts research on other important global cattle breeds, especially those from Africa, for which genomic resources are particularly limited despite their disproportionate importance to the continent's economies. To mitigate this issue, we have generated assemblies of African breeds, which have been integrated with genomic data for 294 diverse cattle into a graph genome that incorporates global cattle diversity. We illustrate how this more representative reference assembly contains an extra 116.1 Mb (4.2%) of sequence absent from the current Hereford sequence and consequently inaccessible to current studies. We further demonstrate how using this graph genome increases read mapping rates, reduces allelic biases and improves the agreement of structural variant calling with independent optical mapping data. Consequently, we present a more representative reference assembly that will benefit global cattle research.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8854726
DOI: http://dx.doi.org/10.1038/s41467-022-28605-0

Publication Analysis

Top Keywords

graph genome (12)
global cattle (12)
cattle graph (8)
representative reference (8)
reference assembly (8)
cattle (7)
genome incorporating (4)
global (4)
incorporating global (4)
global breed (4)

Similar Publications

STMGraph: spatial-context-aware of transcriptomes via a dual-remasked dynamic graph attention model.

Brief Bioinform

November 2024

Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, No. 15 Shangxiadian Road, Cangshan District, Fuzhou 350002, China.

Spatial transcriptomics (ST) technologies enable dissecting tissue architecture in its spatial context. To capture the global contextual information of gene expression patterns in tissue, the spatial dependence of cells must be fully considered by integrating both local and non-local features in a spatial-context-aware manner. However, current ST integration algorithms ignore ST dropouts, which impedes the spatial awareness of ST features and poses challenges for the accuracy and robustness of microenvironmental heterogeneity detection, spatial domain clustering, and batch-effect correction.

Anchorage Accurately Assembles Anchor-Flanked Synthetic Long Reads.

Leibniz Int Proc Inform

August 2024

Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA; Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, USA.

Modern sequencing technologies allow for the addition of short-sequence tags, known as anchors, to both ends of a captured molecule. Anchors are useful in assembling the full-length sequence of a captured molecule as they can be used to accurately determine the endpoints. One representative of such anchor-enabled technology is LoopSeq Solo, a synthetic long read (SLR) sequencing protocol.

Article Synopsis
  • A network flow is represented by a collection of weighted walks that combine to create the overall flow; this article characterizes the specific walks involved in these flow decompositions.
  • The authors introduce a new algorithm that can efficiently identify and structure all maximal flowtigs, which are key components of flow decompositions in a network.
  • The practical application focuses on metagenomic assembly, demonstrating that using flowtigs improves the continuity of assembly results compared to traditional methods, both in simulations and real data contexts.

Due to computational resource limitations, in mass spectrometry-based proteomics only a limited set of peptide sequences is used for matching against measured spectra. We present an approach to represent proteins by graphs and allow not only the canonical sequences but also known isoforms and annotated amino acid variations, e.g.

With advances in long-read sequencing and assembly techniques, haplotype-resolved (phased) genome assemblies are becoming more common, including in the field of plant genomics. Computational tools to effectively explore these phased genomes, particularly for polyploid genomes, are currently limited. Here we describe a new strategy adopting a pangenome approach.
