De novo genome assembly of a high-protein soybean variety HJ117.

BMC Genom Data

Hebei Key Laboratory of Crop Genetics and Breeding, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, National Soybean Improvement Center Shijiazhuang Sub-Center, Hebei Academy of Agricultural and Forestry Sciences, 050035, Shijiazhuang, Hebei, China.

Published: March 2024

Objectives: Soybean is an important feed and oil crop worldwide because of its high protein and oil content. China holds a collection of more than 43,000 soybean germplasm resources, which provides rich genetic diversity for soybean breeding. However, this diversity also poses great challenges for the genetic improvement of soybean. This study reports the de novo genome assembly of HJ117, a soybean variety with a high protein content of 52.99%. These data will be a valuable resource for further research on soybean quality improvement and will aid in elucidating the regulatory mechanisms underlying soybean protein content.

Data Description: HJ117 was developed through backcross breeding, using Jidou 12 as the recurrent parent and Chamoshidou as the donor parent. We generated a contiguous reference genome of 1041.94 Mb for HJ117 by combining Illumina short reads (23.38 Gb) and PacBio long reads (25.58 Gb), with high-quality sequence coverage of approximately 22.44× and 24.55×, respectively. The assembly was further assisted by 114.5 Gb of Hi-C data (109.9×), resulting in a contig N50 of 19.32 Mb and a scaffold N50 of 51.43 Mb, and 96.44% of the genomic sequences were anchored onto 20 pseudochromosomes. Core Eukaryotic Genes Mapping Approach (CEGMA) and Benchmarking Universal Single-Copy Orthologs (BUSCO) assessments identified 97.18% of core eukaryotic genes and 99.4% of the genes in the BUSCO dataset, respectively.
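The coverage figures quoted above follow from dividing the total sequenced bases by the assembled genome size, and N50 is the sequence length at which the cumulative length of the longest sequences reaches half the assembly. The following is a minimal sketch of both calculations, not the authors' pipeline. It assumes decimal units (1 Gb = 1000 Mb); the function names `coverage` and `n50` and the toy contig lengths are hypothetical and introduced only for illustration.

```python
from typing import Iterable

# Assembled genome size reported in the abstract, in Mb.
GENOME_SIZE_MB = 1041.94


def coverage(total_gb: float, genome_mb: float = GENOME_SIZE_MB) -> float:
    """Mean depth (x) = total sequenced bases / genome size.

    Assumes decimal units: 1 Gb = 1000 Mb.
    """
    return (total_gb * 1000.0) / genome_mb


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L hold at least half the total."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must not be empty")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Data volumes (Gb) taken from the abstract.
    for label, gb in [("Illumina", 23.38), ("PacBio", 25.58), ("Hi-C", 114.5)]:
        print(f"{label}: {coverage(gb):.2f}x")

    # Toy contig lengths (hypothetical, not from the HJ117 assembly).
    print("toy N50:", n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Under these assumptions, the computed depths agree with the reported 22.44×, 24.55×, and 109.9× after rounding.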

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10913422
DOI: http://dx.doi.org/10.1186/s12863-024-01213-1

Publication Analysis

Top Keywords

novo genome: 8
genome assembly: 8
soybean: 8
soybean variety: 8
high protein: 8
rich genetic: 8
genetic diversity: 8
core eukaryotic: 8
eukaryotic genes: 8
assembly high-protein: 4

Similar Publications

Nearly all pancreatic adenocarcinomas (PDAC) are genomically characterized by KRAS exon 2 mutations. Most patients with PDAC present with advanced disease and are treated with cytotoxic therapy. Genomic biomarkers prognostic of disease outcomes have been challenging to identify.

Childhood maltreatment exposure (CME) increases the risk of adverse long-term health consequences for the exposed individual. Animal studies suggest that CME may also influence the health and behaviour of next-generation offspring through CME-driven epigenetic changes in the germ line. Here we investigated the association between early life stress and the sperm epigenome in humans with a history of CME.

Understanding the molecular landscape of nonmuscle-invasive bladder cancer (NMIBC) is essential to improve risk assessment and treatment regimens. We performed a comprehensive genomic analysis of patients with NMIBC using whole-exome sequencing (n = 438), shallow whole-genome sequencing (n = 362) and total RNA sequencing (n = 414). A large genomic variation within NMIBC was observed and correlated with different molecular subtypes.

Genetic factors shaping the plasma lipidome and the relations to cardiometabolic risk in children and adolescents.

EBioMedicine

January 2025

Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark.

Background: Lipid species are emerging as biomarkers for cardiometabolic risk in both adults and children. The genetic regulation of lipid species and their impact on cardiometabolic risk during early life remain unexplored.

Methods: Using mass spectrometry-based lipidomics, we measured 227 plasma lipid species in 1149 children and adolescents (44.

Forests face an escalating threat from the increasing frequency of extreme drought events driven by climate change. To address this challenge, it is crucial to understand how widely distributed species of economic or ecological importance may respond to drought stress. In this study, we examined the transcriptome of white spruce (Picea glauca (Moench) Voss) to identify key genes and metabolic pathways involved in the species' response to water stress.
