A high-quality genome of coconut (Cocos nucifera L.) is a crucial resource for improving agronomic traits and studying genome evolution within the Arecaceae family. We sequenced the Chowghat Green Dwarf (CGD) cultivar, which is resistant to root (wilt) disease, using Illumina, PacBio, Oxford Nanopore Technologies (ONT), and Hi-C technologies to produce a chromosome-level genome assembly of ~2.68 Gb with a scaffold N50 of 174 Mb; approximately 97% of the genome (2.62 Gb) was anchored to 16 pseudo-molecules. In total, 34,483 protein-coding genes were annotated; the BUSCO completeness score was 96.80%, while the k-mer completeness was ~87%. The assembled genome includes 2.19 Gb (81.64%) of repetitive sequences, with long terminal repeats (LTRs) the most abundant class at 53.76%. Our analysis also confirms two whole-genome duplication (WGD) events in the C. nucifera lineage. A genome-wide analysis of LTR insertion times revealed ancient divergence and proliferation of copia and gypsy elements. In addition, 1368 resistance gene analogs (RGAs) were identified in the CGD genome. We also developed a web server, 'Kalpa Genome Resource' (http://210.89.54.198:3000/), to manage and store a comprehensive array of genomic data, including genome sequences, genetic markers, structural and functional annotations such as metabolic pathways, and transcriptomic profiles. The web server has an embedded genome browser for analyzing and visualizing the genome, its genomic elements, and transcriptome data. The built-in BLAST server allows sequence homology searches against the genome, annotated transcriptome, and proteome sequences. The genomic dataset and the database will support comparative genome analysis and can expedite genome-driven breeding and improvement efforts to realize genetic gains in coconut.
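For readers unfamiliar with the scaffold N50 statistic quoted above, the following is a minimal sketch of how it is conventionally computed, together with a quick arithmetic check of the anchored and repeat fractions reported in the abstract. The function name `n50` and the scaffold lengths in `toy_lengths` are hypothetical and chosen purely for illustration; they are not the CGD assembly's actual scaffolds, and this is not the authors' pipeline.

```python
def n50(lengths):
    """Return the N50: the largest length L such that scaffolds of
    length >= L together cover at least half of the total assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical scaffold lengths in Mb (illustrative only, NOT the CGD data).
toy_lengths = [200, 180, 174, 150, 90, 40, 10]
print("toy N50 (Mb):", n50(toy_lengths))

# Arithmetic check on the rounded figures quoted in the abstract (in Gb).
assembly_gb = 2.68
anchored_fraction = 2.62 / assembly_gb
repeat_fraction = 2.19 / assembly_gb
print(f"anchored: {anchored_fraction:.1%}, repeats: {repeat_fraction:.1%}")
```

Because the abstract reports sizes rounded to two decimal places, the recomputed repeat fraction can differ slightly from the stated 81.64%; a small discrepancy of that kind reflects rounding rather than an inconsistency.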


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11579352
DOI: http://dx.doi.org/10.1038/s41598-024-79768-3


