DeepHapNet: a haplotype assembly method based on RetNet and deep spectral clustering.

Brief Bioinform

School of Computer and Information Engineering, Henan University, North Section of Jinming Avenue, Kaifeng 475001, China.

Published: November 2024

Single-nucleotide polymorphisms (SNPs) are a major source of gene polymorphism, and their analysis is of great significance in genetics. A haplotype, the sequence of alleles at SNP loci along a single chromosome, carries more genetic information than any single SNP. Haplotype assembly therefore plays an important role in understanding gene function, diagnosing complex diseases, and locating genes within a species. We propose DeepHapNet, a novel method that assembles haplotypes by clustering sequencing reads and learning the correlations between read pairs. DeepHapNet uses a sequence model, the Retentive Network (RetNet), whose multiscale retention mechanism extracts read features and learns the global relationships among reads. The SpectralNet model then clusters the reads based on the feature representations learned by RetNet, and haplotypes are finally constructed from the resulting read clusters. Experiments on simulated and real datasets show that the method performs well for haplotype assembly of both diploid and polyploid genomes, using either long or short reads. The code implementation of DeepHapNet and the processing scripts for the experimental data are publicly available at https://github.com/wjj6666/DeepHapNet.
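The abstract does not say exactly how haplotypes are read off the read clusters. A common approach, assumed here and not confirmed as DeepHapNet's own, is a per-site majority vote over the reads in each cluster. The minimal Python sketch assumes that reads have already been encoded as allele vectors over SNP sites (0 for the reference allele, 1 for the alternative allele, None where the read does not cover the site) and that cluster labels are already available, for example from a clustering model such as SpectralNet. The function names `consensus_haplotypes`, `mismatches`, and `mec_score` are hypothetical. The sketch also computes the minimum error correction (MEC) score, a standard haplotype assembly objective that counts how many read alleles would have to be flipped for every read to agree with its closest haplotype.

```python
from collections import Counter
from typing import List, Optional, Sequence

# Hypothetical encoding: one entry per SNP site, 0 = reference allele,
# 1 = alternative allele, None = site not covered by the read.
Read = Sequence[Optional[int]]
Haplotype = List[Optional[int]]


def consensus_haplotypes(reads: List[Read], labels: List[int], k: int) -> List[Haplotype]:
    """Build one haplotype per cluster by majority vote at each SNP site."""
    n_sites = len(reads[0])
    haplotypes: List[Haplotype] = []
    for cluster in range(k):
        members = [r for r, lab in zip(reads, labels) if lab == cluster]
        hap: Haplotype = []
        for j in range(n_sites):
            counts = Counter(r[j] for r in members if r[j] is not None)
            if not counts:
                hap.append(None)  # no read in this cluster covers site j
            else:
                # Most frequent allele; ties go to the smaller allele value (an arbitrary choice).
                hap.append(min(counts, key=lambda a: (-counts[a], a)))
        haplotypes.append(hap)
    return haplotypes


def mismatches(read: Read, hap: Haplotype) -> int:
    """Count sites covered by both the read and the haplotype where the alleles differ."""
    return sum(1 for a, h in zip(read, hap) if a is not None and h is not None and a != h)


def mec_score(reads: List[Read], haplotypes: List[Haplotype]) -> int:
    """Minimum error correction: sum over reads of mismatches to the closest haplotype."""
    return sum(min(mismatches(r, hap) for hap in haplotypes) for r in reads)


if __name__ == "__main__":
    reads = [
        [0, 1, None, 0],
        [0, 1, 1, None],
        [None, 1, 1, 0],
        [1, 0, None, 1],
        [1, None, 0, 1],
        [1, 0, 0, 0],  # disagrees with its cluster at the last site (e.g. a sequencing error)
    ]
    labels = [0, 0, 0, 1, 1, 1]  # hypothetical cluster labels for a diploid example
    haps = consensus_haplotypes(reads, labels, k=2)
    print(haps)  # [[0, 1, 1, 0], [1, 0, 0, 1]]
    print(mec_score(reads, haps))  # 1
```

For a polyploid genome, `k` would be set to the ploidy. A cluster that receives no reads simply yields a haplotype of None values; real data would also need handling for unbalanced clusters and multi-allelic sites, which this sketch leaves out.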

DOI: http://dx.doi.org/10.1093/bib/bbae656
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11652615
