Gene polymorphism originates from single-nucleotide polymorphisms (SNPs), and the study of SNPs is of great significance in the field of biogenetics. A haplotype, the sequence of alleles at SNP loci, carries more genetic information than a single SNP. Haplotype assembly plays a significant role in understanding gene function, diagnosing complex diseases, and pinpointing species genes. We propose a novel method, DeepHapNet, that performs haplotype assembly by clustering reads and learning correlations between read pairs. We employ a sequence model called Retentive Network (RetNet), which uses a multiscale retention mechanism to extract read features and learn the global relationships among them. Based on the feature representation of reads learned by the RetNet model, the reads are clustered using the SpectralNet model, and haplotypes are then constructed from the read clusters. Experiments with simulated and real datasets show that the method performs well on haplotype assembly for both diploid and polyploid genomes, using either long or short reads. The code implementation of DeepHapNet and the processing scripts for experimental data are publicly available at https://github.com/wjj6666/DeepHapNet.
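The final step described in the abstract, constructing haplotypes from read clusters, can be illustrated with a minimal sketch. This is not the DeepHapNet implementation: it assumes the cluster labels are already given (in the paper they would come from RetNet features clustered by SpectralNet), and it assumes a simple per-SNP majority vote. The read encoding (0 for reference allele, 1 for alternative allele, `GAP` for an uncovered SNP) and the function name `consensus_haplotypes` are hypothetical choices made for illustration.

```python
from collections import Counter

GAP = -1  # assumed marker: the read does not cover this SNP


def consensus_haplotypes(reads, labels, n_clusters):
    """Build one haplotype per cluster by per-SNP majority vote.

    reads  : list of equal-length allele lists (0, 1, or GAP)
    labels : cluster index of each read (assumed to be given)
    """
    n_snps = len(reads[0])
    haplotypes = []
    for k in range(n_clusters):
        members = [r for r, lab in zip(reads, labels) if lab == k]
        hap = []
        for j in range(n_snps):
            # Count only alleles actually observed at SNP j in this cluster.
            votes = Counter(r[j] for r in members if r[j] != GAP)
            # Leave the position as GAP when no read in the cluster covers it.
            hap.append(votes.most_common(1)[0][0] if votes else GAP)
        haplotypes.append(hap)
    return haplotypes


# Toy diploid example: six reads over four SNPs, two clusters.
reads = [
    [0, 1, 0, GAP],
    [0, 1, GAP, 1],
    [GAP, 1, 0, 1],
    [1, 0, 1, GAP],
    [1, GAP, 1, 0],
    [1, 0, 0, 0],  # disagrees with its cluster at the third SNP
]
labels = [0, 0, 0, 1, 1, 1]
haps = consensus_haplotypes(reads, labels, n_clusters=2)
print(haps)
```

For a polyploid sample the same sketch applies with `n_clusters` set to the ploidy. Majority voting outvotes an isolated sequencing error, as in the last read above, but the quality of the result depends entirely on how well the reads were clustered.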
| Download full-text PDF | Source |
|---|---|
| http://dx.doi.org/10.1093/bib/bbae656 | DOI Listing |
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11652615 | PMC |
Nat Genet
January 2025
Center for Genomics, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou, China.
Modern sugarcane, a highly allo-autopolyploid organism, has a very complex genome. In the present study, the karyotype and genome architecture of modern sugarcane were investigated, resulting in a genome assembly of 97 chromosomes (8.84 Gb).
Alzheimers Dement
December 2024
Amsterdam UMC, Amsterdam, Netherlands.
Background: The TMEM106B protein is critical for proper functioning of the endolysosomal system, which is utilised by all cells to traffic and degrade molecular cargo. Genome-wide association studies identified a haplotype in the TMEM106B gene that is associated with increased risk for Alzheimer's disease (AD), amyotrophic lateral sclerosis (ALS), and frontotemporal lobar degeneration with TAR DNA binding protein inclusions (FTLD-TDP). However, the causal variant that drives the association has thus far remained elusive.
Alzheimers Dement
December 2024
Genomics of Neurodegenerative Diseases and Aging, Human Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, Amsterdam, Netherlands.
Background: Genome-Wide Association Studies (GWAS) have identified 86 SNPs associated with Alzheimer's disease (AD). GWAS-SNPs are markers of genetic variation in linkage disequilibrium (LD), which may drive the association with AD. One major class of genetic variation is Structural Variants (SVs), which can regulate transcription and translation of nearby genes.
Alzheimers Dement
December 2024
University of Texas Health Science Center at Houston, Houston, TX, USA.
Background: Structural variants (SVs), genomic alterations exceeding 50 base pairs, are known for their significant impact on disease pathology. However, the role of SVs in Alzheimer's Disease (AD) remains unclear. Using a novel high-accuracy SV calling pipeline, we analyzed a diverse sample from the Alzheimer's Disease Sequencing Project (ADSP).
Sci China Life Sci
December 2024
Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, 100081, China.