Motivation: Alignment of similar whole genomes is often performed using anchors given by the maximal exact matches (MEMs) between their sequences. Despite a significant amount of research, computing MEMs for large genomes remains challenging. The leading current algorithms employ full-text indexes, with the sparse suffix array giving the best results. Still, their memory requirements are high, their parallelization is not very efficient, and they cannot handle very large genomes.
Results: We present a new algorithm, efficient computation of MEMs (E-MEM), that does not use full-text indexes. Our algorithm uses much less space and is highly amenable to parallelization. It can compute all MEMs of minimum length 100 between the whole human and mouse genomes on a 12-core machine in 10 min using 2 GB of memory; the required memory can be as low as 600 MB. It can run efficiently on genomes of any size. Extensive testing and comparison with the current best algorithms are provided.
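To make the term concrete: an exact match between two sequences is maximal when it cannot be extended by one character to the left or to the right without introducing a mismatch. The sketch below is a brute-force illustration of that definition only. It is not the E-MEM algorithm, whose internals are not described in this abstract, and the function name `find_mems` and its parameters are hypothetical names chosen for the example. Its quadratic running time makes it suitable for toy strings, not genomes.

```python
def find_mems(ref: str, query: str, min_len: int) -> list[tuple[int, int, int]]:
    """Return (ref_pos, query_pos, length) for every maximal exact match
    of at least min_len characters.

    Brute force over all start-position pairs; illustrative only.
    """
    mems = []
    for i in range(len(ref)):
        for j in range(len(query)):
            if ref[i] != query[j]:
                continue
            # Not left-maximal: the preceding characters also agree, so this
            # pair lies inside a longer match that starts earlier.
            if i > 0 and j > 0 and ref[i - 1] == query[j - 1]:
                continue
            # Extend to the right for as long as the characters agree, which
            # makes the match right-maximal by construction.
            k = 0
            while (i + k < len(ref) and j + k < len(query)
                   and ref[i + k] == query[j + k]):
                k += 1
            if k >= min_len:
                mems.append((i, j, k))
    return mems


if __name__ == "__main__":
    # Toy input; the abstract's experiments use a minimum length of 100.
    print(find_mems("ACGTACGT", "TACGTT", 3))  # [(0, 1, 4), (3, 0, 5)]
```

In the toy run, `ACGT` at reference position 0 matches query position 1, and `TACGT` at reference position 3 matches query position 0; shorter matches are filtered out by the minimum-length threshold.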
Availability and Implementation: The source code of E-MEM is freely available at http://www.csd.uwo.ca/~ilie/E-MEM/
Contact: ilie@csd.uwo.ca
Supplementary Information: Supplementary data are available at Bioinformatics online.
DOI: http://dx.doi.org/10.1093/bioinformatics/btu687