E-MEM: efficient computation of maximal exact matches for very large genomes.

Bioinformatics

Department of Computer Science, University of Western Ontario, London, Ontario, N6A 5B7, Canada.

Published: February 2015

Motivation: Alignment of similar whole genomes is often performed using anchors given by the maximal exact matches (MEMs) between their sequences. Despite a significant amount of research, computing MEMs for large genomes remains challenging. The leading current algorithms employ full-text indexes, with the sparse suffix array giving the best results. Still, their memory requirements are high, their parallelization is not very efficient, and they cannot handle very large genomes.

Results: We present a new algorithm, efficient computation of MEMs (E-MEM), that does not use full-text indexes. Our algorithm uses much less space and is highly amenable to parallelization. It can compute all MEMs of minimum length 100 between the whole human and mouse genomes on a 12-core machine in 10 min using 2 GB of memory; the required memory can be as low as 600 MB. It can efficiently process genomes of any size. Extensive testing and comparison with the best current algorithms are provided.
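To make the notion of a MEM concrete, the following is a minimal brute-force sketch of the definition only: a match between two sequences that cannot be extended to the left or to the right. It is not the E-MEM algorithm, which the abstract does not describe in detail, and it runs in quadratic time or worse, so it is suitable only for toy inputs. The function name `maximal_exact_matches` and its parameters are hypothetical, chosen for illustration.

```python
def maximal_exact_matches(ref, query, min_len):
    """Return (ref_pos, query_pos, length) for every MEM of length >= min_len.

    Illustrative brute force over all position pairs; NOT the E-MEM method.
    """
    mems = []
    for i in range(len(ref)):
        for j in range(len(query)):
            if ref[i] != query[j]:
                continue
            # Left-maximality: skip pairs whose match could be extended leftward,
            # because that longer match is reported from an earlier start.
            if i > 0 and j > 0 and ref[i - 1] == query[j - 1]:
                continue
            # Right-maximality: extend until a mismatch or the end of a sequence.
            k = 0
            while (i + k < len(ref) and j + k < len(query)
                   and ref[i + k] == query[j + k]):
                k += 1
            if k >= min_len:
                mems.append((i, j, k))
    return mems


if __name__ == "__main__":
    # Toy sequences; real genomes need an index or hashing scheme instead.
    print(maximal_exact_matches("GATTACA", "ATTAC", 3))
```

Each reported triple gives the start position in the first sequence, the start position in the second, and the match length. Practical tools avoid the all-pairs scan; this sketch only fixes what output such tools are expected to produce.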

Availability and Implementation: The source code of E-MEM is freely available at http://www.csd.uwo.ca/~ilie/E-MEM/

Contact: ilie@csd.uwo.ca

Supplementary Information: Supplementary data are available at Bioinformatics online.

DOI: http://dx.doi.org/10.1093/bioinformatics/btu687
