cmFSM: a scalable CPU-MIC coordinated drug-finding tool by frequent subgraph mining.

BMC Bioinformatics

College of Computer Science and Electronic Engineering & National Supercomputer Centre in Changsha, Hunan University, Changsha, 410082, China.

Published: May 2018

Background: Frequent subgraph mining is a significant problem in many practical domains. Solutions to this problem can be applied in particular to large-scale libraries of drug molecules or biological structures, to help find drugs or core biological structures rapidly and to predict the toxicity of unknown compounds. The main challenge is efficiency, as (i) it is computationally intensive to test for graph isomorphisms, and (ii) the graph collection to be mined and the mining results can be very large. Existing solutions often require days to derive mining results from biological networks, even with a relatively low support threshold. Moreover, the complete mining results often cannot be stored in the memory of a single node.
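To make the notion of a support threshold concrete, the following minimal sketch counts how many graphs in a small collection contain each labeled edge, which is the simplest possible subgraph pattern. The graph encoding, the toy molecules, and the function names are assumptions made for illustration; they are not taken from cmFSM.

```python
from collections import Counter

# Hypothetical toy encoding: a graph is (vertex_labels, edges), where
# vertex_labels maps a vertex id to its label and edges is a list of
# (u, v, edge_label) triples.

def edge_patterns(graph):
    """Return the set of distinct labeled-edge patterns occurring in one graph."""
    vertex_labels, edges = graph
    patterns = set()
    for u, v, edge_label in edges:
        # Sort the endpoint labels so that C-N and N-C are the same pattern.
        a, b = sorted((vertex_labels[u], vertex_labels[v]))
        patterns.add((a, edge_label, b))
    return patterns

def frequent_edges(graphs, min_support):
    """Keep the edge patterns that occur in at least min_support graphs."""
    support = Counter()
    for graph in graphs:
        # Each graph contributes at most 1 to the support of a pattern.
        support.update(edge_patterns(graph))
    return {p: s for p, s in support.items() if s >= min_support}

molecules = [
    ({0: "C", 1: "O", 2: "N"}, [(0, 1, "double"), (0, 2, "single")]),
    ({0: "C", 1: "O"}, [(0, 1, "double")]),
    ({0: "N", 1: "C", 2: "C"}, [(0, 1, "single"), (1, 2, "single")]),
]

result = frequent_edges(molecules, min_support=2)
print(result)
```

With a threshold of 2, only the C=O and C-N patterns survive, since the C-C edge occurs in a single graph. Real miners grow such frequent edges into larger patterns, and at that stage the isomorphism tests mentioned above dominate the cost.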

Results: In this paper, we implement cmFSM, a parallel acceleration tool for a classical frequent subgraph mining algorithm. The core idea is to parallelize extension tasks so as to reduce computation time. In addition, we employ a multi-node strategy to overcome memory constraints. The parallel optimization of cmFSM is carried out at three levels: fine-grained OpenMP parallelization on a single node, multi-node multi-process parallel acceleration, and CPU-MIC collaborative parallel optimization.
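The idea of distributing independent tasks over workers can be sketched as follows. This is only a structural illustration under stated assumptions: Python threads stand in for the OpenMP threads or processes a real miner would use, each task is reduced to counting the support of one candidate edge pattern, and all names and data are hypothetical rather than taken from the cmFSM source.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy encoding: a graph is (vertex_labels, edges) with
# edges given as (u, v, edge_label); a pattern is (label_a, edge_label, label_b).

def contains(graph, pattern):
    """Check whether a graph has at least one edge matching the pattern."""
    vertex_labels, edges = graph
    a, edge_label, b = pattern
    wanted = sorted((a, b))
    for u, v, label in edges:
        if label == edge_label and sorted((vertex_labels[u], vertex_labels[v])) == wanted:
            return True
    return False

def support(pattern, graphs):
    """Number of graphs in the collection that contain the pattern."""
    return sum(1 for g in graphs if contains(g, pattern))

def mine_parallel(candidates, graphs, min_support, workers=4):
    """Evaluate each candidate as an independent task on a worker pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of candidates in its results.
        supports = list(pool.map(lambda p: support(p, graphs), candidates))
    return {p: s for p, s in zip(candidates, supports) if s >= min_support}

graphs = [
    ({0: "C", 1: "O", 2: "N"}, [(0, 1, "double"), (0, 2, "single")]),
    ({0: "C", 1: "O"}, [(0, 1, "double")]),
    ({0: "N", 1: "C", 2: "C"}, [(0, 1, "single"), (1, 2, "single")]),
]
candidates = [("C", "double", "O"), ("C", "single", "N"), ("C", "single", "C")]

parallel_result = mine_parallel(candidates, graphs, min_support=2)
sequential_result = {p: support(p, graphs) for p in candidates if support(p, graphs) >= 2}
print(parallel_result)
```

Because the tasks share only read-only input, the parallel and sequential results agree. In CPython the threads here will not speed up CPU-bound work; the sketch shows the task decomposition, not the performance characteristics reported for cmFSM.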

Conclusions: Evaluation results show that cmFSM clearly outperforms existing state-of-the-art miners, even with only a few parallel computing resources. This means that cmFSM provides a practical solution to frequent subgraph mining problems with a huge number of mining results. Specifically, our solution is up to one order of magnitude faster than the best CPU-based approach on a single node and shows promising scalability for massive mining tasks in multi-node scenarios. Source code is available at: https://github.com/ysycloud/cmFSM


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5998871
DOI: http://dx.doi.org/10.1186/s12859-018-2071-z

Publication Analysis

Top Keywords

frequent subgraph: 12
subgraph mining: 12
single node: 12
mining: 9
mining problem: 8
parallel acceleration: 8
parallel: 6
cmfsm: 5
cmfsm scalable: 4
scalable cpu-mic: 4

Similar Publications

Draw+: network-based computational drug repositioning with attention walking and noise filtering.

Health Inf Sci Syst

December 2025

Division of Software, Yonsei University, Mirae Campus, Yeonsedae-gil 1, Wonju-si, 26493 Gangwon-do Korea.

Purpose: Drug repositioning, a strategy that repurposes already-approved drugs for novel therapeutic applications, provides a faster and more cost-effective alternative to traditional drug discovery. Network-based models have been adopted by many computational methodologies, especially those that use graph neural networks to predict drug-disease associations. However, these techniques frequently overlook the quality of the input network, which is a critical factor for achieving accurate predictions.

View Article and Find Full Text PDF

Frequent subgraph mining (FSM) is an essential and challenging graph mining task used in several applications of modern data science. Some FSM algorithms aim to find all frequent subgraphs, whereas others focus on discovering frequent subgraphs approximately. On the other hand, modern applications employ evolving graphs where the increments are small graphs or streams of nodes and edges.

View Article and Find Full Text PDF

Motivation: Spatial Analysis of Functional Enrichment (SAFE) is a popular tool for biologists to investigate the functional organization of biological networks via highly intuitive 2D functional maps. To create these maps, SAFE uses Spring embedding to project a given network into a 2D space in which nodes connected in the network are near each other in space. However, many biological networks are scale-free, containing highly connected hub nodes.

View Article and Find Full Text PDF

Introduction: Previously, we identified eight effective consultation skills to support decision-making in the voluntary surrender of older adult drivers' licences in super-aged Japan. This study aimed to clarify the transferability of these skills.

Methods: We collected text data by interviewing 11 safe-driving counsellors (four police officers, four clerical staff and three nurses) in the License Division of the National Police Agency from February to March 2022.

View Article and Find Full Text PDF

Rapid Mining of Fast Ion Conductors via Subgraph Isomorphism Matching.

J Am Chem Soc

July 2024

School of Advanced Materials, Peking University, Shenzhen Graduate School, Shenzhen 518055, P. R. China.

The rapidly evolving field of inorganic solid-state electrolytes (ISSEs) has been driven in recent years by advances in data-mining techniques, which facilitate high-throughput computational screening for candidate materials in databases. The key to the mining process is the selection of critical features that underlie the similarity of a material to an existing ISSE. Unfortunately, this selection is generally subjective and frequently under debate.

View Article and Find Full Text PDF
