As networks grow in size, analyzing large-scale network data is becoming increasingly important, and network clustering algorithms are a useful tool for this analysis. Most actively researched network clustering algorithms are designed for a single-machine environment rather than a parallel one, so they cannot analyze large-scale network data because of memory limits. As a solution, we propose a network clustering algorithm for large-scale network data analysis using Apache Spark, changing the paradigm of the conventional clustering algorithm to improve its efficiency in the Apache Spark environment. We also apply optimization approaches such as a Bloom filter and shuffle selection to reduce memory usage and execution time. By evaluating the proposed algorithm with an average normalized cut, we confirmed that it can analyze diverse large-scale network datasets, including biological, co-authorship, internet topology and social networks. Experimental results show that the proposed algorithm produces more accurate clusters than the compared algorithms while using less memory. Furthermore, we confirm the effectiveness of the proposed optimization approaches and the scalability of the proposed algorithm. In addition, we validate that clusters found by the proposed algorithm can represent biologically meaningful functions.
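
The abstract names the average normalized cut as the evaluation measure but does not define it. The following is a minimal single-machine sketch under a commonly used definition, which is an assumption here and not taken from the paper: for each cluster, the normalized cut is the number of edges leaving the cluster divided by the sum of the degrees of its nodes, and the score is the mean over clusters (lower is better). The function name `average_normalized_cut` and the input layout (an undirected edge list and non-overlapping node sets) are hypothetical choices for illustration.

```python
from collections import defaultdict


def average_normalized_cut(edges, clusters):
    """Mean normalized cut over clusters (assumed definition, see lead-in).

    edges:    iterable of (u, v) pairs for an undirected, unweighted graph
    clusters: list of non-overlapping sets of nodes
    """
    # Map each clustered node to the index of its cluster.
    membership = {}
    for idx, cluster in enumerate(clusters):
        for node in cluster:
            membership[node] = idx

    cut = defaultdict(int)     # edges with exactly one endpoint in the cluster
    volume = defaultdict(int)  # sum of degrees of the cluster's nodes

    for u, v in edges:
        cu, cv = membership.get(u), membership.get(v)
        # Each edge endpoint adds one to the degree of its node.
        if cu is not None:
            volume[cu] += 1
        if cv is not None:
            volume[cv] += 1
        # An edge crossing a cluster boundary counts toward the cut.
        if cu != cv:
            if cu is not None:
                cut[cu] += 1
            if cv is not None:
                cut[cv] += 1

    # Skip clusters with no incident edges to avoid division by zero.
    scores = [cut[i] / volume[i] for i in range(len(clusters)) if volume[i] > 0]
    return sum(scores) / len(scores) if scores else 0.0


# Two triangles joined by a single bridge edge (3, 4).
example_edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
example_clusters = [{1, 2, 3}, {4, 5, 6}]
score = average_normalized_cut(example_edges, example_clusters)
```

In this example each cluster has one outgoing edge and a total degree of 7, so both clusters score 1/7 and so does their average. A distributed version would aggregate the same two counters per cluster rather than loop over edges on one machine.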


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6179193
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0203670

