Background: The recent explosion of biological data poses a great challenge to traditional clustering algorithms. As data sets grow, cluster identification requires far more memory and much longer runtime. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied in biological research. However, its memory use and per-iteration runtime, both quadratic in the number of data points, become a serious bottleneck for large-scale data sets. Moreover, because the algorithm clusters data based on the similarities between pairs of data points, a similarity matrix must be constructed before it can run, and this construction step is itself time-consuming.
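
For readers unfamiliar with affinity propagation, here is a minimal serial sketch in Python of the standard message-passing updates (responsibilities and availabilities) described by Frey and Dueck. It is not the authors' code. The similarity measure (negative squared Euclidean distance), the median "preference" on the diagonal, a damping factor of 0.5, and a fixed 200 iterations are common defaults assumed here, not details taken from this paper. The nested loops show where the cost comes from: every matrix is N x N, and each iteration touches all of it.

```python
import statistics


def similarity_matrix(points):
    """Negative squared Euclidean distances; the diagonal holds the preference.

    Assumption: the preference is the median off-diagonal similarity.
    """
    n = len(points)
    s = [[-sum((a - b) ** 2 for a, b in zip(points[i], points[k]))
          for k in range(n)] for i in range(n)]
    pref = statistics.median([s[i][k] for i in range(n) for k in range(n) if i != k])
    for k in range(n):
        s[k][k] = pref
    return s


def affinity_propagation(s, damping=0.5, iterations=200):
    """Serial affinity propagation; O(N^2) memory and O(N^2) work per iteration."""
    n = len(s)
    r = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    a = [[0.0] * n for _ in range(n)]  # availabilities  a(i, k)
    for _ in range(iterations):
        # Responsibility update is row-wise: it needs the best and
        # second-best value of a(i, k') + s(i, k') in row i.
        for i in range(n):
            vals = [a[i][k] + s[i][k] for k in range(n)]
            best_k = max(range(n), key=vals.__getitem__)
            best = vals[best_k]
            second = max(vals[k] for k in range(n) if k != best_k)
            for k in range(n):
                new = s[i][k] - (second if k == best_k else best)
                r[i][k] = damping * r[i][k] + (1 - damping) * new
        # Availability update is column-wise: it needs the sum of positive
        # responsibilities that other points send to candidate exemplar k.
        for k in range(n):
            pos = sum(max(0.0, r[i][k]) for i in range(n) if i != k)
            for i in range(n):
                if i == k:
                    new = pos
                else:
                    new = min(0.0, r[k][k] + pos - max(0.0, r[i][k]))
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    # Points with positive self-responsibility plus self-availability are exemplars.
    exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
    # Every other point joins the exemplar it is most similar to.
    labels = [i if i in exemplars else max(exemplars, key=lambda k: s[i][k])
              for i in range(n)]
    return exemplars, labels


if __name__ == "__main__":
    pts = [(1.0,), (1.5,), (2.0,), (8.0,), (8.5,), (9.0,)]
    print(affinity_propagation(similarity_matrix(pts)))
```

The two well-separated groups of 1-D points in the usage line are toy data chosen for illustration. Real gene expression or protein data would need a domain-appropriate similarity.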

Methods: This paper proposes two types of parallel architecture: one accelerates construction of the similarity matrix, and the other accelerates the affinity propagation algorithm itself. A shared-memory architecture is used to construct the similarity matrix. A distributed system, chosen for its large aggregate memory and high computing capacity, runs the affinity propagation algorithm. Our method also uses a data partitioning and reduction scheme designed to minimize the global communication cost among processes.
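
The Python sketch below shows one plausible way to realize these two architectures. It is not the authors' implementation. Similarity rows are split into contiguous blocks, and a thread pool fills them in one shared matrix (shared memory). Affinity propagation is then simulated over row-partitioned "processes". Responsibility updates are row-wise, so they need no communication. Availability updates need, for each column k, the sum of positive responsibilities and the diagonal value r(k, k). Each process therefore contributes two length-N partial vectors, which a sum-reduction combines. The helper names `row_blocks`, `parallel_similarity`, `allreduce_sum` and `distributed_ap` are hypothetical, and `allreduce_sum` only stands in for an MPI-style all-reduce. Also, CPython threads do not execute Python code in parallel because of the global interpreter lock, so this shows the data layout, not real speedup.

```python
import statistics
from concurrent.futures import ThreadPoolExecutor


def row_blocks(n, parts):
    """Split rows 0..n-1 into `parts` contiguous, nearly equal blocks."""
    step, extra = divmod(n, parts)
    blocks, start = [], 0
    for p in range(parts):
        end = start + step + (1 if p < extra else 0)
        blocks.append(range(start, end))
        start = end
    return blocks


def parallel_similarity(points, workers=4):
    """Shared-memory construction: each thread writes only its own rows."""
    n = len(points)
    s = [[0.0] * n for _ in range(n)]

    def fill(rows):
        for i in rows:
            for k in range(n):
                s[i][k] = -sum((a - b) ** 2 for a, b in zip(points[i], points[k]))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fill, row_blocks(n, workers)))  # re-raises worker errors
    pref = statistics.median([s[i][k] for i in range(n) for k in range(n) if i != k])
    for k in range(n):
        s[k][k] = pref  # assumed preference: median off-diagonal similarity
    return s


def allreduce_sum(partials):
    """Stand-in for a sum all-reduce: element-wise sum of per-process vectors."""
    return [sum(col) for col in zip(*partials)]


def distributed_ap(s, procs=3, damping=0.5, iterations=200):
    n = len(s)
    blocks = row_blocks(n, procs)  # process p conceptually owns rows blocks[p]
    r = [[0.0] * n for _ in range(n)]
    a = [[0.0] * n for _ in range(n)]
    for _ in range(iterations):
        # 1. Responsibilities: purely local to each process's rows.
        for rows in blocks:
            for i in rows:
                vals = [a[i][k] + s[i][k] for k in range(n)]
                best_k = max(range(n), key=vals.__getitem__)
                best = vals[best_k]
                second = max(vals[k] for k in range(n) if k != best_k)
                for k in range(n):
                    new = s[i][k] - (second if k == best_k else best)
                    r[i][k] = damping * r[i][k] + (1 - damping) * new
        # 2. Each process builds two length-n partial vectors from its rows.
        partial_pos, partial_diag = [], []
        for rows in blocks:
            pos, diag = [0.0] * n, [0.0] * n
            for i in rows:
                for k in range(n):
                    if i == k:
                        diag[k] = r[i][k]
                    else:
                        pos[k] += max(0.0, r[i][k])
            partial_pos.append(pos)
            partial_diag.append(diag)
        col_pos = allreduce_sum(partial_pos)  # sum over i' != k of max(0, r(i', k))
        r_diag = allreduce_sum(partial_diag)  # r(k, k); exactly one owner per k
        # 3. Availabilities: local rows plus the two reduced vectors.
        for rows in blocks:
            for i in rows:
                for k in range(n):
                    if i == k:
                        new = col_pos[k]
                    else:
                        new = min(0.0, r_diag[k] + col_pos[k] - max(0.0, r[i][k]))
                    a[i][k] = damping * a[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
    labels = [i if i in exemplars else max(exemplars, key=lambda k: s[i][k])
              for i in range(n)]
    return exemplars, labels


if __name__ == "__main__":
    pts = [(1.0,), (1.5,), (2.0,), (8.0,), (8.5,), (9.0,)]
    print(distributed_ap(parallel_similarity(pts, workers=2), procs=3))
```

Under this layout, each process exchanges O(N) values per iteration rather than O(N²). That is one concrete reading of minimizing global communication. The partitioning used in the paper may differ.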

Results: A speedup of 100 is achieved with 128 cores. The runtime is reduced from several hours to a few seconds, which indicates that the parallel algorithm handles large-scale data sets effectively. Parallel affinity propagation also performs well when clustering large-scale gene expression data (microarray) and when detecting families in large protein superfamilies.
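
The standard definitions of speedup $S_p$ and parallel efficiency $E_p$ on $p$ cores help put the reported figure in context. A speedup of 100 on 128 cores corresponds to roughly 78% efficiency. If one further assumes Amdahl's law with a fixed problem size (an idealization the abstract does not state), the implied serial fraction $f$ of the work is about 0.22%.

```latex
S_p = \frac{T_1}{T_p}, \qquad
E_p = \frac{S_p}{p} = \frac{100}{128} \approx 0.78

\frac{1}{S_p} = f + \frac{1 - f}{p}
\;\Longrightarrow\;
f = \frac{p - S_p}{S_p\,(p - 1)}
  = \frac{128 - 100}{100 \cdot 127}
  = \frac{28}{12700} \approx 0.0022
```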

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3976248 (PMC)
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0091315 (PLOS)

Similar Publications

De novo transcriptome assembly of the Perna viridis: A novel invertebrate model for ecotoxicological studies.

Sci Data

January 2025

Marine Biotechnology Fish Nutrition and Health Division, Central Marine Fisheries Research Institute, Post Box No 1603 Ernakulam North PO., Kochi, 682018, Kerala, India.

Mussels, particularly Perna viridis, are vital sentinel species for toxicology and biomonitoring in environmental health. This species plays a crucial role in aquaculture and significantly impacts the fisheries sector. Despite the ecological and economic importance of this species, its omics resources are still scarce.

Black carp (Mylopharyngodon piceus) is one of the "four famous domestic fishes" in China and an important economic fish in freshwater aquaculture. A high-quality genome is essential for advancing future biological research and breeding programs for this species. In this study, we aimed to generate a high-quality chromosome-level genome assembly of black carp using Nanopore and Hi-C technologies.

The characteristics of data produced by omics technologies are pivotal, as they critically influence the feasibility and effectiveness of computational methods applied in downstream analyses, such as data harmonization and differential abundance analyses. Furthermore, variability in these data characteristics across datasets plays a crucial role, leading to diverging outcomes in benchmarking studies, which are essential for guiding the selection of appropriate analysis methods in all omics fields. Additionally, downstream analysis tools are often developed and applied within specific omics communities due to the presumed differences in data characteristics attributed to each omics technology.

Ultrasound is a primary diagnostic tool commonly used to evaluate internal body structures, including organs, blood vessels, the musculoskeletal system, and fetal development. Due to challenges such as operator dependence, noise, limited field of view, difficulty in imaging through bone and air, and variability across different systems, diagnosing abnormalities in ultrasound images is particularly challenging for less experienced clinicians. The development of artificial intelligence (AI) technology could assist in the diagnosis of ultrasound images.

As the occurrence of human diseases and conditions increases, questions continue to arise about their links to chemical exposure, especially for per- and polyfluoroalkyl substances (PFAS). Currently, many chemicals of concern have limited experimental information available for their use in analytical assessments. Here, we aim to increase this knowledge by providing the scientific community with multidimensional characteristics for 175 PFAS and their resulting 281 ion types.
