Distributed large-scale graph processing on FPGAs.

Amin Sahebi Marco Barbone Marco Procaccini Wayne Luk Georgi Gaydadjiev Roberto Giorgi

J Big Data

Department of Information Engineering and Mathematics, University of Siena, Siena, Italy.

Published: June 2023

Processing large-scale graphs is challenging because the computation produces irregular memory access patterns. Managing such irregular accesses can cause significant performance degradation on both CPUs and GPUs. Thus, recent research proposes accelerating graph processing with Field-Programmable Gate Arrays (FPGAs). FPGAs are programmable hardware devices that can be fully customised to perform specific tasks in a highly parallel and efficient manner. However, FPGAs have a limited amount of on-chip memory that cannot fit the entire graph. Because of this limited device memory, data must be repeatedly transferred to and from the FPGA on-chip memory, so data transfer time dominates computation time. One way to overcome this resource limitation of FPGA accelerators is to use a multi-FPGA distributed architecture together with an efficient partitioning scheme. Such a scheme aims to increase data locality and minimise communication between partitions. This work proposes an FPGA processing engine that overlaps, hides and customises all data transfers so that the FPGA accelerator is fully utilised. The engine is integrated into a framework for using FPGA clusters and can use an offline partitioning method to facilitate the distribution of large-scale graphs. The framework uses Hadoop at the higher level to map a graph onto the underlying hardware platform. This higher layer of computation gathers the blocks of data that have been pre-processed and stored on the host's file system and distributes them to a lower layer of computation made of FPGAs. We show how graph partitioning combined with an FPGA architecture leads to high performance, even when the graph has millions of vertices and billions of edges.
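To make the partitioning idea concrete, here is a minimal sketch, assuming the simplest possible offline scheme: vertices are split into contiguous ID intervals, and edges are bucketed by (source partition, destination partition). This is not necessarily the partitioning method used in the paper; the function name `partition_by_interval` is hypothetical. Edges whose endpoints land in different partitions are counted as "cut" edges, a rough proxy for the inter-device communication a partitioning scheme tries to minimise.

```python
from collections import defaultdict


def partition_by_interval(num_vertices, edges, num_parts):
    """Hypothetical interval partitioner: split vertex IDs into contiguous ranges.

    Vertex v is assigned to partition v * num_parts // num_vertices. Edges are
    grouped into blocks keyed by (source partition, destination partition);
    edges crossing partitions are counted as cut edges.
    """
    part_of = [v * num_parts // num_vertices for v in range(num_vertices)]
    blocks = defaultdict(list)
    for u, v in edges:
        blocks[(part_of[u], part_of[v])].append((u, v))
    # Cut edges would need data exchange between devices in a distributed setup.
    cut = sum(len(es) for (p, q), es in blocks.items() if p != q)
    return part_of, dict(blocks), cut


if __name__ == "__main__":
    ring = [(i, (i + 1) % 8) for i in range(8)]  # directed 8-vertex ring
    part_of, blocks, cut = partition_by_interval(8, ring, 2)
    print(part_of)  # [0, 0, 0, 0, 1, 1, 1, 1]
    print(cut)      # 2 (edges 3->4 and 7->0)
```

A real partitioner would also balance edge counts and exploit graph structure to reduce the cut; the interval scheme is used here only because it is easy to follow.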
For the PageRank algorithm, widely used to rank the importance of nodes in a graph, our implementation is the fastest compared with state-of-the-art CPU and GPU solutions, achieving a speedup of 13x against 8x and 3x, respectively. Moreover, for large-scale graphs the GPU solution fails due to memory limitations, while the CPU solution achieves a speedup of 12x compared with the 26x achieved by our FPGA solution. Other state-of-the-art FPGA solutions are 28 times slower than our proposed solution. When the size of a graph limits the performance of a single FPGA device, our performance model shows that using multiple FPGAs in a distributed system can further improve performance by about 12x. This highlights the efficiency of our implementation for large datasets that do not fit in the on-chip memory of a hardware device.
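The following is a minimal sketch of standard power-iteration PageRank in which edges are streamed one destination block at a time, so that only one interval of the new rank vector is updated per block (the portion that would need to reside in on-chip memory). It assumes the textbook formulation with damping factor 0.85 and uniform redistribution of rank from dangling vertices; it is not the authors' FPGA kernel, and the name `pagerank_blocked` and its parameters are hypothetical.

```python
def pagerank_blocked(num_vertices, edges, num_blocks, damping=0.85, iters=50):
    """Hypothetical blocked PageRank: power iteration over destination blocks."""
    n = num_vertices
    out_deg = [0] * n
    for u, _ in edges:
        out_deg[u] += 1
    # Group edges by the destination-vertex interval they update.
    blocks = [[] for _ in range(num_blocks)]
    for u, v in edges:
        blocks[v * num_blocks // n].append((u, v))
    rank = [1.0 / n] * n
    for _ in range(iters):
        # Rank held by dangling vertices (no out-edges) is spread uniformly.
        dangling = sum(rank[u] for u in range(n) if out_deg[u] == 0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = [base] * n
        # Each block writes only to its own destination interval of new_rank.
        for block in blocks:
            for u, v in block:
                new_rank[v] += damping * rank[u] / out_deg[u]
        rank = new_rank
    return rank


if __name__ == "__main__":
    ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(pagerank_blocked(4, ring, num_blocks=2))  # every vertex gets 0.25
```

Blocking changes only the order in which edges are visited, not the result: each iteration still computes the same rank vector as an unblocked pass, which is what allows the graph to be processed in pieces that fit a device's limited memory.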

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10239738
DOI: http://dx.doi.org/10.1186/s40537-023-00756-x
