Phys Imaging Radiat Oncol
July 2023
Background And Purpose: Physiological motion impacts the dose delivered to tumours and vital organs in external beam radiotherapy and particularly in particle therapy. The excellent soft-tissue demarcation of 4D magnetic resonance imaging (4D-MRI) could inform on intra-fractional motion, but long image reconstruction times hinder its use in online treatment adaptation. Here we employ techniques from high-performance computing to reduce 4D-MRI reconstruction times below two minutes to facilitate their use in MR-guided radiotherapy.
View Article and Find Full Text PDFProcessing large-scale graphs is challenging due to the nature of the computation that causes irregular memory access patterns. Managing such irregular accesses may cause significant performance degradation on both CPUs and GPUs. Thus, recent research trends propose graph processing acceleration with Field-Programmable Gate Arrays (FPGA).
View Article and Find Full Text PDFIEEE Trans Neural Netw Learn Syst
June 2024
Computing convolutional layers in the frequency domain using fast Fourier transformation (FFT) has been demonstrated to be effective in reducing the computational complexity of convolutional neural networks (CNNs). Nevertheless, the main challenge of this approach lies in the frequent and repeated transformations between the spatial and frequency domains due to the absence of nonlinear functions in the spectral domain, as such it makes the benefit less attractive for low-latency inference, especially on embedded platforms. To overcome the drawbacks in the existing FFT-based convolution, we propose a fully spectral CNN using a novel spectral-domain adaptive rectified linear unit (ReLU) layer, which completely removes the compute-intensive transformations between the spatial and frequency domains within the network.
View Article and Find Full Text PDFIEEE Trans Neural Netw Learn Syst
August 2023
Over the past few years, 2-D convolutional neural networks (CNNs) have demonstrated their great success in a wide range of 2-D computer vision applications, such as image classification and object detection. At the same time, 3-D CNNs, as a variant of 2-D CNNs, have shown their excellent ability to analyze 3-D data, such as video and geometric data. However, the heavy algorithmic complexity of 2-D and 3-D CNNs imposes a substantial overhead over the speed of these networks, which limits their deployment in real-life applications.
View Article and Find Full Text PDFIEEE Trans Neural Netw Learn Syst
August 2022
Due to the huge success and rapid development of convolutional neural networks (CNNs), there is a growing demand for hardware accelerators that accommodate a variety of CNNs to improve their inference latency and energy efficiency, in order to enable their deployment in real-time applications. Among popular platforms, field-programmable gate arrays (FPGAs) have been widely adopted for CNN acceleration because of their capability to provide superior energy efficiency and low-latency processing, while supporting high reconfigurability, making them favorable for accelerating rapidly evolving CNN algorithms. This article introduces a highly customized streaming hardware architecture that focuses on improving the compute efficiency for streaming applications by providing full-stack acceleration of CNNs on FPGAs.
View Article and Find Full Text PDFInt J Comput Assist Radiol Surg
March 2021
Purpose: Intensity-based image registration has been proven essential in many applications accredited to its unparalleled ability to resolve image misalignments. However, long registration time for image realignment prohibits its use in intra-operative navigation systems. There has been much work on accelerating the registration process by improving the algorithm's robustness, but the innate computation required by the registration algorithm has been unresolved.
View Article and Find Full Text PDFBackground: Current popular variant calling pipelines rely on the mapping coordinates of each input read to a reference genome in order to detect variants. Since reads deriving from variant loci that diverge in sequence substantially from the reference are often assigned incorrect mapping coordinates, variant calling pipelines that rely on mapping coordinates can exhibit reduced sensitivity.
Results: In this work we present GeDi, a suffix array-based somatic single nucleotide variant (SNV) calling algorithm that does not rely on read mapping coordinates to detect SNVs and is therefore capable of reference-free and mapping-free SNV detection.
Genetic programming can be used to identify complex patterns in financial markets which may lead to more advanced trading strategies. However, the computationally intensive nature of genetic programming makes it difficult to apply to real world problems, particularly in real-time constrained scenarios. In this work we propose the use of Field Programmable Gate Array technology to accelerate the fitness evaluation step, one of the most computationally demanding operations in genetic programming.
View Article and Find Full Text PDFWe demonstrate the use of dataflow technology in the computation of the correlation energy in molecules at the Møller-Plesset perturbation theory (MP2) level. Specifically, we benchmark density fitting (DF)-MP2 for as many as 168 atoms (in valinomycin) and show that speed-ups between 3 and 3.8 times can be achieved when compared to the MOLPRO package run on a single CPU.
View Article and Find Full Text PDFJ Adv Model Earth Syst
September 2015
Programmable hardware, in particular Field Programmable Gate Arrays (FPGAs), promises a significant increase in computational performance for simulations in geophysical fluid dynamics compared with CPUs of similar power consumption. FPGAs allow adjusting the representation of floating-point numbers to specific application needs. We analyze the performance-precision trade-off on FPGA hardware for the two-scale Lorenz '95 model.
View Article and Find Full Text PDFIEEE/ACM Trans Comput Biol Bioinform
November 2017
One of the key challenges facing genomics today is how to efficiently analyze the massive amounts of data produced by next-generation sequencing platforms. With general-purpose computing systems struggling to address this challenge, specialized processors such as the Field-Programmable Gate Array (FPGA) are receiving growing interest. The means by which to leverage this technology for accelerating genomic data analysis is however largely unexplored.
View Article and Find Full Text PDFNeuroFlow is a scalable spiking neural network simulation platform for off-the-shelf high performance computing systems using customizable hardware processors such as Field-Programmable Gate Arrays (FPGAs). Unlike multi-core processors and application-specific integrated circuits, the processor architecture of NeuroFlow can be redesigned and reconfigured to suit a particular simulation to deliver optimized performance, such as the degree of parallelism to employ. The compilation process supports using PyNN, a simulator-independent neural network description language, to configure the processor.
View Article and Find Full Text PDFThis paper presents a real-time control framework for a snake robot with hyper-kinematic redundancy under dynamic active constraints for minimally invasive surgery. A proximity query (PQ) formulation is proposed to compute the deviation of the robot motion from predefined anatomical constraints. The proposed method is generic and can be applied to any snake robot represented as a set of control vertices.
View Article and Find Full Text PDF