We investigate the execution of the Irregular Wavefront Propagation Pattern (IWPP), a fundamental computing structure used in several image analysis operations, on the Intel Xeon Phi co-processor. An efficient implementation of IWPP on the Xeon Phi is a challenging problem because of IWPP's irregularity and the use of atomic instructions in the original IWPP algorithm to resolve race conditions. On the Xeon Phi, the use of SIMD and vectorization instructions is critical to attain high performance. However, SIMD atomic instructions are not supported. Therefore, we propose a new IWPP algorithm that can take advantage of the supported SIMD instruction set. We also evaluate an alternate storage container (priority queue) to track active elements in the wavefront in an effort to improve the parallel algorithm efficiency. The new IWPP algorithm is evaluated with Morphological Reconstruction and Imfill operations as use cases. Our results show performance improvements of up to 5.63 on top of the original IWPP due to vectorization. Moreover, the new IWPP achieves speedups of 45.7 and 1.62, respectively, as compared to efficient CPU and GPU implementations.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4903027PMC
http://dx.doi.org/10.1109/SBAC-PAD.2015.13DOI Listing

Publication Analysis

Top Keywords

xeon phi
16
iwpp algorithm
12
irregular wavefront
8
wavefront propagation
8
intel xeon
8
atomic instructions
8
original iwpp
8
iwpp
7
efficient irregular
4
propagation algorithms
4

Similar Publications

A central component in simulating cardiac electrophysiology is the numerical solution of nonlinear ordinary differential equations, also called cardiac ionic cell models, that describe cross-cell-membrane ion transport. Biophysically detailed cell models often require a considerable amount of computation, including calls to special mathematical functions. This paper systematically studies how to efficiently use modern multicore CPUs for this costly computational task.

View Article and Find Full Text PDF

Electronic Structure Theory Calculations Using Modern Architectures: KNL vs Haswell.

J Chem Theory Comput

November 2021

Department of Chemistry and Ames Laboratory, Iowa State University, Ames, Iowa 50011, United States.

The time to solution and parallel efficiency of several commonly used electronic structure methods (Hartree-Fock, density functional theory, second order perturbation theory, resolution of the identity second order perturbation theory, coupled cluster) are evaluated on both the Intel Xeon Haswell and the Intel Xeon Phi Knights Landing (KNL) architectures. The Haswell completes the benchmark calculations with a faster time to solution than the KNL for all molecules and methods tested. While the Haswell exhibits an average speedup of at least 3.

View Article and Find Full Text PDF

Ant Colony algorithm has been applied to various optimisation problems, however, most of the previous work on scaling and parallelism focuses on Travelling Salesman Problems (TSPs). Although useful for benchmarks and new idea comparison, the algorithmic dynamics do not always transfer to complex real-life problems, where additional -data is required during solution construction. This paper explores how the benchmark performance differs from real-world problems in the context of Ant Colony Optimization (ACO) and demonstrate that in order to generalise the findings, the algorithms have to be tested on both standard benchmarks and real-world applications.

View Article and Find Full Text PDF

Precise parallel volumetric comparison of molecular surfaces and electrostatic isopotentials.

Algorithms Mol Biol

May 2020

Department of Computer Science and Engineering, Lehigh University, 113 Research Drive, Bethlehem, PA USA.

Geometric comparisons of binding sites and their electrostatic properties can identify subtle variations that select different binding partners and subtle similarities that accommodate similar partners. Because subtle features are central for explaining how proteins achieve specificity, algorithmic efficiency and geometric precision are central to algorithmic design. To address these concerns, this paper presents pClay, the first algorithm to perform parallel and arbitrarily precise comparisons of molecular surfaces and electrostatic isopotentials as geometric solids.

View Article and Find Full Text PDF

State-of-the-art distributed-memory computer clusters contain multicore CPUs with 16 and more cores. The second generation of the Intel Xeon Phi many-core processor has more than 60 cores with 16 GB of high-performance on-chip memory. We contrast the performance of the second-generation Intel Xeon Phi, code-named Knights Landing (KNL), with 68 computational cores to the latest multicore CPU Intel Skylake with 18 cores.

View Article and Find Full Text PDF

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!