The Irregular Wavefront Propagation Pattern (IWPP) is a core computing structure in several image analysis operations. Efficient implementation of IWPP on the Intel Xeon Phi is difficult because of its irregular data access and computation characteristics. The traditional IWPP algorithm relies on atomic instructions, which are not available in the SIMD instruction set of the Intel Xeon Phi. To overcome this limitation, we have proposed a new IWPP algorithm that can take advantage of the non-atomic SIMD instructions supported on the Intel Xeon Phi. We have also developed and evaluated methods to use the CPU and the Intel Xeon Phi cooperatively for parallel execution of the IWPP algorithms. Our new cooperative IWPP version can also handle large out-of-core images that would not fit into the memory of the accelerator. The new IWPP algorithm is used to implement the Morphological Reconstruction and Fill Holes operations, which are commonly found in image analysis applications. The vectorization implemented with the new IWPP has attained improvements of up to about 5× on top of the original IWPP and significant gains as compared to state-of-the-art CPU and GPU versions. The new version running on an Intel Xeon Phi is 6.21× and 3.14× faster than running on a 16-core CPU and on a GPU, respectively. Finally, the cooperative execution using two Intel Xeon Phi devices and a multi-core CPU has reached performance gains of 2.14× as compared to the execution using a single Intel Xeon Phi.
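To make the wavefront pattern concrete, the following is a minimal sequential sketch of queue-based Morphological Reconstruction in Python. It is not the vectorized or cooperative implementation described in the abstract. The function name `morph_reconstruct`, the nested-list image representation, the 4-connected neighborhood, and the choice to seed the queue with every pixel are all assumptions made for illustration.

```python
from collections import deque


def morph_reconstruct(marker, mask):
    """Grayscale reconstruction by dilation using a queue-driven wavefront.

    Hypothetical helper for illustration; images are lists of rows.
    """
    rows, cols = len(mask), len(mask[0])
    # Clamp the marker under the mask so the result never exceeds the mask.
    out = [[min(marker[r][c], mask[r][c]) for c in range(cols)]
           for r in range(rows)]
    # Seed the wavefront with every pixel (a simplification; an optimized
    # version would seed only pixels that can actually propagate).
    queue = deque((r, c) for r in range(rows) for c in range(cols))
    neighbors = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity (assumed)
    while queue:
        r, c = queue.popleft()
        for dr, dc in neighbors:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Propagate only if the neighbor is lower than the current
                # pixel and has not yet reached its mask value.
                if out[nr][nc] < out[r][c] and out[nr][nc] < mask[nr][nc]:
                    out[nr][nc] = min(out[r][c], mask[nr][nc])
                    queue.append((nr, nc))  # neighbor joins the wavefront
    return out
```

Each pixel whose value changes re-enters the queue, so the set of active pixels forms an irregular wavefront whose shape depends on the image content. In a parallel version, several threads may try to update the same neighbor at once, which is presumably why the traditional algorithm relies on atomic instructions.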
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6195363 | PMC |
http://dx.doi.org/10.1002/cpe.4425 | DOI Listing |
PeerJ
December 2024
OpenGeoHub Foundation, Doorwerth, Netherlands.
Processing large collections of earth observation (EO) time-series, often petabyte-sized, such as NASA's Landsat and ESA's Sentinel missions, can be computationally prohibitive and costly. Despite their name, even the Analysis Ready Data (ARD) versions of such collections can rarely be used as direct input for modeling because of cloud presence and/or prohibitive storage size. Existing solutions for readily using these data are not openly available, are poor in performance, or lack flexibility.
Chem Sci
December 2024
Laboratoire de Chimie Théorique (LCT), Sorbonne Université, CNRS 4 Pl. Jussieu Paris 75005 France
Superconductivity can be considered among the most exciting discoveries in material science of the 20th century. However, the hard conditions for the synthesis and the difficult characterization make the statement of new high critical temperature (Tc) complex from the experimental viewpoint and have recently led to several hot controversies in the literature. In this panorama, theory has become a trustworthy diagnosis.
Cryst Growth Des
April 2024
Department of Biomedical Engineering, University of Iowa, 103 South Capitol Street, 5601 Seamans Center for the Engineering Arts and Sciences, Iowa City, Iowa 52242, United States.
The formulation of active pharmaceutical ingredients involves discovering stable crystal packing arrangements or polymorphs, each of which has distinct pharmaceutically relevant properties. Traditional experimental screening techniques utilizing various conditions are commonly supplemented with in silico crystal structure prediction (CSP) to inform the crystallization process and mitigate risk. Predictions are often based on advanced classical force fields or quantum mechanical calculations that model the crystal potential energy landscape but do not fully incorporate temperature, pressure, or solution conditions during the search procedure.
Sensors (Basel)
March 2024
College of Information Technology, United Arab Emirates University, Abu Dhabi 15551, United Arab Emirates.
The cooperative, connected, and automated mobility (CCAM) infrastructure plays a key role in understanding and enhancing the environmental perception of autonomous vehicles (AVs) driving in complex urban settings. However, the deployment of CCAM infrastructure necessitates the efficient selection of the computational processing layer and deployment of machine learning (ML) and deep learning (DL) models to achieve greater performance of AVs in complex urban environments. In this paper, we propose a computational framework and analyze the effectiveness of a custom-trained DL model (YOLOv8) when deployed in diverse devices and settings at the vehicle-edge-cloud-layered architecture.