We explore the multisend interface as a data mover interface to optimize applications with neighborhood collective communication operations. One limitation of the current MPI 2.1 standard is that the vector collective calls require counts and displacements (zero and nonzero bytes) to be specified for all the processors in the communicator. Further, all the collective calls in MPI 2.1 are blocking and do not permit overlap of communication with computation. We present the record-replay persistent optimization to the multisend interface, which minimizes the processor overhead of initiating the collective. We present four case studies with the multisend API on Blue Gene/P: (i) 3D-FFT, (ii) 4D nearest neighbor exchange as used in Quantum Chromodynamics, (iii) NAMD, and (iv) the neural network simulator NEURON. Performance results show a 1.9× speedup with 32³ 3D-FFTs, a 1.9× speedup for 4D nearest neighbor exchange with the 2⁴ problem, a 1.6× speedup in NAMD, and almost a 3× speedup in NEURON with 256K cells and 1k connections/cell.
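The limitation of vector collectives noted above can be illustrated with a minimal sketch. It does not use MPI or the multisend API; it only builds the per-rank counts and displacements that an all-to-all-style vector call would need for a 4D nearest neighbor exchange. The grid shape, message size, and all function names are assumptions chosen for illustration and do not come from the paper.

```python
def rank_of(coord, dims):
    """Row-major rank of a coordinate in a periodic (torus) grid."""
    rank = 0
    for c, d in zip(coord, dims):
        rank = rank * d + (c % d)
    return rank


def neighbors_4d(coord, dims):
    """Ranks of the -1 and +1 neighbors along each dimension (periodic)."""
    result = []
    for axis in range(len(dims)):
        for step in (-1, 1):
            shifted = list(coord)
            shifted[axis] += step
            result.append(rank_of(shifted, dims))
    return result


def dense_vector_args(coord, dims, msg_bytes):
    """Counts and displacements with one entry per rank in the communicator."""
    nprocs = 1
    for d in dims:
        nprocs *= d
    counts = [0] * nprocs
    for nbr in neighbors_4d(coord, dims):
        counts[nbr] = msg_bytes
    # Displacements are the running byte offsets into a packed buffer.
    displs = []
    offset = 0
    for c in counts:
        displs.append(offset)
        offset += c
    return counts, displs


dims = (4, 4, 4, 4)  # hypothetical 256-rank process grid (assumption)
counts, displs = dense_vector_args((0, 0, 0, 0), dims, msg_bytes=64)
nonzero = sum(1 for c in counts if c > 0)
print(len(counts), nonzero)  # prints "256 8"
```

In this sketch each rank fills 256 entries to describe only 8 messages, and the arrays grow with the communicator size while the neighbor count stays fixed. That per-call setup cost is the kind of overhead a neighborhood-oriented interface is meant to avoid.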


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3111918
DOI: http://dx.doi.org/10.1109/IPDPS.2010.5470407

