High-performance computing (HPC) systems play a critical role in facilitating scientific discoveries. Their scale and complexity (e.g.
IEEE Trans Vis Comput Graph
October 2021
The trend of rapid technology scaling is expected to make the hardware of high-performance computing (HPC) systems more susceptible to computational errors due to random bit flips. Some bit flips may cause a program to crash or have a minimal effect on the output, but others may lead to silent data corruption (SDC), i.e.