Identifying shader sub-patterns for GPU performance tuning and architecture design.

Sci Rep

Advanced Micro Devices (China) Co., Ltd, Shanghai, 201210, China.

Published: October 2024

AI Article Synopsis

  • GPUs are essential in technology, and boosting their performance requires understanding and optimizing both their architecture and software code, which can be challenging due to the confidentiality of source code.
  • ShaderAnalyzer is introduced as a solution that uses graph mining and machine learning to analyze low-level GPU machine code, helping identify areas for performance improvement.
  • The framework assists engineers by detecting frequently occurring patterns in code, guiding them to fine-tune segments with the most potential for performance gains, and providing insights for future hardware design.

Article Abstract

GPUs are increasingly playing vital roles in the modern technology industry. Improving GPU performance involves optimizing the architectural design and fine-tuning the software code. To achieve this, however, engineers must investigate code from as many GPU-related applications as possible to identify the portions that need fine-tuning. This effort requires good domain knowledge, and it is made more arduous because application source code is normally confidential. To this end, we introduce ShaderAnalyzer, a solution that leverages graph mining and machine learning to analyze GPU-executed low-level machine code and identify its fine-tuning opportunities. Our approach represents machine code as a graph structure and then identifies frequently occurring substructures within the code. Optimizing the execution of these substructures can enhance the overall performance of the GPU. In addition, our model leverages these frequent patterns to further facilitate engineers' tasks by selecting representative patterns with which to predict and investigate low-efficiency ones. We conduct comprehensive experiments to evaluate the performance of our solution, and the results have been validated by our industry partners. ShaderAnalyzer is an end-to-end framework that helps engineers identify the code segments with the highest potential for performance gains after fine-tuning and offers valuable insights for hardware architects in future product design.
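The abstract describes representing machine code as a graph and then mining frequently occurring substructures. As a rough illustration only, the sketch below builds a def-use (dataflow) graph for each shader and counts how many shaders contain each opcode-labelled edge. This is the smallest case of frequent subgraph mining (single-edge patterns). It is not ShaderAnalyzer's implementation, whose graph construction and mining algorithm are not described here; full miners such as gSpan grow patterns well beyond one edge. The instruction tuple format, the opcode names, and the `min_support` threshold are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical instruction format: (destination register, opcode, source registers).
Instr = Tuple[str, str, Tuple[str, ...]]


def dataflow_edges(shader: List[Instr]) -> List[Tuple[int, int]]:
    """Return (producer_index, consumer_index) edges of a def-use graph."""
    last_def: Dict[str, int] = {}
    edges: List[Tuple[int, int]] = []
    for i, (dest, _op, srcs) in enumerate(shader):
        # Read sources before recording this write, so "r1 = add r1, r2"
        # links to the previous definition of r1, not to itself.
        for reg in srcs:
            if reg in last_def:  # registers never written earlier are treated as shader inputs
                edges.append((last_def[reg], i))
        last_def[dest] = i
    return edges


def edge_patterns(shader: List[Instr]) -> Set[Tuple[str, str]]:
    """Distinct (producer opcode, consumer opcode) edge labels in one shader."""
    return {(shader[u][1], shader[v][1]) for u, v in dataflow_edges(shader)}


def frequent_edge_patterns(
    shaders: List[List[Instr]], min_support: int
) -> Dict[Tuple[str, str], int]:
    """Count how many shaders contain each edge pattern; keep those with support >= min_support."""
    support: Counter = Counter()
    for shader in shaders:
        support.update(edge_patterns(shader))
    return {p: n for p, n in support.items() if n >= min_support}


if __name__ == "__main__":
    # Illustrative opcode labels only.
    shaders = [
        [("r2", "v_mul_f32", ("r0", "r1")), ("r3", "v_add_f32", ("r2", "r0")), ("r4", "v_sqrt_f32", ("r3",))],
        [("r5", "v_mul_f32", ("r0", "r0")), ("r6", "v_add_f32", ("r5", "r1"))],
        [("r7", "v_sqrt_f32", ("r0",)), ("r8", "v_mul_f32", ("r7", "r7"))],
    ]
    print(frequent_edge_patterns(shaders, min_support=2))
    # → {('v_mul_f32', 'v_add_f32'): 2}
```

Support is counted per shader rather than per occurrence, so a pattern repeated inside one shader counts once. This is a common convention when each graph is treated as one transaction, and it is chosen here only for simplicity.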


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11473530
DOI: http://dx.doi.org/10.1038/s41598-024-68974-8


Similar Publications

Large-scale acceleration algorithms for a deep convective physical parameterization scheme on GPU.

PLoS One

January 2025

China Energy Dadu River Hydropower Development Co., Ltd., Chengdu, China.

Early warning of geological hazards requires monitoring extreme weather conditions, such as heavy rainfall. Atmospheric circulation models are used for weather forecasting and climate simulation. As a critical physical process in atmospheric circulation models, the Zhang-McFarlane (ZM) deep convective physical parameterization scheme involves computationally intensive calculations that significantly impact the overall operational efficiency of the model.


A new projector, Orthogonal-Distance Ray-tracer Varying-Full Width at Half Maximum (OD-RT-VF), was developed to model a shift-variant elliptical point-spread function (PSF) response to improve the image quality of a preclinical dual-rotation PET system. Approach: The OD-RT-VF projector models different FWHM values of the PSF in multiple directions, using half-height and half-width tube-of-response (ToR) values. The OD-RT-VF method's performance was evaluated against the original OD-RT method and a ToR model with constant response.


We demonstrate high-resolution single-pixel imaging (SPI) in the visible and near-infrared wavelength ranges using an SPI framework that incorporates a novel, dedicated sampling scheme and a reconstruction algorithm optimized for the rapid imaging of highly sparse scenes at the native digital micromirror device (DMD) resolution of 1024 × 768. The reconstruction algorithm consists of two stages. In the first stage, the vector of SPI measurements is multiplied by the generalized inverse of the measurement matrix.


The scope of this work was to develop a thin-film composite (TFC) membrane for the separation of CO₂/CO mixtures, which are relevant to many processes in gas processing and the gasification of carbon-based feedstock. Special attention was given to the development of highly permeable porous polysulfone (PSF) supports (more than 26,000 GPU for CO₂), since both the selective and support layers contribute significantly to the overall performance of the TFC membrane. The PSF porous support is widely used in commercial and lab-scale TFC membranes, and its porous structure and other operating parameters are set during the non-solvent-induced phase separation (NIPS) process.


Recent advancements in large language models (LLMs) like ChatGPT and LLaMA have shown significant potential in medical applications, but their effectiveness is limited by a lack of specialized medical knowledge due to general-domain training. In this study, we developed Me-LLaMA, a new family of open-source medical LLMs that uniquely integrate extensive domain-specific knowledge with robust instruction-following capabilities. Me-LLaMA comprises foundation models (Me-LLaMA 13B and 70B) and their chat-enhanced versions, developed through comprehensive continual pretraining and instruction tuning of LLaMA2 models using both biomedical literature and clinical notes.

