GPUs play an increasingly vital role in the modern technology industry. Improving GPU performance involves both optimizing the architectural design and fine-tuning the software that runs on it. To do this, engineers must inspect code from as many GPU applications as possible to find the portions that need fine-tuning. This effort demands strong domain knowledge, and it is made more arduous because application source code is normally confidential. To address this, we introduce ShaderAnalyzer, a solution that uses graph mining and machine learning to analyze the low-level machine code executed on GPUs and identify fine-tuning opportunities. Our approach represents machine code as a graph and then identifies substructures that occur frequently within the code. Optimizing the execution of these substructures can enhance the overall performance of the GPU. In addition, our model uses these frequent patterns to further ease engineers' work: it selects representative patterns, which are then used to predict and investigate low-efficiency ones. We conduct comprehensive experiments to evaluate the performance of our solution, and the results have been validated by our industry partners. ShaderAnalyzer is an end-to-end framework that helps engineers identify the code segments with the highest potential for performance gains from fine-tuning, and it offers valuable insights for hardware architects designing future products.
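The idea of representing machine code as a graph and counting frequently occurring substructures can be illustrated with a minimal sketch. This is not ShaderAnalyzer's implementation: the instruction format `(opcode, destination, sources)`, the function names, and the toy shaders are all hypothetical, and the mined pattern is restricted to a single data-dependency edge, whereas a real frequent-subgraph miner would consider larger substructures.

```python
from collections import Counter

# Hypothetical instruction format: (opcode, destination register, source registers).
Instruction = tuple[str, str, tuple[str, ...]]


def dependency_edges(program: list[Instruction]) -> set[tuple[str, str]]:
    """Return the (producer opcode, consumer opcode) data-dependency edges of one program."""
    last_writer: dict[str, str] = {}  # register -> opcode that most recently wrote it
    edges: set[tuple[str, str]] = set()
    for opcode, dest, sources in program:
        for reg in sources:
            if reg in last_writer:
                edges.add((last_writer[reg], opcode))
        last_writer[dest] = opcode
    return edges


def frequent_edges(programs: list[list[Instruction]], min_support: int) -> dict[tuple[str, str], int]:
    """Return edge patterns that appear in at least `min_support` programs."""
    support: Counter = Counter()
    for program in programs:
        # Updating with a set counts each pattern at most once per program.
        support.update(dependency_edges(program))
    return {edge: n for edge, n in support.items() if n >= min_support}


# Two toy "shaders" written in the assumed instruction format.
shaders = [
    [("load", "r0", ()), ("mul", "r1", ("r0", "r0")), ("add", "r2", ("r1", "r0"))],
    [("load", "r0", ()), ("mul", "r1", ("r0", "r0")), ("store", "r3", ("r1",))],
]
patterns = frequent_edges(shaders, min_support=2)
```

Here `patterns` holds only the `load` to `mul` dependency, the one edge shared by both toy programs. Counting support per program rather than per occurrence is a common convention in frequent-pattern mining, chosen here as an assumption.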
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11473530 | PMC |
http://dx.doi.org/10.1038/s41598-024-68974-8 | DOI Listing |
PLoS One
January 2025
China Energy Dadu River Hydropower Development Co., Ltd., Chengdu, China.
Early warning of geological hazards requires monitoring extreme weather conditions, such as heavy rainfall. Atmospheric circulation models are used for weather forecasting and climate simulation. As a critical physical process in atmospheric circulation models, the Zhang-McFarlane (ZM) deep convective physical parameterization scheme involves computationally intensive calculations that significantly impact the overall operational efficiency of the model.
Phys Med Biol
January 2025
Departamento de Fisica, Universidade de Aveiro, Campus Universitario de Santiago, 3810-193 Aveiro, Aveiro, 3810-193, PORTUGAL.
A new projector, Orthogonal-Distance Ray-tracer Varying-Full Width at Half Maximum (OD-RT-VF), was developed to model a shift-variant elliptical point-spread function (PSF) response to improve the image quality of a preclinical dual-rotation PET system. Approach: The OD-RT-VF projector models different FWHM values of the PSF in multiple directions, using half-height and half-width tube-of-response (ToR) values. The OD-RT-VF method's performance was evaluated against the original OD-RT method and a ToR model with constant response.
Sensors (Basel)
December 2024
Faculty of Physics, University of Warsaw, Pasteura 5, 02-093 Warsaw, Poland.
We demonstrate high-resolution single-pixel imaging (SPI) in the visible and near-infrared wavelength ranges using an SPI framework that incorporates a novel, dedicated sampling scheme and a reconstruction algorithm optimized for the rapid imaging of highly sparse scenes at the native digital micromirror device (DMD) resolution of 1024 × 768. The reconstruction algorithm consists of two stages. In the first stage, the vector of SPI measurements is multiplied by the generalized inverse of the measurement matrix.
Polymers (Basel)
December 2024
Department of Chemistry, Lomonosov Moscow State University, Leninskie Gory, 1, 119991 Moscow, Russia.
The scope of this work was to develop a thin-film composite (TFC) membrane for the separation of CO/CO mixtures, which are relevant for many processes of gas processing and gasification of carbon-based feedstock. Special attention was given to the development of highly permeable porous polysulfone (PSF) supports (more than 26,000 GPU for CO) since both the selective and support layers contribute significantly to the overall performance of the TFC membrane. The PSF porous support is widely used in commercial and lab-scale TFC membranes, and its porous structure and other exploitation parameters are set during the non-solvent-induced phase separation (NIPS) process.
Recent advancements in large language models (LLMs) like ChatGPT and LLaMA have shown significant potential in medical applications, but their effectiveness is limited by a lack of specialized medical knowledge due to general-domain training. In this study, we developed Me-LLaMA, a new family of open-source medical LLMs that uniquely integrate extensive domain-specific knowledge with robust instruction-following capabilities. Me-LLaMA comprises foundation models (Me-LLaMA 13B and 70B) and their chat-enhanced versions, developed through comprehensive continual pretraining and instruction tuning of LLaMA2 models using both biomedical literature and clinical notes.