Comparing Large Language Models and Human Programmers for Generating Programming Code.

Adv Sci (Weinh)

Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA.

Published: December 2024

The performance of seven large language models (LLMs) in generating programming code using various prompt strategies, programming languages, and task difficulties is systematically evaluated. GPT-4 substantially outperforms other LLMs, including Gemini Ultra and Claude 2. The coding performance of GPT-4 varies considerably with different prompt strategies. In most LeetCode and GeeksforGeeks coding contests evaluated in this study, GPT-4, employing the optimal prompt strategy, outperforms 85 percent of human participants in a competitive environment, many of whom are students and professionals with moderate programming experience. GPT-4 demonstrates strong capabilities in translating code between different programming languages and in learning from past errors. The computational efficiency of the code generated by GPT-4 is comparable to that of human programmers. GPT-4 is also capable of handling broader programming tasks, including front-end design and database operations. These results suggest that GPT-4 has the potential to serve as a reliable assistant in programming code generation and software development. A programming assistant is designed based on an optimal prompt strategy to facilitate the practical use of LLMs for programming.
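The abstract does not spell out the prompt strategies it evaluates, but the "learning from past errors" result implies a generate-test-retry loop: ask the model for code, run it against tests, and feed any failure back into the next prompt. Below is a minimal Python sketch of that idea under stated assumptions; the names build_prompt, run_candidate, solve, and the ask_llm callback are hypothetical, and the prompt wording is illustrative rather than the paper's actual template.

```python
import subprocess
import tempfile

def build_prompt(problem: str, prior_error: str | None = None) -> str:
    """Compose a reasoning-first prompt; on retries, include the previous
    error so the model can correct itself ("learning from past errors")."""
    prompt = (
        "You are an expert Python programmer.\n"
        "Think through the approach step by step, then give your final\n"
        "solution as plain Python code.\n\n"
        f"Problem:\n{problem}\n"
    )
    if prior_error:
        prompt += (
            f"\nYour previous attempt failed with:\n{prior_error}\n"
            "Fix the bug and try again.\n"
        )
    return prompt

def run_candidate(code: str, tests: str) -> str | None:
    """Run the candidate solution plus its tests in a subprocess.
    Returns None on success, or the error text on failure/timeout."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n" + tests)
        path = f.name
    try:
        result = subprocess.run(
            ["python", path], capture_output=True, text=True, timeout=30
        )
    except subprocess.TimeoutExpired:
        return "timed out after 30 seconds"
    return None if result.returncode == 0 else result.stderr

def solve(problem: str, tests: str, ask_llm, max_rounds: int = 3) -> str | None:
    """Query the model, test its output, and feed any error back,
    up to max_rounds attempts. ask_llm is any callable str -> str
    that returns runnable Python code for the given prompt."""
    error = None
    for _ in range(max_rounds):
        code = ask_llm(build_prompt(problem, error))
        error = run_candidate(code, tests)
        if error is None:
            return code  # all tests passed
    return None  # no passing solution within the attempt budget
```

Passing the model call in as a plain ask_llm callable keeps the sketch independent of any particular LLM API; in practice one would wrap a vendor client and strip markdown fences from the model's reply before executing it.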

Source: http://dx.doi.org/10.1002/advs.202412279

Publication Analysis

Top Keywords

programming code (12); large language (8); language models (8); human programmers (8); programming (8); generating programming (8); prompt strategies (8); programming languages (8); optimal prompt (8); prompt strategy (8)

