Background

The generation of innovative research ideas is crucial to advancing the field of medicine. As physicians face increasingly demanding clinical schedules, it is important to identify tools that may expedite the research process. Artificial intelligence (AI) may offer a promising solution by enabling the efficient generation of novel research ideas. This study aimed to assess the feasibility of using AI to build upon existing knowledge by generating innovative research questions.

Methods

A comparative evaluation study was conducted to assess the ability of AI models to generate novel research questions. The prompt "research ideas for adolescent idiopathic scoliosis" was input into ChatGPT 3.5, Gemini 1.5, Copilot, and Llama 3, each of which output between 10 and 14 research questions. A keyword-friendly modified version of each AI-generated response was searched in the PubMed database, with results limited to English-language manuscripts published from 2000 to the present. Each response was then cross-referenced against the PubMed search results and assigned an originality score of 0-5 (0 being the most original and 5 the least original), adding one point for each paper already published on the topic. The mean originality score for each AI model was calculated manually by summing the originality scores of all its responses and dividing by the number of prompts it generated. The standard deviation of the originality scores for each AI was calculated using the STDEV function in Google Sheets (Google, Mountain View, California). Each AI was also evaluated on its percent novelty: the percentage of its generated responses that yielded an originality score of 0 when searched in PubMed.

Results

Each AI produced a varying number of research prompts that were input into PubMed.
The mean originality scores for ChatGPT, Gemini, Copilot, and Llama were 4.2 ± 1.9, 4.1 ± 1.3, 4.0 ± 1.6, and 3.8 ± 1.7, respectively. Of ChatGPT's 12 prompts, 16.67% were completely novel (no prior research had been published on the topic provided by the AI model). Of Copilot's 10 prompts, 10.00% were completely novel, and of Llama's 12 prompts, 8.33% were completely novel. None of Gemini's 14 responses yielded an originality score of 0.

Conclusions

Our findings demonstrate that ChatGPT, Llama, and Copilot are capable of generating novel ideas in orthopaedics research. As these models continue to evolve and become more refined over time, physicians and scientists should consider incorporating them when brainstorming and planning their research studies.
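The scoring arithmetic described in Methods (mean originality, sample standard deviation via Sheets' STDEV, and percent novelty) can be sketched in a few lines. Below is a minimal Python illustration; the score list is hypothetical and invented for illustration only, and just the formulas mirror the study's procedure:

```python
# Sketch of the Methods arithmetic. Scores run 0-5, where 0 = most
# original (no prior publications found) and 5 = least original.
from statistics import mean, stdev  # stdev = sample SD, matching Sheets' STDEV

def summarize(scores):
    """Return (mean originality, sample SD, percent novelty) for one model.

    Percent novelty = share of responses scoring exactly 0.
    """
    pct_novel = 100.0 * sum(1 for s in scores if s == 0) / len(scores)
    return round(mean(scores), 1), round(stdev(scores), 1), round(pct_novel, 2)

# Hypothetical example: 12 prompts, 2 of which were fully novel (score 0)
example = [0, 0, 5, 5, 5, 5, 4, 4, 5, 5, 5, 5]
print(summarize(example))  # → (4.0, 1.9, 16.67)
```

Note that `stdev` divides by n − 1 (the sample standard deviation), which is what Google Sheets' STDEV computes; `pstdev` would give the population version instead.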


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11673347
DOI: http://dx.doi.org/10.7759/cureus.74574
