Background: Large Language Models (LLMs), such as OpenAI's ChatGPT-4 Turbo, are revolutionising several industries, including higher education. In this context, LLMs can be personalised through a customisation process to meet student demands in a particular subject, such as statistics. OpenAI recently introduced the ability to customise its model through a natural-language web interface, enabling the creation of customised GPT versions deliberately conditioned to meet the demands of a specific task.

Methods: This preliminary study assesses the potential of customised GPTs. After developing a Business Statistics Virtual Professor (BSVP) tailored to students at the Universidad Pontificia Comillas, its behaviour was evaluated and compared with that of ChatGPT-4 Turbo. First, each professor collected 15-30 genuine student questions from the "Statistics and Probability" and "Business Statistics" courses across seven degrees, primarily second-year courses. Second, these questions, often ambiguous and imprecise, were posed to both ChatGPT-4 Turbo and BSVP, and their initial responses were recorded without follow-ups. Third, professors blindly rated each response on a 0-10 scale for quality, depth, and personalisation. Finally, a statistical comparison of the two systems' performance was conducted.
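
The abstract does not specify which statistical test underpinned the final comparison. As a minimal illustrative sketch only, the R snippet below (R being the language used in the course's exercises) shows one standard option for paired ordinal ratings: a two-sided Wilcoxon signed-rank test on simulated 0-10 scores. The data are hypothetical and do not reproduce the study's results.

    # Hypothetical blind 0-10 ratings for the same questions answered by each system
    set.seed(42)
    n <- 25  # illustrative question count; the study collected 15-30 per professor
    bsvp    <- pmin(10, pmax(0, round(rnorm(n, mean = 7.2, sd = 1.4), 1)))
    chatgpt <- pmin(10, pmax(0, round(rnorm(n, mean = 7.0, sd = 1.4), 1)))
    # Paired two-sided test of the null hypothesis of no rating difference
    wilcox.test(bsvp, chatgpt, paired = TRUE, alternative = "two.sided")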

Results: The results support several conclusions. Firstly, a substantial change in communication style was observed: following the instructions it was configured with, BSVP responded in a more relatable and friendly tone, even incorporating the occasional light joke. Secondly, for course-specific requests such as "I would like to practice a programming exercise similar to those in R practice 4", BSVP provided a far superior response. Lastly, regarding overall performance, quality, depth, and alignment with the specific content of the course, no statistically significant differences were observed between the responses of BSVP and ChatGPT-4 Turbo.

Conclusions: Customised assistants configured through prompts appear to offer advantages as virtual aids for students, yet they do not constitute a substantial improvement over ChatGPT-4 Turbo.

Source:
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11447677
DOI: http://dx.doi.org/10.12688/f1000research.153129.2

Similar Publications

Objectives: This study aims to evaluate the performance of the latest large language models (LLMs) in answering dental multiple-choice questions (MCQs), including both text-based and image-based questions.

Material And Methods: A total of 1490 MCQs from two board review books for the United States National Board Dental Examination were selected. This study evaluated six of the latest LLMs as of August 2024, including ChatGPT 4.

Purpose: Information retrieval (IR) and risk assessment (RA) from multi-modality imaging and pathology reports are critical to prostate cancer (PC) treatment. This study aims to evaluate the performance of four general-purpose large language models (LLMs) in IR and RA tasks.

Materials And Methods: We conducted a study using simulated text reports from computed tomography, magnetic resonance imaging, bone scans, and biopsy pathology on stage IV PC patients.

Aims And Objectives: This study aimed to compare the accuracy of two AI models, OpenAI's GPT-4 Turbo (San Francisco, CA) and Meta's LLaMA 3.1 (Menlo Park, CA), when answering a standardized set of pediatric radiology questions. The primary objective was to evaluate the overall accuracy of each model, while the secondary objective was to assess their performance within subsections.

The increasing application of generative artificial intelligence large language models (LLMs) in various fields, including medical education, raises questions about their accuracy. The primary aim of our study was to undertake a detailed comparative analysis of the proficiencies and accuracies of six different LLMs (ChatGPT-4, ChatGPT-3.5-turbo, ChatGPT-3.

This study aims to measure the performance of different AI language models on three sets of pre-internship medical exams and to compare their performance with that of Iranian medical students. Three sets of Persian pre-internship exams were used, along with their English translations (six sets in total). In late September 2023, we sent requests to ChatGPT-3.
