The performance of artificial intelligence-based large language models on ophthalmology-related questions in Swedish proficiency test for medicine: ChatGPT-4 omni vs Gemini 1.5 Pro       
Authors (6)
Mehmet Cem SABANER
Kastamonu Üniversitesi, Türkiye
Arzu Seyhan Karatepe Hashas
Sahlgrenska Universitetssjukhuset, Sweden
Kemal Mert Mutibayraktaroglu
South Älvsborg Hospital, Sweden
Asst. Prof. Zübeyir YOZGAT
Kastamonu Üniversitesi, Türkiye
Oliver Niels Klefter
Rigshospitalet, Denmark
Yousif Subhi
Rigshospitalet, Denmark
Abstract
Purpose: To compare the interpretation and responses of two commonly used artificial intelligence (AI)-based large language model (LLM) platforms on ophthalmology-related multiple-choice questions (MCQs) from the Swedish proficiency test for medicine ("kunskapsprov för läkare").

Design: Observational study.

Methods: The questions of 29 exams held between 2016 and 2024 were reviewed. All ophthalmology-related questions were included in this study and categorized into ophthalmology sections. The questions were posed to the ChatGPT-4o and Gemini 1.5 Pro AI-based LLM chatbots in Swedish and English with specific prompts. In a second step, all MCQs were asked again without feedback. As the final step, feedback was given for questions that were still answered incorrectly, and those questions were subsequently re-asked.

Results: A total of 134 ophthalmology-related questions out of 4876 MCQs were evaluated with both AI-based LLMs. The mean number of ophthalmology-related MCQs per exam was 4.62 ± 2.21 (range: 0–8). After the final step, ChatGPT-4o achieved higher accuracy in Swedish (94%) and English (95.5%) than Gemini 1.5 Pro (88.1% in both languages) (p = 0.13 and p = 0.04, respectively). Moreover, ChatGPT-4o provided more correct answers than Gemini 1.5 Pro in the neuro-ophthalmology section (n = 47) across all three attempts in English (p < 0.05). There was no statistically significant difference in the inter-AI comparison of the other ophthalmology sections or in the inter-lingual comparison within each AI.

Conclusion: Both AI-based LLMs, and especially ChatGPT-4o, appear to perform well on ophthalmology-related MCQs. AI-based LLMs can contribute to ophthalmological medical education not only by selecting correct answers to MCQs but also by providing explanations.
Keywords
Artificial intelligence | ChatGPT-4 omni | E-learning | Gemini 1.5 Pro | Large language model | Medical education | Ophthalmology
Article Type: Original Article
Article Subtype: Full article published in a SCOPUS-indexed journal
Journal Name: AJO International
Journal ISSN: 2950-2535
Journal Indexes: Scopus
Article Language: Turkish
Publication Date: 12-2024
Volume: 1
Issue: 4
DOI: 10.1016/j.ajoint.2024.100070
Article Link: https://doi.org/10.1016/j.ajoint.2024.100070