The performance of artificial intelligence-based large language models on ophthalmology-related questions in Swedish proficiency test for medicine: ChatGPT-4 omni vs Gemini 1.5 Pro

Sabaner, Mehmet Cem; Haşhaş, Arzu Seyhan Karatepe; Mutibayraktaroğlu, Kemal Mert; Yozgat, Zübeyir; Klefter, Oliver Niels; Subhi, Yousif

doi:10.1016/j.ajoint.2024.100070

Authors (6)
Mehmet Cem Sabaner, Kastamonu Üniversitesi, Türkiye
Arzu Seyhan Karatepe Haşhaş, Türkiye
Kemal Mert Mutibayraktaroğlu, Türkiye
Assoc. Prof. Dr. Zübeyir Yozgat, Faculty of Medicine, Surgical Medical Sciences, Ophthalmology, Kastamonu Üniversitesi, Türkiye; contact: (366)280-7201
Oliver Niels Klefter
Yousif Subhi

Abstract

Purpose: To compare how two commonly used artificial intelligence (AI)-based large language model (LLM) platforms interpret and respond to ophthalmology-related multiple choice questions (MCQs) in the Swedish proficiency test for medicine (“kunskapsprov för läkare”) exams.

Design: Observational study.

Methods: The questions from a total of 29 exams held between 2016 and 2024 were reviewed. All ophthalmology-related questions were included in this study and categorized into ophthalmology sections. The questions were first posed to the ChatGPT-4o and Gemini 1.5 Pro AI-based LLM chatbots in Swedish and English with specific commands. Second, all MCQs were asked again without feedback. As the final step, feedback was given for questions that were still answered incorrectly, and all questions were subsequently re-asked.

Results: A total of 134 ophthalmology-related questions out of 4876 MCQs were evaluated with both AI-based LLMs. The 29 exams contained a mean of 4.62 ± 2.21 ophthalmology-related MCQs each (range: 0–8). After the final step, ChatGPT-4o achieved higher accuracy in Swedish (94%) and English (95.5%) compared to Gemini 1.5 Pro (both at 88.1%) (p = 0.13 and p = 0.04, respectively). Moreover, ChatGPT-4o provided more correct answers in the neuro-ophthalmology section (n = 47) than Gemini 1.5 Pro across all three attempts in English (p < 0.05). There was no statistically significant difference either in the inter-AI comparison of the other ophthalmology sections or in the inter-lingual comparison within each AI.

Conclusion: Both AI-based LLMs, and especially ChatGPT-4o, appear to perform well in ophthalmology-related MCQs. AI-based LLMs can contribute to ophthalmological medical education not only by selecting correct answers to MCQs but also by providing explanations.
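The figures in the Results can be cross-checked with a little arithmetic. The sketch below is illustrative only: the correct-answer counts are back-calculated from the rounded percentages, and the Yates-corrected chi-square test is an assumption, since the abstract does not name the statistical test used. The function names are hypothetical. Under these assumptions the p-values come out near the reported 0.13 and 0.04, which is consistent with the abstract but does not confirm the authors' method.

```python
import math

TOTAL_QUESTIONS = 134  # ophthalmology-related MCQs (from the abstract)
TOTAL_EXAMS = 29       # exams reviewed (from the abstract)


def correct_count(accuracy_percent, total=TOTAL_QUESTIONS):
    """Back-calculate a correct-answer count from a rounded percentage (assumption)."""
    return round(accuracy_percent / 100 * total)


def yates_chi_square(correct_a, correct_b, total=TOTAL_QUESTIONS):
    """Yates-corrected chi-square on a 2x2 table; returns (chi2, p) for df = 1.

    Assumed test: the abstract does not state which test was applied.
    """
    a, b = correct_a, total - correct_a  # model A: correct, incorrect
    c, d = correct_b, total - correct_b  # model B: correct, incorrect
    n = a + b + c + d
    numerator = n * max(abs(a * d - b * c) - n / 2, 0) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    # For one degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


mean_per_exam = TOTAL_QUESTIONS / TOTAL_EXAMS  # compare with the reported 4.62

chatgpt_sv = correct_count(94.0)
chatgpt_en = correct_count(95.5)
gemini_both = correct_count(88.1)

chi2_sv, p_sv = yates_chi_square(chatgpt_sv, gemini_both)
chi2_en, p_en = yates_chi_square(chatgpt_en, gemini_both)

print(f"mean ophthalmology MCQs per exam: {mean_per_exam:.2f}")
print(f"Swedish: {chatgpt_sv} vs {gemini_both} correct, p = {p_sv:.3f}")
print(f"English: {chatgpt_en} vs {gemini_both} correct, p = {p_en:.3f}")
```

Because both models answered the same 134 questions, a paired test such as McNemar's would also be a reasonable choice, but it needs per-question results that the abstract does not report.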

Article Type	Original Article
Article Subtype	Full article published in Scopus-indexed journals
Journal Name	AJO International
Journal ISSN	2950-2535 (Scopus journal)
Indexed In	Scopus
Article Language	English
Publication Date	12-2024
Volume	1
Issue	4
Pages	100070 / 0
DOI	10.1016/j.ajoint.2024.100070
Article Link	https://doi.org/10.1016/j.ajoint.2024.100070

Citation Counts
SCOPUS	7
Google Scholar	8
