Evaluation of Advanced Artificial Intelligence Algorithms’ Diagnostic Efficacy in Acute Ischemic Stroke: A Comparative Analysis of ChatGPT-4o and Claude 3.5 Sonnet Models

Koyun, Mustafa; TAŞKENT, İsmail

doi:10.3390/jcm14020571

Authors (2)

Prof. Dr. Mustafa Koyun, Kastamonu Training and Research Hospital, Türkiye

Assoc. Prof. Dr. İsmail TAŞKENT, Kastamonu University, Türkiye

Article Type	Original Article (full article published in SSCI, AHCI, SCI, or SCI-Exp journals)
Journal Name	Journal of Clinical Medicine (Q1)
Journal ISSN	2077-0383 Journal Information (2025)
Indexed In	SCI-Expanded
Article Language	English	Publication Date	01-2025
Volume / Issue / Page	14 / 2 / 571	DOI	10.3390/jcm14020571
Article Link	https://doi.org/10.3390/jcm14020571
UAK Research Areas	Radiology

Abstract

Background/Objectives: Acute ischemic stroke (AIS) is a leading cause of mortality and disability worldwide, and early, accurate diagnosis is critical for timely intervention and improved patient outcomes. This retrospective study aimed to assess the diagnostic performance of two advanced artificial intelligence (AI) models, Chat Generative Pre-trained Transformer (ChatGPT-4o) and Claude 3.5 Sonnet, in identifying AIS from diffusion-weighted imaging (DWI). Methods: DWI images from a total of 110 cases (AIS group: n = 55, healthy controls: n = 55) were provided to the AI models via standardized prompts. The models’ responses were compared to radiologists’ gold-standard evaluations, and performance metrics such as sensitivity, specificity, and diagnostic accuracy were calculated. Results: Both models exhibited high sensitivity for AIS detection (ChatGPT-4o: 100%, Claude 3.5 Sonnet: 94.5%). However, ChatGPT-4o demonstrated a significantly lower specificity (3.6%) than Claude 3.5 Sonnet (74.5%). Agreement with radiologists was poor for ChatGPT-4o (κ = 0.036; 95% CI: −0.013, 0.085) but good for Claude 3.5 Sonnet (κ = 0.691; 95% CI: 0.558, 0.824). For AIS hemispheric localization accuracy, Claude 3.5 Sonnet (67.2%) outperformed ChatGPT-4o (32.7%). Similarly, for specific AIS localization, Claude 3.5 Sonnet (30.9%) was more accurate than ChatGPT-4o (7.3%), and these differences were statistically significant (p < 0.05). Conclusions: This study highlights the superior diagnostic performance of Claude 3.5 Sonnet compared to ChatGPT-4o in identifying AIS from DWI. Despite its advantages …
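
The reported sensitivity, specificity, and κ values can be cross-checked with a short calculation. The sketch below is illustrative only: the abstract does not give the 2×2 confusion tables, so the counts are inferred here from the reported percentages and the group sizes (n = 55 each), and the function name `diagnostic_metrics` is hypothetical. Under those assumed counts, the computed Cohen's kappa rounds to the reported 0.036 and 0.691.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy, and Cohen's kappa from a 2x2 table."""
    n = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n  # observed agreement with the radiologists, p_o
    # Agreement expected by chance, p_e, from the row and column marginals.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (accuracy - p_e) / (1 - p_e)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
    }


# ASSUMPTION: counts inferred from the reported percentages with 55 AIS cases
# and 55 controls; the abstract itself does not list them.
inferred_counts = {
    "ChatGPT-4o": dict(tp=55, fn=0, tn=2, fp=53),
    "Claude 3.5 Sonnet": dict(tp=52, fn=3, tn=41, fp=14),
}

results = {name: diagnostic_metrics(**c) for name, c in inferred_counts.items()}

for name, m in results.items():
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Because the two groups are the same size, chance agreement is 0.5 for both models, so κ reduces to 2 × accuracy − 1. That is why ChatGPT-4o's near-zero specificity leaves its κ close to zero despite its 100% sensitivity.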

Citation Counts
Web of Science	19
Scopus	20
Google Scholar	30

Journal Name	Journal of Clinical Medicine
Abbreviated Name	J CLIN MED
Publisher	MDPI
Open Access	Yes
ISSN	2077-0383
E-ISSN	2077-0383
WoS Quartile	Q1
Scopus Quartile	Q2
Indexed In	SCIE, Scopus
WoS Categories	MEDICINE, GENERAL & INTERNAL
Scopus Categories	MEDICINE (MISCELLANEOUS)
