Jesper Tveit, PhD; Harald Aurlien, MD, PhD; Sergey Plis, PhD; Vince D. Calhoun, PhD; William O. Tatum, DO;
Donald L. Schomer, MD; Vibeke Arntsen, MD; Fieke Cox, MD, PhD; Firas Fahoum, MD; William B. Gallentine, DO; Elena Gardella, MD, PhD; Cecil D. Hahn, MD; Aatif M. Husain, MD; Sudha Kessler, MD;
Mustafa Aykut Kural, MD, PhD; Fábio A. Nascimento, MD; Hatice Tankisi, MD, PhD; Line B. Ulvin, MD;
Richard Wennberg, MD, PhD; Sándor Beniczky, MD, PhD
June 20, 2023
JAMA Neurol. 2023;80(8):805-812.
doi:10.1001/jamaneurol.2023.1645
Published online June 20, 2023.
Abstract
Importance
Electroencephalograms (EEGs) are a fundamental evaluation in neurology but require special expertise unavailable in many regions of the world. Artificial intelligence (AI) has a potential for addressing these unmet needs. Previous AI models address only limited aspects of EEG interpretation such as distinguishing abnormal from normal or identifying epileptiform activity. A comprehensive, fully automated interpretation of routine EEG based on AI suitable for clinical practice is needed.
Objective
To develop and validate an AI model (Standardized Computer-based Organized Reporting of EEG–Artificial Intelligence [SCORE-AI]) with the ability to distinguish abnormal from normal EEG recordings and to classify abnormal EEG recordings into categories relevant for clinical decision-making: epileptiform-focal, epileptiform-generalized, nonepileptiform-focal, and nonepileptiform-diffuse.
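The objective above describes a two-stage, multi-label decision: a recording is first judged normal vs abnormal, and an abnormal recording receives one or more of four category labels. The paper does not publish its decision logic; the sketch below is a hypothetical illustration of how per-category probabilities could be mapped to such labels. The function name, thresholds, and decision rule are assumptions, not the actual SCORE-AI implementation.

```python
# Hypothetical illustration of the two-stage labelling described in the
# abstract: normal vs abnormal, then four abnormality categories.
# Thresholds and names are assumptions, not the published SCORE-AI logic.

CATEGORIES = (
    "epileptiform-focal",
    "epileptiform-generalized",
    "nonepileptiform-focal",
    "nonepileptiform-diffuse",
)

def classify(probs: dict, threshold: float = 0.5) -> list:
    """Map per-category probabilities to labels.

    A recording with no category above threshold is called normal;
    otherwise, every category above threshold is reported (categories
    are not mutually exclusive).
    """
    labels = [c for c in CATEGORIES if probs.get(c, 0.0) >= threshold]
    return labels or ["normal"]

# Example: a recording with clear focal epileptiform activity.
print(classify({"epileptiform-focal": 0.92, "nonepileptiform-focal": 0.31}))
# → ['epileptiform-focal']
```

Reporting all categories above threshold, rather than forcing a single label, matches the clinical reality that one EEG can show several coexisting abnormalities.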
Design, Setting, and Participants
In this multicenter diagnostic accuracy study, a convolutional neural network model, SCORE-AI, was developed and validated using EEGs recorded between 2014 and 2020. Data were analyzed from January 17, 2022, until November 14, 2022. A total of 30 493 recordings of patients referred for EEG were included into the development data set annotated by 17 experts. Patients aged more than 3 months and not critically ill were eligible. The SCORE-AI was validated using 3 independent test data sets: a multicenter data set of 100 representative EEGs evaluated by 11 experts, a single-center data set of 9785 EEGs evaluated by 14 experts, and for benchmarking with previously published AI models, a data set of 60 EEGs with external reference standard. No patients who met eligibility criteria were excluded.
Main Outcomes and Measures
Diagnostic accuracy, sensitivity, and specificity compared with the experts and the external reference standard of patients’ habitual clinical episodes obtained during video-EEG recording.
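The outcome measures above reduce to counts from a 2×2 confusion matrix against the reference standard. As a minimal sketch of how these three metrics are defined (not the authors' analysis code; the counts in the example are made up):

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, sensitivity, and specificity from a 2x2 confusion matrix.

    tp/fn: abnormal EEGs called abnormal/normal by the model;
    tn/fp: normal EEGs called normal/abnormal by the model.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }

# Example with illustrative counts (not data from this study):
m = diagnostic_metrics(tp=45, fp=5, tn=40, fn=10)
print(m)  # accuracy 0.85, sensitivity ~0.818, specificity ~0.889
```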
Results
The characteristics of the EEG data sets include development data set (N = 30 493; 14 980 men; median age, 25.3 years [95% CI, 1.3-76.2 years]), multicenter test data set (N = 100; 61 men; median age, 25.8 years [95% CI, 4.1-85.5 years]), single-center test data set (N = 9785; 5168 men; median age, 35.4 years [95% CI, 0.6-87.4 years]), and test data set with external reference standard (N = 60; 27 men; median age, 36 years [95% CI, 3-75 years]). The SCORE-AI achieved high accuracy, with an area under the receiver operating characteristic curve between 0.89 and 0.96 for the different categories of EEG abnormalities, and performance similar to human experts. Benchmarking against 3 previously published AI models was limited to comparing detection of epileptiform abnormalities. The accuracy of SCORE-AI (88.3%; 95% CI, 79.2%-94.9%) was significantly higher than the 3 previously published models (P < .001) and similar to human experts.
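The area under the receiver operating characteristic curve reported above has a rank-based interpretation: it is the probability that a randomly chosen abnormal recording receives a higher model score than a randomly chosen normal one (ties counted half). A small pure-Python sketch of that equivalence, with illustrative scores rather than study data:

```python
def auc(pos_scores, neg_scores):
    """Rank-based AUC: P(score_pos > score_neg), ties counted 0.5.

    Equivalent to the Mann-Whitney U statistic normalized by the
    number of positive-negative pairs.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Example with illustrative scores (abnormal vs normal recordings):
print(auc([0.9, 0.8, 0.7], [0.6, 0.75, 0.2]))  # → 0.888...
```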
Conclusions and Relevance
In this study, SCORE-AI achieved human expert level performance in fully automated interpretation of routine EEGs. Application of SCORE-AI may improve diagnosis and patient care in underserved areas and improve efficiency and consistency in specialized epilepsy centers.
Key Points
Question Can an artificial intelligence (AI) model be trained to interpret routine clinical electroencephalograms (EEGs) with accuracy equivalent to that of human experts?
Findings In this diagnostic study, an AI model (SCORE-AI) was trained on 30 493 EEGs to separate normal from abnormal recordings and then classify abnormal recordings as epileptiform-focal, epileptiform-generalized, nonepileptiform-focal, or nonepileptiform-diffuse. The SCORE-AI was validated using 3 independent test data sets consisting of 9945 EEGs not used for training; SCORE-AI achieved diagnostic accuracy similar to human experts.
Meaning Results of this study suggest that application of SCORE-AI may have utility in improving patient care in underserved areas and efficiency and consistency in specialized centers.
Summary
Introduction
Electroencephalography (EEG) is a fundamental evaluation in neurology, but many regions of the world still lack the expertise needed to perform and interpret it. Previous AI models addressed only limited aspects of EEG interpretation, for example distinguishing normal from abnormal recordings or identifying epileptiform activity. This study was therefore undertaken because a comprehensive, fully automated interpretation of routine EEGs suitable for clinical use is needed.
This study aims to develop and validate an AI model for EEG analysis (Standardized Computer-based Organized Reporting of EEG–Artificial Intelligence [SCORE-AI]) that can distinguish abnormal from normal EEG recordings and classify abnormal recordings into categories relevant for clinical decision-making.
Methods
SCORE-AI was developed and validated using EEGs recorded between 2014 and 2020. A total of 30,493 EEG recordings, collected from patients referred for EEG, were included and annotated into categories by 17 experts.
Diagnostic accuracy, sensitivity, and specificity were evaluated by comparing SCORE-AI with human experts and with an external reference standard based on analysis of video-EEG recordings.
Results
SCORE-AI achieved high accuracy, with performance similar to that of human experts.
Compared with 3 previously published AI models, the accuracy of SCORE-AI (88.3%; 95% CI, 79.2%-94.9%) was significantly higher (P < .001) and similar to that of human experts.
Conclusions and Significance
In this study, SCORE-AI achieved human-expert-level performance in the fully automated interpretation of routine EEGs. Its application is expected to improve diagnosis and patient care in underserved regions and to increase efficiency and consistency in specialized epilepsy centers.
#Electroencephalogram, #AI, #EEG