TY - JOUR
T1 - Machine learning to predict stroke risk from routine hospital data
T2 - A systematic review
AU - Heseltine-Carp, William
AU - Courtman, Megan
AU - Browning, Daniel
AU - Kasabe, Aishwarya
AU - Allen, Michael
AU - Streeter, Adam
AU - Ifeachor, Emmanuel
AU - James, Martin
AU - Mullin, Stephen
N1 - Publisher Copyright: © 2025 The Author(s)
PY - 2025/4/1
Y1 - 2025/4/1
AB - Purpose: Stroke remains a leading cause of morbidity and mortality. Despite this, current risk stratification tools such as CHA2DS2-VASc and QRISK3 are of limited accuracy, particularly in those without a diagnosis of atrial fibrillation (AF). Hence, there is a need for more accurate stroke risk prediction models. Machine learning (ML) may provide a solution by leveraging existing routine hospital databases to build accurate stroke risk prediction models and identify novel risk factors for stroke. Aims: In this systematic review we appraise current research using ML to predict stroke risk from routine hospital data. Based on these findings, we highlight common methodological limitations and make recommendations for future research. Methods: We identified 49 original research articles (38 in the general population and 11 in AF-specific populations) from the PubMed database, published from January 2013 to December 2024, that used ML and routine hospital data to predict the risk of stroke. Results: ML models were able to accurately predict stroke risk in both AF-specific and general populations, with AUCs ranging from 0.64 to 0.99. Where tested, ML also consistently outperformed traditional risk stratification tools such as CHA2DS2-VASc. ML also appeared useful in identifying several novel risk factors from electrocardiogram, laboratory test and echocardiography data. However, dataset quality was often limited, there was a high suspicion of overfitting, and models often lacked calibration, external validation and explainability analysis. Conclusion: Whilst ML has shown great potential in stroke prediction and in identifying novel risk factors for stroke, improvements in study methodology are required prior to integration of ML into routine healthcare. Future research should adhere to the EQUATOR guidance on prediction models and encourage interdisciplinary collaboration between computer scientists and clinicians.
Further prospective RCTs are also required to validate models in the clinical setting and to identify barriers to integrating ML into routine healthcare.
KW - Artificial intelligence
KW - Ischaemic stroke
KW - Machine learning
KW - Risk evaluation
KW - Routine hospital data
KW - Stroke
UR - http://www.scopus.com/inward/record.url?scp=85216758023&partnerID=8YFLogxK
UR - https://pearl.plymouth.ac.uk/context/secam-research/article/3134/viewcontent/1_s2.0_S1386505625000280_main.pdf
U2 - 10.1016/j.ijmedinf.2025.105811
DO - 10.1016/j.ijmedinf.2025.105811
M3 - Review article
C2 - 39908727
AN - SCOPUS:85216758023
SN - 1386-5056
VL - 196
JO - International Journal of Medical Informatics
JF - International Journal of Medical Informatics
M1 - 105811
ER -