RT Journal Article
SR Electronic
T1 Leveraging multivariate analysis and adjusted mutual information to improve stroke prediction and interpretability
JF Neurosciences Journal
JO Neurosciences (Riyadh)
FD Prince Sultan Military Medical City
SP 190
OP 196
DO 10.17712/nsj.2024.3.20230100
VO 29
IS 3
A1 Aboonq, Moutasem S.
A1 Alqahtani, Saeed A.
YR 2024
UL http://nsj.org.sa/content/29/3/190.abstract
AB Objectives: To develop a machine learning model that accurately predicts stroke risk from demographic and clinical data, to identify the most significant stroke risk factors, and to determine the optimal machine learning algorithm for stroke prediction. Methods: This cross-sectional study analyzed data on 438,693 adults from the 2021 Behavioral Risk Factor Surveillance System. Features encompassed demographic and clinical factors. Descriptive analysis profiled the dataset, logistic regression quantified risk relationships, and adjusted mutual information evaluated feature importance. Multiple machine learning models were built and evaluated on metrics including accuracy, AUC ROC, and F1 score. Results: Key factors significantly associated with higher odds of stroke included older age, diabetes, hypertension, high cholesterol, and a history of myocardial infarction or angina. The random forest model achieved the best performance, with an accuracy of 72.46%, an AUC ROC of 0.72, and an F1 score of 0.74. Cross-validation confirmed its reliability. The top features were hypertension, myocardial infarction history, angina, age, diabetes status, and cholesterol. Conclusion: The random forest model robustly predicted stroke risk using demographic and clinical variables. Feature importance highlighted hypertension and diabetes as priorities for clinical monitoring and intervention. These findings could help enable data-driven stroke prevention strategies.