PT - JOURNAL ARTICLE
AU - Aboonq, Moutasem S.
AU - Alqahtani, Saeed A.
TI - Leveraging multivariate analysis and adjusted mutual information to improve stroke prediction and interpretability
AID - 10.17712/nsj.2024.3.20230100
DP - 2024 Jul 01
TA - Neurosciences Journal
PG - 190--196
VI - 29
IP - 3
4099 - http://nsj.org.sa/content/29/3/190.short
4100 - http://nsj.org.sa/content/29/3/190.full
SO - Neurosciences (Riyadh) 2024 Jul 01; 29
AB - Objectives: To develop a machine learning model to accurately predict stroke risk based on demographic and clinical data. It also sought to identify the most significant stroke risk factors and determine the optimal machine learning algorithm for stroke prediction. Methods: This cross-sectional study analyzed data on 438,693 adults from the 2021 Behavioral Risk Factor Surveillance System. Features encompassed demographics and clinical factors. Descriptive analysis profiled the dataset. Logistic regression quantified risk relationships. Adjusted mutual information evaluated feature importance. Multiple machine learning models were built and evaluated on metrics like accuracy, AUC ROC, and F1 score. Results: Key factors significantly associated with higher stroke odds included older age, diabetes, hypertension, high cholesterol, and history of myocardial infarction or angina. Random forest model achieved the best performance with accuracy of 72.46%, AUC ROC of 0.72, and F1 score of 0.74. Cross-validation confirmed its reliability. Top features were hypertension, myocardial infarction history, angina, age, diabetes status, and cholesterol. Conclusion: The random forest model robustly predicted stroke risk using demographic and clinical variables. Feature importance highlighted priorities like hypertension and diabetes for clinical monitoring and intervention. This could help enable data-driven stroke prevention strategies.
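The abstract names adjusted mutual information (AMI) as its feature-importance measure but gives neither the formula nor an implementation. The following is a minimal, illustrative sketch, not the authors' code. It assumes the standard AMI definition: AMI = (MI - E[MI]) / (mean(H(A), H(B)) - E[MI]), with the arithmetic mean of the entropies (the default in scikit-learn's `adjusted_mutual_info_score`) and E[MI] computed under the hypergeometric permutation model. The helper names `adjusted_mutual_info` and `rank_features` are hypothetical. The 12-row toy dataset is invented for illustration and does not come from BRFSS.

```python
import sys
from collections import Counter
from math import exp, lgamma, log


def _entropy(counts, n):
    """Shannon entropy (natural log) of a labeling, given its class counts."""
    return -sum((c / n) * log(c / n) for c in counts if c > 0)


def _log_factorial(k):
    return lgamma(k + 1)


def adjusted_mutual_info(labels_a, labels_b):
    """AMI between two discrete labelings (arithmetic-mean normalization).

    1.0 means identical partitions; values near 0 (or below) mean no more
    agreement than expected by chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Both labelings trivial (a single class each): treat as a perfect match.
    if len(count_a) == len(count_b) == 1:
        return 1.0
    joint = Counter(zip(labels_a, labels_b))

    # Observed mutual information from the contingency table.
    mi = sum((nij / n) * log(n * nij / (count_a[a] * count_b[b]))
             for (a, b), nij in joint.items())

    # Expected mutual information when labels are randomly permuted
    # (cell counts follow a hypergeometric distribution).
    emi = 0.0
    for ai in count_a.values():
        for bj in count_b.values():
            base = (_log_factorial(ai) + _log_factorial(bj)
                    + _log_factorial(n - ai) + _log_factorial(n - bj)
                    - _log_factorial(n))
            for nij in range(max(1, ai + bj - n), min(ai, bj) + 1):
                log_p = base - (_log_factorial(nij) + _log_factorial(ai - nij)
                                + _log_factorial(bj - nij)
                                + _log_factorial(n - ai - bj + nij))
                emi += (nij / n) * log(n * nij / (ai * bj)) * exp(log_p)

    h_mean = (_entropy(count_a.values(), n) + _entropy(count_b.values(), n)) / 2
    denominator = h_mean - emi
    # Guard against 0/0 when the expectation equals the maximum.
    eps = sys.float_info.epsilon
    denominator = max(denominator, eps) if denominator >= 0 else min(denominator, -eps)
    return (mi - emi) / denominator


def rank_features(features, target):
    """Return (feature name, AMI with target) pairs, most informative first."""
    scores = {name: adjusted_mutual_info(values, target)
              for name, values in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data (NOT from the study): 12 respondents, binary codes.
stroke = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_features = {
    "hypertension": [1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    "noise":        [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
}

for name, score in rank_features(toy_features, stroke):
    print(f"{name}: {score:.3f}")
```

In this toy example, "hypertension" co-occurs with the stroke label and ranks first with a positive score. "noise" is constructed to be exactly independent of the label, so its observed MI is zero and its AMI falls below zero. That chance correction is what distinguishes AMI from raw mutual information, which is never negative and tends to favor features with many categories.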