YU Cheng-long, SUN Yue, GUO Jin-xing, et al. Research on Stroke Risk Prediction for Imbalanced Datasets[J]. New Generation of Information Technology, 2024, 7(3): 18-22 DOI: 10.3969/j.issn.2096-6091.2024.03.004.
Research on Stroke Risk Prediction for Imbalanced Datasets
To address class imbalance in stroke risk prediction, this study aims to improve the model's ability to identify the minority class through data augmentation. SMOTE is used to oversample the training data, and experiments are carried out with logistic regression, support vector machine, decision tree, K-nearest neighbor, random forest, gradient boosting machine, XGBoost, and other machine learning models. The effect of SMOTE on predictive performance is evaluated by comparing each model's performance on the original imbalanced dataset with its performance on the SMOTE-processed dataset. The experimental results show that, on the original imbalanced dataset, the models generally struggle to identify the minority class. On the SMOTE-processed dataset, the accuracy, G-mean, and F1 values of each model improve significantly. In particular, tree-based models and ensemble methods prove more effective at handling imbalanced data.
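The two ingredients named in the abstract, SMOTE-style oversampling and imbalance-aware metrics, can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the function names `smote_oversample` and `imbalance_metrics`, the neighbour count `k`, and the toy data are assumptions made for illustration, and a real study would more likely rely on an established library implementation.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours.
    (Hypothetical helper; simplified relative to library versions of SMOTE.)"""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return accuracy, G-mean and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # minority-class recall
    specificity = tn / (tn + fp) if tn + fp else 0.0  # majority-class recall
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    return {
        "accuracy": (tp + tn) / len(pairs),
        "g_mean": math.sqrt(sensitivity * specificity),
        "f1": 2 * precision * sensitivity / denom if denom else 0.0,
    }


# Toy illustration: a model that always predicts the majority class looks
# accurate, yet G-mean and F1 show it never finds the minority class.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
print(imbalance_metrics(y_true, y_pred))
```

In the toy case the always-majority predictor reaches an accuracy of 0.9 while both G-mean and F1 are 0, which is why accuracy alone is a weak yardstick on imbalanced data. Synthetic samples should be generated from the training split only, so that the test set keeps the original class distribution.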