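1. Hongqi Hospital Affiliated to Mudanjiang Medical University, Mudanjiang, Heilongjiang 157011
2. The Second Affiliated Hospital of Mudanjiang Medical University, Mudanjiang, Heilongjiang 157011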
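Yu Chenglong (于成龙), male, born March 1989, master's degree, attending physician; research interest: medical artificial intelligence.
Sun Yue (孙悦), female, born February 1992, bachelor's degree, senior nurse; research interest: medical artificial intelligence.
Guo Jinxing (郭金兴), female, born May 1984, master's degree, nurse-in-charge; research interest: medical artificial intelligence.
Cai Ying (才莹), female, born February 1996, attending physician, postgraduate degree; research interest: clinical prediction models.
Feng Ying (冯莹), female, born May 1989, master's degree, attending physician; research interest: medical artificial intelligence. E-mail: 44642581@qq.com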
Published online: 2024-05-31
Yu Chenglong, Sun Yue, Guo Jinxing, et al. Research on Stroke Risk Prediction for Imbalanced Datasets[J]. New Generation of Information Technology, DOI: 10.3969/j.issn.2096-6091.XXXX.XX.001.
Objective
To address class imbalance in stroke risk prediction, this study aimed to improve the models' ability to identify the minority class by augmenting the training data.
Methods
The training data were augmented with the synthetic minority oversampling technique (SMOTE). Experiments were run with several machine learning models: logistic regression, support vector machine, decision tree, K-nearest neighbors, random forest, gradient boosting machine, and XGBoost. The effect of SMOTE on predictive performance was evaluated by comparing model performance on the original imbalanced dataset with performance on the SMOTE-processed dataset.
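The oversampling step described in the Methods can be sketched in Python as follows. This is a minimal from-scratch illustration of SMOTE's interpolation rule. It assumes numeric feature vectors, a binary label, and k = 5 neighbours, which is the default `k_neighbors` of imbalanced-learn's `SMOTE`. The paper does not state which SMOTE implementation or parameters it used, and the helper names `nearest_indices`, `smote` and `balance_training_set` are hypothetical. Each synthetic sample is placed at a random point on the segment between a minority sample and one of its k nearest minority-class neighbours: x_new = x + λ(x_nn − x), with λ drawn uniformly from [0, 1). The base samples here are picked at random, which is a common variant, rather than by cycling through every minority sample. As in the study, only the training split is oversampled; keeping the test split untouched is the usual practice so that evaluation reflects the real class distribution.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_indices(i, points, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    ranked = sorted(
        (euclidean(points[i], p), j) for j, p in enumerate(points) if j != i
    )
    return [j for _, j in ranked[:k]]


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic minority samples by SMOTE-style interpolation (hypothetical helper)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    neighbours = [nearest_indices(i, minority, k) for i in range(len(minority))]
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))               # random minority "base" sample
        base = minority[i]
        other = minority[rng.choice(neighbours[i])]    # one of its k nearest neighbours
        gap = rng.random()                             # lambda, uniform in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance_training_set(X_train, y_train, minority_label=1, k=5, seed=0):
    """Oversample the minority class of the TRAINING split until both classes are equal in size."""
    minority = [x for x, y in zip(X_train, y_train) if y == minority_label]
    n_majority = sum(1 for y in y_train if y != minority_label)
    needed = n_majority - len(minority)
    if needed <= 0:
        return list(X_train), list(y_train)
    new_x = smote(minority, needed, k=k, seed=seed)
    return list(X_train) + new_x, list(y_train) + [minority_label] * len(new_x)


if __name__ == "__main__":
    # Hypothetical toy data: 3 minority (label 1) vs 6 majority (label 0) samples.
    X = [[5.0, 5.0], [5.2, 4.8], [4.9, 5.1],
         [0.0, 0.0], [0.1, 0.3], [0.2, 0.1], [0.3, 0.2], [0.1, 0.0], [0.4, 0.4]]
    y = [1, 1, 1, 0, 0, 0, 0, 0, 0]
    X_bal, y_bal = balance_training_set(X, y, k=2, seed=42)
    print(y_bal.count(0), y_bal.count(1))  # 6 6
```

Because every synthetic point is an interpolation between two real minority samples, it stays inside the region those samples span. This is why SMOTE adds variety without simply duplicating rows, as random oversampling would.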
Conclusion
On the original imbalanced dataset, the models generally struggled to identify the minority class. On the SMOTE-processed dataset, every model's performance metrics, including accuracy, G-mean, and F1 score, improved significantly. Tree-based models and ensemble methods in particular proved more effective at handling imbalanced data.
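The metrics named in the Conclusion can be computed from a binary confusion matrix, treating stroke (the minority class) as the positive label. The paper does not give its exact formulas. This Python sketch assumes the common binary definitions G-mean = sqrt(sensitivity × specificity) and F1 = 2 × precision × recall / (precision + recall), and the function name `imbalance_metrics` is hypothetical. The usage lines rely on made-up labels, not the study's data, to show why accuracy alone is misleading on imbalanced data. A classifier that never predicts the minority class still scores high accuracy, while its G-mean and F1 are zero.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity, precision, F1 and G-mean for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    def ratio(num, den):
        return num / den if den else 0.0  # treat 0/0 as 0 instead of raising

    sensitivity = ratio(tp, tp + fn)  # recall on the minority (positive) class
    specificity = ratio(tn, tn + fp)  # recall on the majority (negative) class
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
        "g_mean": math.sqrt(sensitivity * specificity),
    }


if __name__ == "__main__":
    # Hypothetical labels: 95 negatives followed by 5 positives (stroke cases).
    y_true = [0] * 95 + [1] * 5
    majority_only = [0] * 100                       # never predicts stroke
    better = [1] * 10 + [0] * 85 + [1] * 4 + [0]    # 10 false alarms, 4 of 5 strokes caught
    print(imbalance_metrics(y_true, majority_only))  # accuracy 0.95, but f1 and g_mean 0.0
    print(imbalance_metrics(y_true, better))         # accuracy 0.89, f1 ≈ 0.421, g_mean ≈ 0.846
```

G-mean is zero whenever either class is missed entirely, so it rewards models that balance sensitivity and specificity. This is the behaviour the study relies on when judging performance on the minority class.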
Keywords
imbalanced datasets; SMOTE; ensemble learning; non-ensemble learning; disease prevention