1. Chongqing City Management College, Chongqing 400044, China
2. College of Computer Science, Chongqing University, Chongqing 400044, China
LI Lin (b. 1995), male, master's candidate, teaching assistant; research interests: voiceprint recognition and intelligent equipment.
ZHANG Cheng (b. 1979), male, Ph.D. candidate, associate professor; research interests: big data mining applications and knowledge reasoning.
Print publication date: 2023-09-15
LI Lin, ZHANG Cheng. Research on Voiceprint Recognition Method Based on SE-B-ResNet-50[J]. New Generation of Information Technology, 2023, 6(17): 01-07. DOI: 10.3969/j.issn.2096-6091.2023.17.001.
To address the low recognition rate and cumbersome implementation of traditional voiceprint recognition methods, this paper proposes a voiceprint recognition method based on SE-B-ResNet-50. Taking ResNet-50 as the base model, the method first optimizes the first layer of the model according to the characteristics of voiceprint features and adds global cross-scale connections between the first layer and the other layers. SE-Net is then integrated into this model: dependencies are built among the feature channels of the network, and global information is used to enhance useful features while suppressing useless ones, so that deep voiceprint features are obtained through a feature extraction method that combines B-ResNet and SE-Net. Experimental results show that the recognition accuracy of the SE-B-ResNet-50 voiceprint recognition method exceeds 97%, far higher than that of the ResNet-50 baseline.
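The following is a minimal sketch, in PyTorch, of the channel-reweighting idea the abstract describes: a squeeze-and-excitation (SE) block attached to a ResNet-style bottleneck. It follows the standard SE-Net and residual-bottleneck formulations rather than the paper's exact SE-B-ResNet-50 configuration; the optimized first layer and the global cross-scale connections mentioned in the abstract are not reproduced, and names such as SEBlock, SEBottleneck, and reduction are illustrative assumptions.

```python
# Sketch only: standard SE block + ResNet bottleneck, not the paper's exact model.
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-excitation: reweight channels using global (pooled) context."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global average pooling
        self.fc = nn.Sequential(                     # excitation: bottlenecked FC layers
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # boost useful channels, suppress others


class SEBottleneck(nn.Module):
    """ResNet bottleneck (1x1 -> 3x3 -> 1x1) followed by an SE block."""

    expansion = 4

    def __init__(self, in_channels: int, planes: int, stride: int = 1):
        super().__init__()
        out_channels = planes * self.expansion
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, planes, 1, bias=False),
            nn.BatchNorm2d(planes),
            nn.ReLU(inplace=True),
            nn.Conv2d(planes, planes, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(planes),
            nn.ReLU(inplace=True),
            nn.Conv2d(planes, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        self.se = SEBlock(out_channels)
        self.downsample = (
            nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
            if stride != 1 or in_channels != out_channels
            else nn.Identity()
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection around the SE-reweighted bottleneck output.
        return self.relu(self.se(self.body(x)) + self.downsample(x))


if __name__ == "__main__":
    # Example input shaped like a spectrogram feature map: (batch, channels, freq, frames).
    block = SEBottleneck(in_channels=64, planes=64)
    features = torch.randn(2, 64, 40, 100)
    print(block(features).shape)  # torch.Size([2, 256, 40, 100])
```

In a full ResNet-50-style network, a block of this kind would presumably replace each bottleneck in every residual stage, with spectrogram or filter-bank features of the utterance as input; the specific layer sizes and connection scheme of SE-B-ResNet-50 are described only at a high level in the abstract above.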