

1. College of Electronic Information, Sichuan University, Chengdu 610065, Sichuan, China
2. Sichuan Academy of Traditional Chinese Medicine, Chengdu 610041, Sichuan, China
Accepted: 2023-10-17; Published: 2023-08-15
ZHOU Hang, FANG Qing-mao, ZHANG Mei, et al. Text to Image Generation Method Based on Feature Fusion[J]. New Generation of Information Technology, 2023, 6(15): 6-12. DOI: 10.3969/j.issn.2096-6091.2023.15.002.
In recent years, with the continuous development of generative adversarial network (GAN) technology, GANs have been widely used in the task of generating images from text. However, most existing single-stage GANs rely solely on sentence-level text descriptions, failing to fully leverage the available textual information. To address this limitation, this study proposes a feature fusion-based text-to-image generation method (FFGAN) built on a single-stage GAN. FFGAN incorporates a text-image cross-modal fusion module that enables effective fusion of word vector features and image features, enriching the details of the generated images. Additionally, a perceptual loss is introduced to minimize the discrepancy between generated and target images, enhancing the realism of the generated images. Experimental results demonstrate that FFGAN achieves an IS score of 5.22±0.08 and an FID score of 13.91 on the CUB dataset, and an FID score of 16.97 on the COCO dataset. Extensive experiments demonstrate the superiority and effectiveness of FFGAN.
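The two components named in the abstract can be illustrated in miniature. The sketch below is NOT the paper's implementation; it is a minimal NumPy illustration under assumed shapes and names: `text_image_fusion` attends over word vectors at each spatial location and adds the attended text context back into the image features (a common word-level cross-modal fusion pattern), and `perceptual_loss` measures the feature-space distance between two images through a fixed extractor `phi` (in practice a pretrained network such as VGG; here a random linear map stands in).

```python
import numpy as np

rng = np.random.default_rng(0)

def text_image_fusion(img_feat, word_feats):
    """Illustrative cross-modal fusion (shapes and names are assumptions).

    img_feat:   (C, H, W) image feature map
    word_feats: (T, C) word vector features
    Each spatial location attends over the T words; the attended text
    context is added back to the image features (residual fusion).
    """
    C, H, W = img_feat.shape
    flat = img_feat.reshape(C, H * W)            # (C, HW)
    scores = word_feats @ flat                   # (T, HW) word-location similarity
    attn = np.exp(scores - scores.max(axis=0))   # softmax over the word axis
    attn /= attn.sum(axis=0, keepdims=True)
    context = word_feats.T @ attn                # (C, HW) attended text context
    fused = flat + context                       # residual fusion
    return fused.reshape(C, H, W)

def perceptual_loss(phi, generated, target):
    """Mean squared distance between feature maps of a fixed extractor phi."""
    fg, ft = phi(generated), phi(target)
    return float(np.mean((fg - ft) ** 2))

# Toy stand-in for a pretrained feature extractor (a real perceptual loss
# would use intermediate layers of a pretrained CNN such as VGG).
W_feat = rng.standard_normal((8, 3))
phi = lambda img: np.tensordot(W_feat, img, axes=([1], [0]))  # (8, H, W)

img = rng.standard_normal((3, 4, 4))     # fake (C, H, W) image features
words = rng.standard_normal((5, 3))      # fake (T, C) word features
fused = text_image_fusion(img, words)
print(fused.shape)                       # (3, 4, 4): shape preserved by fusion
print(perceptual_loss(phi, img, img))    # 0.0: identical inputs, zero loss
```

The fusion keeps the feature-map shape, so it can be dropped between generator stages; the perceptual loss is zero for identical inputs and grows with feature-space discrepancy, which is the property the paper exploits to pull generated images toward their targets.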