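Text feature extraction based on sparse balanced variational autoencoder
Citation: CHE Lei. Text feature extraction based on sparse balanced variational autoencoder[J]. Journal of National University of Defense Technology, 2022, 44(1): 169-178. DOI: 10.11887/j.cn.202201023
Author: CHE Lei
Affiliation: School of Information Management, Beijing Information Science & Technology University, Beijing 100192
Funding: General Project of the Social Science Program of the Beijing Municipal Education Commission (SM201911232003); Key Teaching Reform Project of Beijing Information Science & Technology University (2020JGZD03); Humanities and Social Sciences Planning Fund of the Ministry of Education (20YJAZH129)
Abstract: To address the low discriminability of high-dimensional data features in text feature extraction, the poor self-learning performance of rule-based feature learning, and the excessive pruning of variational autoencoders, a text feature extraction model based on the Sparse Balanced Variational AutoEncoder (SBVAE) is proposed. To eliminate noise interference and improve the robustness of the text feature extraction model, at the input of text feature extraction...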

Keywords: variational autoencoder; noise reduction; sparse balance; excessive pruning
Received: 2020-07-07

Text feature extraction based on sparse balanced variational autoencoder
CHE Lei. Text feature extraction based on sparse balanced variational autoencoder[J]. Journal of National University of Defense Technology, 2022, 44(1): 169-178. DOI: 10.11887/j.cn.202201023
Authors:CHE Lei
Affiliation:School of Information Management, Beijing Information Science & Technology University, Beijing 100192, China
Abstract: To address the low discriminability of high-dimensional data features in text feature extraction, the poor self-learning performance of rule-based representation learning, and the excessive pruning of variational autoencoders, a text feature extraction model based on the sparse balanced variational autoencoder (SBVAE) is proposed. To eliminate noise interference and improve the robustness of the model, a bidirectional noise reduction mechanism is designed for the variational autoencoder at the input layer. A sparse balance method, combined with a simulated annealing algorithm for the weights of the KL (Kullback-Leibler) terms, is proposed to alleviate the excessive pruning caused by the KL divergence and to force the decoder to make full use of the latent variables. The model improves the discriminability of high-dimensional data features. The experiments cover a comparative analysis of text feature extraction models, sparsity performance, and the influence of sparse balance on the variational lower bound in the latent space. The highest accuracy of the proposed model on the Fudan and Reuters datasets is 12.36% and 8.06% higher, respectively, than that of principal component analysis (PCA).
Keywords: variational autoencoder; noise reduction; sparse balance; excessive pruning
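
The abstract attributes excessive pruning to the KL divergence term and counters it by annealing the weight of that term. The paper's exact sparse balance method and annealing schedule are not given on this page, so the following is only a minimal sketch of the general idea, using a plain linear warm-up of the KL weight as a stand-in. The function names (`kl_weight`, `gaussian_kl`, `weighted_vae_loss`) and the parameters `warmup_steps` and `max_weight` are hypothetical and do not come from the paper.

```python
import math


def kl_weight(step, warmup_steps, max_weight=1.0):
    """Linear warm-up of the KL weight (an assumed schedule, not the paper's).

    Returns 0 at step 0 and max_weight from warmup_steps onward.
    """
    if warmup_steps <= 0:
        return max_weight
    return max_weight * min(1.0, step / warmup_steps)


def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def weighted_vae_loss(recon_loss, mu, log_var, step, warmup_steps):
    """Reconstruction loss plus the KL term scaled by the current weight."""
    return recon_loss + kl_weight(step, warmup_steps) * gaussian_kl(mu, log_var)
```

With a small weight early in training, the KL penalty contributes little to the loss, so the encoder is under less pressure to collapse latent dimensions onto the prior; the weight then rises until the full variational lower bound is optimized.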