
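Text feature extraction based on sparse balanced variational autoencoder
Cite this article: CHE Lei. Text feature extraction based on sparse balanced variational autoencoder[J]. Journal of National University of Defense Technology, 2022, 44(1): 169-178.
Author: CHE Lei
Affiliation: School of Information Management, Beijing Information Science & Technology University, Beijing 100192, China
Funding: General Project of the Social Science Program of the Beijing Municipal Education Commission (SM201911232003); Key Funded Project of the Teaching Reform Program of Beijing Information Science & Technology University (2020JGZD03); Humanities and Social Sciences Planning Fund of the Ministry of Education (20YJAZH129)
Abstract: Text feature extraction suffers from poorly discriminated high-dimensional features, weak self-learning in rule-based feature learning, and over-pruning in variational autoencoders. To address these problems, a text feature extraction model based on the Sparse Balanced Variational AutoEncoder (SBVAE) is proposed. To remove noise interference and make the model more robust, a bidirectional denoising mechanism is applied at the input layer. A sparse balance treatment is proposed and combined with simulated annealing of the KL (Kullback-Leibler) term weight, which mitigates the over-pruning caused by the KL divergence and forces the decoder to make fuller use of the latent variables. The model improves the discriminability of high-dimensional data features. Experiments compare text feature extraction models and examine sparsity performance and the effect of sparse balance processing on the variational lower bound of the latent space; they show that the model performs well. Compared with principal component analysis, the model's highest accuracy is 12.36% higher on the Fudan dataset and 8.06% higher on the Reuters dataset.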

Keywords: variational autoencoder  denoising  sparse balance  over-pruning
Received: 2020/7/7

Text feature extraction based on sparse balanced variational autoencoder
CHE Lei. Text feature extraction based on sparse balanced variational autoencoder[J]. Journal of National University of Defense Technology, 2022, 44(1): 169-178.
Authors: CHE Lei
Institution: School of Information Management, Beijing Information Science & Technology University, Beijing 100192, China
Abstract: To address the low discriminability of high-dimensional features in text feature extraction, the poor self-learning ability of rule-based feature learning, and over-pruning in variational autoencoders, a text feature extraction model based on the sparse balanced variational autoencoder (SBVAE) is proposed. To suppress noise and improve the robustness of the model, a bidirectional denoising mechanism is applied at the input layer. A sparse balance method, combined with simulated annealing of the KL (Kullback-Leibler) term weight, is proposed to alleviate the over-pruning caused by the KL divergence and to force the decoder to make fuller use of the latent variables. The model improves the discriminability of high-dimensional data features. Experiments covered a comparison of text feature extraction models, sparsity performance, and the influence of sparse balance processing on the variational lower bound of the latent space. The results show that the proposed model performs well. Compared with principal component analysis (PCA), its highest accuracy is 12.36% higher on the Fudan dataset and 8.06% higher on the Reuters dataset.
Keywords: variational autoencoder  noise reduction  sparse balance  excessive pruning
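
The abstract's idea of annealing the KL term weight to reduce over-pruning can be illustrated with a short sketch. The abstract does not give the paper's actual schedule or loss, so everything below is an assumption for illustration only: a linear warm-up of the weight from 0 to 1 (a common KL annealing choice, not necessarily the simulated annealing procedure used in SBVAE), a diagonal Gaussian posterior with a standard normal prior, and the hypothetical names kl_weight, gaussian_kl, annealed_loss and warmup_steps.

```python
import math


def kl_weight(step, warmup_steps=1000):
    """Hypothetical linear schedule: the KL weight rises from 0 to 1
    over warmup_steps training steps, then stays at 1."""
    if warmup_steps <= 0:
        return 1.0
    return min(1.0, max(0.0, step / warmup_steps))


def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def annealed_loss(recon_loss, mu, logvar, step, warmup_steps=1000):
    """Negative variational lower bound with the KL term scaled by the annealed weight.

    Early in training the weight is small, so the encoder is penalized less
    for moving away from the prior and fewer latent units collapse.
    """
    return recon_loss + kl_weight(step, warmup_steps) * gaussian_kl(mu, logvar)
```

With a small weight early in training, the model can first learn to encode information in the latent variables before the full KL penalty is applied, which is the usual motivation for this kind of schedule.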
This article is indexed in Wanfang Data and other databases.