
Semi-supervised Text Classification Based on Self-training EM Algorithm
Citation: ZHANG Bofeng, BAI Bing, SU Jinshu. Semi-supervised Text Classification Based on Self-training EM Algorithm[J]. Journal of National University of Defense Technology, 2007, 29(6): 65-69.
Authors: ZHANG Bofeng, BAI Bing, SU Jinshu
Institution: College of Computer, National University of Defense Technology, Changsha, Hunan 410073, China
Funding: Major Research Plan of the National Natural Science Foundation of China (90604006); Doctoral Program Foundation of the Ministry of Education for Institutions of Higher Education (20049998027)
Article ID: 1001-2486(2007)06-0065-05
Received: 2007-04-18
Revised: 2007-04-18

Abstract: To improve computational efficiency, an enhanced EM algorithm based on self-training, named STEM, is proposed. In the E-step of each iteration, the unlabeled samples whose classes the current intermediate classifier can predict with the most confidence are moved to the labeled set and then used in the M-step to train the next intermediate classifier. This introduces a self-training mechanism that exploits intermediate results. Text classification experiments indicate that STEM achieves higher classification accuracy than EM in most cases and improves the computational efficiency of classifier learning by reducing the number of iterations.
Keywords: semi-supervised learning; EM algorithm; self-training; text classification; naive Bayes
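
The loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how "most confident" samples are selected or when iteration stops, so the fixed `threshold`, the `max_iter` cap, the Laplace smoothing, and all function names below (`train_nb`, `posterior`, `stem`, `predict`) are assumptions made for illustration, using a small multinomial naive Bayes classifier in pure Python.

```python
import math

def train_nb(docs, weights, n_classes, vocab):
    """M-step: fit multinomial naive Bayes from hard or soft class weights."""
    prior = [1.0] * n_classes                      # Laplace-smoothed class counts
    counts = [{w: 1.0 for w in vocab} for _ in range(n_classes)]
    for doc, wts in zip(docs, weights):
        for c, p in enumerate(wts):
            prior[c] += p
            for word in doc:
                counts[c][word] += p
    total = sum(prior)
    log_prior = [math.log(p / total) for p in prior]
    log_like = []
    for c in range(n_classes):
        tot = sum(counts[c].values())
        log_like.append({w: math.log(v / tot) for w, v in counts[c].items()})
    return log_prior, log_like

def posterior(model, doc):
    """Class posterior P(c | doc); words outside the vocabulary are ignored."""
    log_prior, log_like = model
    scores = [lp + sum(ll.get(w, 0.0) for w in doc)
              for lp, ll in zip(log_prior, log_like)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def stem(labeled, unlabeled, n_classes, threshold=0.9, max_iter=20):
    """Self-training EM sketch (threshold and max_iter are assumed, not from the paper)."""
    labeled = list(labeled)                        # list of (tokens, label)
    pool = list(unlabeled)                         # list of tokens
    vocab = {w for d, _ in labeled for w in d} | {w for d in pool for w in d}

    def fit(soft_docs, soft_weights):
        docs = [d for d, _ in labeled] + soft_docs
        hard = [[float(c == y) for c in range(n_classes)] for _, y in labeled]
        return train_nb(docs, hard + soft_weights, n_classes, vocab)

    model = fit([], [])                            # initial classifier: labeled data only
    for _ in range(max_iter):
        if not pool:
            break
        # E-step: confident samples move to the labeled set; the rest keep soft labels.
        remaining, soft = [], []
        for doc in pool:
            post = posterior(model, doc)
            best = max(range(n_classes), key=post.__getitem__)
            if post[best] >= threshold:
                labeled.append((doc, best))
            else:
                remaining.append(doc)
                soft.append(post)
        pool = remaining
        # M-step: train the next intermediate classifier.
        model = fit(pool, soft)
    return model, labeled

def predict(model, doc):
    post = posterior(model, doc)
    return max(range(len(post)), key=post.__getitem__)

# Toy data (hypothetical): class 0 = sports, class 1 = politics.
toy_labeled = [(["ball", "goal", "team"], 0), (["vote", "law", "senate"], 1)]
toy_unlabeled = [["goal", "team", "match"], ["law", "vote", "court"],
                 ["team", "match", "ball"], ["senate", "court", "law"]]
toy_model, toy_final = stem(toy_labeled, toy_unlabeled, n_classes=2, threshold=0.75)
```

Because confidently classified samples leave the unlabeled pool, each later E-step scores fewer documents, and the loop ends as soon as the pool is empty. This is one plausible reading of how the method reduces iterations relative to plain EM.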
This article is indexed in CNKI, Wanfang Data, and other databases.