Semi-supervised Text Classification Based on Self-training EM Algorithm

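Citation:	ZHANG Bofeng, BAI Bing, SU Jinshu. Semi-supervised Text Classification Based on Self-training EM Algorithm[J]. Journal of National University of Defense Technology, 2007, 29(6): 65-69.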

Authors:	ZHANG Bofeng, BAI Bing, SU Jinshu

Institution:	College of Computer, National University of Defense Technology, Changsha, Hunan 410073, China

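Funding:	Major Research Plan of the National Natural Science Foundation of China (90604006); Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education of China (20049998027)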

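Abstract:	To improve computational efficiency, an enhanced EM algorithm based on self-training, named STEM, is proposed. In the E-step of each iteration, the unlabeled sample whose class the current intermediate classifier predicts with the highest confidence is moved to the labeled set and used in the M-step to train the next intermediate classifier. This introduces a self-training mechanism that exploits intermediate results. Text classification experiments show that STEM achieves higher classification accuracy than EM in most cases and improves the computational efficiency of classifier learning by reducing the number of iterations.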
Keywords:	semi-supervised learning; EM algorithm; self-training; text classification; naive Bayes
Article ID:	1001-2486(2007)06-0065-05
Received:	2007-04-18
Revised:	2007-04-18
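The Python sketch below illustrates the STEM loop described in the abstract. It is not the authors' implementation. The keywords name naive Bayes as the base classifier, but the multinomial event model with Laplace smoothing, the use of the maximum posterior probability as the confidence score, moving exactly one document per iteration, letting the still-unlabeled documents contribute soft (posterior-weighted) counts to the M-step as in ordinary semi-supervised EM, and the `max_iter` stopping rule are all assumptions made for illustration. The function names `m_step`, `e_step`, `predict`, and `stem` are hypothetical.

```python
import numpy as np


def m_step(X, R, alpha=1.0):
    """M-step: fit multinomial naive Bayes from word counts X (docs x words)
    and class weights R (docs x classes), with Laplace smoothing alpha."""
    n_docs, n_words = X.shape
    n_classes = R.shape[1]
    class_mass = R.sum(axis=0)                         # (soft) number of docs per class
    log_prior = np.log((class_mass + alpha) / (n_docs + alpha * n_classes))
    word_counts = R.T @ X                              # (soft) word counts, classes x words
    totals = word_counts.sum(axis=1, keepdims=True)
    log_word = np.log((word_counts + alpha) / (totals + alpha * n_words))
    return log_prior, log_word


def e_step(X, log_prior, log_word):
    """E-step: posterior P(class | doc) for every row of X."""
    log_joint = X @ log_word.T + log_prior             # docs x classes
    log_joint -= log_joint.max(axis=1, keepdims=True)  # avoid overflow in exp
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)


def predict(X, params):
    """Assign each document to its most probable class."""
    return e_step(X, *params).argmax(axis=1)


def stem(X_lab, y_lab, X_unl, n_classes, max_iter=100):
    """Self-training EM sketch: each E-step moves the most confidently
    classified unlabeled document into the labeled set (assumed: one per step)."""
    X_lab = np.asarray(X_lab, dtype=float)
    unl = np.asarray(X_unl, dtype=float)
    R_lab = np.eye(n_classes)[y_lab]                   # one-hot rows for labeled docs
    params = m_step(X_lab, R_lab)                      # initial classifier: labeled data only
    for _ in range(max_iter):
        if len(unl) == 0:                              # unlabeled pool exhausted
            break
        post = e_step(unl, *params)                    # E-step on the unlabeled pool
        best = int(post.max(axis=1).argmax())          # highest max-posterior = most confident
        X_lab = np.vstack([X_lab, unl[best]])
        R_lab = np.vstack([R_lab, np.eye(n_classes)[post[best].argmax()]])
        unl = np.delete(unl, best, axis=0)
        post = np.delete(post, best, axis=0)
        # M-step: hard labels for the labeled set, soft labels for the rest
        params = m_step(np.vstack([X_lab, unl]), np.vstack([R_lab, post]))
    return params
```

Given a bag-of-words count matrix for a few labeled documents and a pool of unlabeled ones, `stem(X_lab, y_lab, X_unl, n_classes)` returns log class priors and log word probabilities, and `predict` assigns new documents to their most probable class. Without the step that moves the top-confidence document into the labeled set, the loop reduces to ordinary semi-supervised EM for naive Bayes, so the sketch also shows where the two methods differ.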
This article is indexed in CNKI, Wanfang Data, and other databases.