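Semi-supervised Neural Machine Translation Based on Sentence-Level BLEU Metric Data Selection
Cite this article: YE Shaolin, GUO Wu. Semi-supervised Neural Machine Translation Based on Sentence-Level BLEU Metric Data Selection[J]. Journal of Bingtuan Education Institute, 2017, 30(5). DOI: 10.16451/j.cnki.issn1003-6059.201710008
Authors: YE Shaolin, GUO Wu
Affiliation: National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei 230027
Funding: Supported by the National Key Research and Development Program of China (No. 2016YFB1001303)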
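Abstract: When it comes to using monolingual corpora, statistical machine translation can improve its performance through a language model, whereas neural machine translation can hardly exploit monolingual corpora effectively in this way. To address this problem, a semi-supervised neural machine translation model based on data selection with the sentence-level bilingual evaluation understudy (BLEU) metric is proposed. Statistical and neural machine translation models are each used to generate candidate translations for the unlabeled data. The candidate translations are then selected by sentence-level BLEU and added to the labeled dataset for semi-supervised joint training. Experiments show that the proposed method makes efficient use of unlabeled monolingual corpora: on the NIST Chinese-English translation task, it improves BLEU over a single system trained only on the carefully labeled data.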

Keywords: Semi-supervised; Sentence-Level Bilingual Evaluation Understudy (BLEU); Neural Machine Translation

Semi-supervised Neural Machine Translation Based on Sentence-Level BLEU Metric Data Selection
YE Shaolin,GUO Wu. Semi-supervised Neural Machine Translation Based on Sentence-Level BLEU Metric Data Selection[J]. Journal of Bingtuan Education Institute, 2017, 30(5). DOI: 10.16451/j.cnki.issn1003-6059.201710008
Authors:YE Shaolin  GUO Wu
Abstract: Statistical machine translation can improve its performance by using a language model trained on monolingual corpora. Neural machine translation, however, cannot easily make effective use of monolingual corpora in this way. To solve this problem, a semi-supervised neural machine translation model based on sentence-level bilingual evaluation understudy (BLEU) metric data selection is proposed. Candidate translations for the unlabeled data are first generated by a statistical machine translation model and a neural machine translation model, respectively. The candidate translations are then selected by sentence-level BLEU, and the selected candidates are added to the labeled dataset for semi-supervised joint training. The experimental results demonstrate that the proposed method uses unlabeled monolingual data effectively. On the NIST Chinese-English translation task, the proposed method improves the BLEU score over a baseline system trained only on the finely labeled data.
Keywords: Semi-supervised; Sentence-Level Bilingual Evaluation Understudy (BLEU); Neural Machine Translation
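
The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the sentence-level BLEU score is computed or thresholded, so everything below is an assumption made for illustration: the score is taken between the SMT and NMT candidates for the same source sentence (with the SMT output as pseudo-reference), add-one smoothing is applied to the higher-order n-gram precisions, and the NMT candidate is kept when the score reaches a hypothetical `threshold`. The function names `sentence_bleu` and `select_pseudo_parallel` are likewise hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Smoothed sentence-level BLEU between two token lists.

    Assumption: add-one smoothing on n-gram precisions for n > 1,
    so that short sentences do not collapse to zero.
    """
    if not hypothesis or not reference:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        ref_counts = ngrams(reference, n)
        # Clipped matches: each n-gram counts at most as often as in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if n > 1:
            matches += 1
            total += 1
        if matches == 0:  # only possible for unigrams: no word overlap at all
            return 0.0
        log_precision += math.log(matches / total) / max_n
    # Brevity penalty for hypotheses shorter than the reference.
    brevity_penalty = min(1.0, math.exp(1 - len(reference) / len(hypothesis)))
    return brevity_penalty * math.exp(log_precision)


def select_pseudo_parallel(sources, smt_outputs, nmt_outputs, threshold=0.5):
    """Keep (source, NMT candidate, score) triples whose candidates agree.

    Assumption: agreement between the two systems, measured by
    sentence-level BLEU, is used as the selection criterion.
    """
    selected = []
    for src, smt, nmt in zip(sources, smt_outputs, nmt_outputs):
        score = sentence_bleu(nmt.split(), smt.split())
        if score >= threshold:
            selected.append((src, nmt, score))
    return selected


sources = ["src sentence 1", "src sentence 2"]
smt_outputs = ["the cat sat on the mat", "he go to school yesterday"]
nmt_outputs = ["the cat sat on the mat", "weather is nice today"]
selected = select_pseudo_parallel(sources, smt_outputs, nmt_outputs)
```

In this toy input only the first pair survives, because the two systems produce the same translation for it and share no words for the second. In a setup like the one the abstract describes, the surviving pairs would be appended to the labeled parallel data before joint training; the threshold value and the choice of which candidate to keep would need to be tuned or taken from the paper itself.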
This article is indexed in Wanfang Data and other databases.