Research on Disambiguation of Chinese Polyphonic Words (Computer Science and Technology) [original graduation thesis]

Contents: complete thesis	Length: 14,869 characters	Format: Word (*.doc)

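Abstract: Disambiguating Chinese polyphonic words is one of the basic problems in natural language processing (NLP). Polyphonic words are pervasive in Chinese and cannot be sidestepped in NLP; unless they are handled well, they become a bottleneck for automatic phonetic annotation. Although several automatic annotation tools have appeared in recent years, polyphonic-word disambiguation is still not well solved. This thesis therefore studies the disambiguation of Chinese polyphonic words.

　　The main work of this thesis is as follows:

　　Polyphonic-word extraction. All polyphonic words recorded in the electronic edition of the Modern Chinese Dictionary (《现代汉语词典》) are collected and counted.

　　Corpus preparation. Sentences containing polyphonic words are extracted from the 2001 People's Daily (《人民日报》) corpus and annotated according to the pronunciation entry that applies.

　　Polyphonic-word disambiguation. The context of each polyphonic word is used to resolve its pronunciation, and experiments are run on the corpus. Five models are used, namely CRF, maximum entropy, RFR_SUM, SVM, and semantic similarity, to disambiguate 22 polyphonic words; their average accuracies are 85.27%, 91.63%, 94.04%, 89.96%, and 89.16% respectively. A voting ensemble is also used and reaches an average accuracy of 96.34%. Finally, a seed-word-based method is applied to polyphonic-word disambiguation.

　　An automatic phonetic annotation system is implemented. It can disambiguate 62 polyphonic words.

Keywords: polyphonic-word disambiguation, automatic phonetic annotation, CRF, maximum entropy, RFR_SUM, SVM, semantic similarity, seed words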

Abstract: Chinese polyphone disambiguation is a fundamental problem in natural language processing (NLP). Polyphones are prevalent in Chinese and cannot be avoided in NLP, so if the problem is not resolved well it becomes a bottleneck for automatic phonetic annotation. Although some automatic annotation software has appeared in recent years, polyphone disambiguation still lacks a good solution. This thesis therefore studies Chinese polyphone disambiguation.

　　The details are as follows:

　　Polyphone extraction. All polyphones are extracted from the electronic version of the "Modern Chinese Dictionary".

　　Corpus preparation. Sentences that contain polyphones are extracted from the 2001 "People's Daily" corpus and labeled by pronunciation.

　　Polyphone disambiguation. The thesis uses the context of each polyphone for disambiguation and tests the approach on the corpus. It uses five models, namely CRF, Maximum Entropy, RFR_SUM, SVM, and semantic similarity, to disambiguate the pronunciation of 22 polyphones; the average accuracies are 85.27%, 91.63%, 94.04%, 89.96%, and 89.16% respectively. A voting ensemble is also used and reaches 96.34%. Finally, the thesis disambiguates polyphones with a seed-word-based method.

　　An automatic phonetic annotation system is built that can disambiguate 62 polyphones.

Key Words: polyphone disambiguation, automatic phonetic annotation, CRF, Maximum Entropy, RFR_SUM, SVM, semantic similarity, seed word

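　　An important task in natural language processing is resolving the many kinds of ambiguity found in language. Studies of large corpora show that modern Chinese contains a large number of polyphonic words. This thesis therefore first studies the polyphonic words of modern Chinese, then applies machine learning methods to disambiguate their pronunciation, and finally develops an automatic phonetic annotation system.

　　The main work of this thesis is as follows:

　　1. Statistical analysis of the polyphonic words in the Modern Chinese Dictionary (《现代汉语词典》);

　　2. Extraction and annotation of sentences containing polyphonic words from the People's Daily (《人民日报》) corpus;

　　3. Disambiguation of polyphonic-word pronunciation using context information;

　　4. Construction of an automatic phonetic annotation system.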
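
　　As a rough illustration of the disambiguation step, the Python sketch below shows one way to collect context features around a polyphonic word and to combine the outputs of several classifiers by voting. It is not taken from the thesis: the function names, the two-word window, the `<PAD>` marker, plain majority voting, and the tie-breaking rule are all assumptions made for the example, and the thesis summary does not say which features or voting scheme were actually used.

```python
from collections import Counter


def context_features(tokens, index, window=2):
    """Collect the words within `window` positions of tokens[index].

    Hypothetical feature template: one feature per relative offset,
    with "<PAD>" standing in for positions outside the sentence.
    """
    features = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the polyphonic word itself
        pos = index + offset
        word = tokens[pos] if 0 <= pos < len(tokens) else "<PAD>"
        features[f"w[{offset:+d}]"] = word
    return features


def majority_vote(predictions):
    """Return the label predicted by the most models.

    Ties go to the tied label that appears first in `predictions`
    (an assumption; the source does not describe tie handling).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Example: 大夫 reads "dàifu" (doctor) or "dàfū" (an ancient official title).
tokens = ["他", "是", "一", "位", "大夫"]
features = context_features(tokens, index=4)

# Made-up outputs of five models for this occurrence, for illustration only.
model_outputs = ["dàifu", "dàifu", "dàfū", "dàifu", "dàfū"]
reading = majority_vote(model_outputs)
```

　　In a real system each feature dictionary would be passed to the trained classifiers, and the vote would be taken over their predicted readings for the same occurrence of the word.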
