As the naturalness of speech generated by text-to-speech (TTS) systems has improved, products based on speech synthesis technology have been widely deployed in engineering applications. Mandarin speech synthesis products are already in widespread use in China, and speech synthesis for minority languages such as Tibetan and Uyghur has reached the product-development stage. However, speech synthesis for the minority languages of Yunnan has not received due attention, and there is at present little research on speech synthesis for the Dai language.

In this project, a trainable Dai text-to-speech system will be developed, and a Dai corpus for speech synthesis will be designed and constructed. Based on the characteristics of the Dai language, the following work will be carried out:

(1) Selection of the synthesis units; design of the context attributes and the question set used for decision-tree clustering of the HMMs; and optimization of the synthesizer training process.
(2) Research on prosody labeling rules and on a part-of-speech tagging method for Dai.
(3) Construction of a dictionary; research on sentence pre-processing, word segmentation and prosodic phrase break prediction; and development of a Dai text analyzer.
(4) Research on handling foreign words in Dai TTS, since Mandarin and English words are widely used in modern Dai.
(5) Development of a real-time Dai text-to-speech demonstration system.

The project will effectively promote speech synthesis research on minority languages in China and advance the application of speech synthesis technology in ethnic minority regions along the border.
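Item (1) mentions a question set for decision-tree clustering of HMMs. The abstract does not give the project's label format or questions, so the following is only a minimal Python sketch under stated assumptions: the HTS-style labels, the question names and the per-label "log-F0" values are all hypothetical. It shows how wildcard questions partition context-dependent labels and how one question can be chosen by the log-likelihood gain of splitting a single-Gaussian node, which is the usual criterion in tree-based HMM state clustering.

```python
import math
from fnmatch import fnmatchcase

# Hypothetical, heavily simplified full-context labels "L-C+R/T:tone", each paired
# with a made-up per-label feature (e.g. mean log-F0). A real Dai label set would
# carry many more context attributes (syllable position, POS, prosodic phrase, ...).
DATA = [
    ("k-a+n/T:1", 5.0),
    ("n-a+m/T:2", 4.5),
    ("m-i+t/T:1", 5.1),
    ("t-u+x/T:3", 4.6),
]

# Hypothetical question set in the spirit of HTS "QS" entries:
# question name -> wildcard patterns; a label answers "yes" if any pattern matches.
QUESTIONS = {
    "C-Vowel_a": ["*-a+*"],
    "C-HighVowel": ["*-i+*", "*-u+*"],
    "Tone==1": ["*/T:1"],
}


def matches(patterns, label):
    """True if the label matches any wildcard pattern of a question."""
    return any(fnmatchcase(label, p) for p in patterns)


def gaussian_loglik(values, var_floor=1e-3):
    """Log-likelihood of values under their own ML-fitted 1-D Gaussian (variance floored)."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    var = max(sum((v - mean) ** 2 for v in values) / n, var_floor)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


def best_question(data, questions):
    """Return (question_name, gain) with the largest log-likelihood gain, or None."""
    parent = gaussian_loglik([v for _, v in data])
    best = None
    for name, patterns in questions.items():
        yes = [v for label, v in data if matches(patterns, label)]
        no = [v for label, v in data if not matches(patterns, label)]
        if not yes or not no:
            continue  # a question that does not split the node is useless
        gain = gaussian_loglik(yes) + gaussian_loglik(no) - parent
        if best is None or gain > best[1]:
            best = (name, gain)
    return best


if __name__ == "__main__":
    print(best_question(DATA, QUESTIONS))
```

In real HTS-style training the questions are applied to pooled state statistics rather than raw values, and splitting stops under an MDL or threshold criterion; this toy only illustrates the question-selection step.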
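With the goal of developing a Dai text-to-speech application system, this project studied the Dai text corpus, the raw speech database, word segmentation, text normalization, prosodic phrase boundary prediction, pronunciation rules for Chinese loanwords, acoustic model training, and methods for improving the naturalness of synthesized speech. The project has built a Dai text corpus of more than 12,000 sentences and a Dai electronic dictionary of 17,000 entries. It has also built a broadcast-quality raw Dai speech database of 2,400 sentences, on which the necessary segmentation into speech segments and prosodic labeling have been carried out.

Dai is written alphabetically, in its own dedicated script; Xishuangbanna Dai is traditionally called Tai Lue. With the support of Dai language experts, the project systematically compiled Dai grapheme-to-phoneme rules and implemented automatic romanization of Dai text in software. The project also systematically studied the normalization of numbers and special characters in Dai text: a combination of rules and contextual keywords is used to identify non-standard words and determine their ambiguity type, after which their correct Dai reading is assigned.

As Chinese and Dai cultures continue to blend, modern Dai has absorbed a large number of Chinese loanwords, rendered into Dai either by free (semantic) translation or by transliteration. Because the transliterated forms of Chinese loanwords are not part of standard Dai orthography, their actual pronunciation differs markedly from their spelling. Building on in-depth study, the project proposed a workable method for synthesizing Chinese loanwords.

Word segmentation accuracy directly affects the naturalness of the speech produced by a TTS system. Using an improved dictionary-based FMM (forward maximum matching) algorithm, the project raised segmentation precision from 92.3% to 93.8% and recall from 89.2% to 90.6%. The project also studied Dai word segmentation based on machine learning models; compared with traditional dictionary-driven (mechanical) segmentation, these methods do not require building a dictionary and therefore speed up the development of a segmentation system. With a conditional random field (CRF) model, Dai word segmentation accuracy reached 94.36%, outperforming naive Bayes and decision trees.

The project developed a Dai text-to-speech demonstration system with independent intellectual property rights, filling a gap in this research area. The demonstration system's synthesized speech is highly intelligible, and its naturalness is at a basically acceptable level. The project lays a solid foundation for developing a commercial Dai text-to-speech application system.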
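The FMM segmentation and the precision/recall figures reported for it can be illustrated with a minimal Python sketch. The project's specific improvements to FMM are not described, so the code shows only baseline forward maximum matching. The toy dictionary and text use abstract Latin letters as stand-ins for Dai characters, and scoring words by exactly matching (start, end) spans is a common convention assumed here, not necessarily the project's evaluation procedure.

```python
def fmm_segment(text, dictionary, max_len=None):
    """Baseline forward maximum matching: at each position take the longest
    dictionary word (up to max_len units); fall back to a single unit."""
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    max_len = max(1, max_len)  # guard against a zero window (infinite loop)
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def to_spans(words):
    """Map a segmentation to a set of (start, end) offsets."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def precision_recall(predicted, gold):
    """Word-level precision and recall; a word is correct only if its span matches exactly."""
    pred, ref = to_spans(predicted), to_spans(gold)
    correct = len(pred & ref)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    return precision, recall


if __name__ == "__main__":
    # Abstract placeholder dictionary and text, not real Dai words.
    dictionary = {"ab", "abc", "de", "cd"}
    predicted = fmm_segment("abcde", dictionary)
    print(predicted)  # ['abc', 'de']
    print(precision_recall(predicted, ["ab", "c", "de"]))  # (0.5, 0.3333333333333333)
```

The example shows the typical FMM failure mode: the greedy choice of the longest match ("abc") is wrong when the reference splits it ("ab" + "c"). The CRF approach mentioned above typically frames segmentation as per-character tagging (for example with B/M/E/S labels) learned from annotated text, which is why it needs no hand-built dictionary.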
Data updated: 2023-05-31