(Limit: 3000 characters): Speaker recognition and language recognition remain active research topics in speech processing. Conventional methods mainly draw on information at three levels: phonetic, acoustic, and prosodic. In this project, we plan to use an electromagnetic articulography device to collect a large-scale, real-time articulatory trajectory database from multiple speakers. Although the database focuses on Chinese, its content covers multiple languages and regional Chinese dialects. We intend to use this database to study the variability that different speakers and languages cause in the articulatory space. We also plan to study speaker-independent acoustic-to-articulatory inversion based on multiple exemplar speakers and languages, in order to estimate articulatory features for any telephone or microphone test speech. These estimated articulatory features can be combined with acoustic features to improve system performance. The project also intends to apply the recent idea of mid-level discriminative patches, taken from image-based scene classification, to language recognition and speaker recognition. We will study a mid-level discriminative tokenization framework for speech data, revisiting the definition of mid-level units, segmentation, discriminative learning of tokens, representation, and back-end classifiers for speech. The project not only provides two new approaches to speaker and language recognition but also brings new ideas to speech production and paralinguistic auditory perception, which is valuable from both theoretical and practical points of view.
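(Limit: 400 Chinese characters): Speaker recognition and language recognition have long been active research topics in speech information processing. Conventional methods mainly use information at the phoneme, acoustic, and prosodic levels. This work studies speaker recognition and language recognition along two directions: articulatory information from the speech production system, and mid-level discriminative representations. The project plans to use an electromagnetic articulograph to collect a relatively large-scale database of speech production articulatory trajectories, centered on Chinese and covering multiple languages or dialects and multiple speakers. On that basis it will study how speakers and languages differ at the articulatory level and propose new features. It will also study acoustic-to-articulatory inversion based on multiple reference speakers and languages, to estimate articulatory features under ordinary channels and thereby improve recognition performance. In addition, the project plans to apply mid-level discriminative image patches, a recent research focus in image scene analysis, to language recognition and speaker recognition in order to improve overall system performance. We will study the core problems of mid-level discriminative representation for speech: the definition of the mid level, segmentation, learning of representative units, representation, and back-end classification. The project not only provides two new approaches to speaker and language recognition but also brings new perspectives to speech production and to auditory cognitive models of paralinguistic information, giving it both theoretical significance and practical value.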
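Speaker recognition and language recognition have long been active research topics in speech information processing. Conventional methods mainly use information at the phoneme, acoustic, and prosodic levels. This work studies speaker recognition and language recognition along two directions: articulatory information from the speech production system, and mid-level discriminative representations. Using an electromagnetic articulograph, the project collected a relatively large-scale database of speech production articulatory trajectories, centered on Chinese and covering multiple languages or dialects and multiple speakers, and studied how fusing features obtained by acoustic-to-articulatory inversion can improve the performance of speaker recognition systems. The project also studied the role of mid-level discriminative representations in voiceprint (speaker) recognition and language recognition. It proposed a generalized total variability factor analysis algorithm, as well as an algorithm that fuses phoneme-level posterior probability information at the feature level; both achieved significant performance improvements on NIST2010 and LRE07. Building on the conventional probabilistic linear discriminant analysis (PLDA) algorithm, the project further proposed several modified variants. Some of the proposed algorithms have also been applied to other tasks, including recognition of other paralinguistic speech attributes, voice conversion, and detection of voice-changing attacks, where they improved system performance. The project not only provides two new approaches to speaker and language recognition but also brings new perspectives to speech production and paralinguistic speech attribute recognition, giving it both theoretical significance and practical value.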
Data updated: 2023-05-31
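Genome-wide association analysis of leaf orientation value in maize
Regulatory asymmetry, choice of earnings management mode, and the enforcement efficiency of the China Securities Regulatory Commission?
SSVEP-based direct brain control of robot direction and speed
Effects of vegetation restoration patterns on major soil enzyme activities, microbial diversity, and soil nutrients in the mountainous area of southern Ningxia
Research progress on acupuncture treatment of gastroesophageal reflux disease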
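New methods for speaker recognition and spoken language recognition based on auditory perception models
JFA-based speaker recognition for whispered speech
Robust speaker recognition in noisy environments based on multi-domain information fusion
Speech recognition outcomes and articulation mechanisms in the binaural bimodal condition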