Speech is the most natural means of human communication. Speech processing has therefore become an indispensable part of the modern world, with numerous applications such as hearing aids and cellular radio telephony systems. Owing to imperfections in speech acquisition, transmission and storage, speech signals are often severely degraded. Speech enhancement algorithms can therefore be used to improve speech quality, for example as a preprocessor in speech coding or recognition systems. Many speech enhancement algorithms have been proposed over the past decades, but most of them focus on noise reduction and pay less attention to how the result correlates with human perception. In this project, a speech enhancement algorithm based on perceptual information will be proposed. Several factors will be considered in its development, particularly for transform-domain speech enhancement algorithms: the effects of noise reduction and of speech distortion, and the auditory masking property. Compensation for voiced and unvoiced sounds will also be applied to reduce the speech distortion caused by the noise reduction process. The output of the proposed algorithm is expected to suit human perception well.

Enhanced speech is generally evaluated either by subjective listening tests or by objective measures. Objective measures are preferable because they are convenient and save time. However, a valid objective measure must correlate well with subjective listening tests, and for that reason much research has focused on developing objective measures that model various aspects of the auditory system. The traditional objective measures all give a single score for each processed utterance to indicate how similar it is to the original speech. As a result, they cannot give specific judgements on speech distortion or on noise reduction, even though these are the two key criteria for evaluating the quality of the resulting speech.
A novel objective measure based on perceptual information will be proposed in this project. It will give a multi-dimensional description of the quality of the resulting speech, especially in terms of speech distortion and noise reduction. With the proposed objective measure, the most suitable speech enhancement algorithm can be selected for the purpose of each application. Moreover, speech enhancement algorithms can be improved according to their scores on the different evaluation dimensions, which can benefit speech enhancement algorithms in general.
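Speech is the most natural way for humans to communicate and holds an irreplaceable position. Because of imperfections in acquisition, transmission and storage, speech signals are often severely degraded. Speech enhancement algorithms are used to remove background noise and improve speech quality. In our previous work, we proposed several transform-domain speech enhancement algorithms, including Wiener filtering, which achieved this goal well. However, whether the enhanced speech matches human auditory perception and needs remains unknown. Speech enhancement algorithms based on novel perceptual information will therefore be studied further. Voiced and unvoiced sounds will be reinforced to reduce the speech distortion caused by the noise reduction process, with the aim of a new algorithm that better matches human auditory perception. For speech quality assessment, objective measures are widely used because of their convenience and efficiency. However, the vast majority of existing objective measures can only give the enhanced speech under test a single score to judge its quality. Such a judgement is rather vague and gives little indication of what is wrong. This project therefore proposes an objective quality measure based on perceptual information. The measure will provide more explicit judgements from multiple perspectives, making it more reliable and closer to human subjective judgement.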
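Wiener filtering is named among the group's earlier transform-domain algorithms. The following is a minimal sketch of the generic textbook idea, not the authors' implementation. Each spectral bin is scaled by G = xi / (1 + xi), where xi is the a priori SNR. Here xi is crudely estimated by power subtraction from a noise power spectrum that is assumed to be known and strictly positive. The function names `wiener_gains` and `apply_gains` and the `floor` parameter are hypothetical.

```python
from typing import List


def wiener_gains(noisy_power: List[float], noise_power: List[float],
                 floor: float = 1e-3) -> List[float]:
    """Per-bin Wiener gains G = xi / (1 + xi).

    xi is a crude a priori SNR estimate obtained by power subtraction:
    xi = max(|Y|^2 / sigma_n^2 - 1, floor). The noise power spectrum
    sigma_n^2 is assumed known and > 0 (e.g. estimated in speech pauses).
    """
    gains = []
    for y2, n2 in zip(noisy_power, noise_power):
        xi = max(y2 / n2 - 1.0, floor)  # floor avoids zero/negative SNR
        gains.append(xi / (1.0 + xi))
    return gains


def apply_gains(noisy_spectrum: List[complex],
                gains: List[float]) -> List[complex]:
    """Scale each complex bin; the noisy phase is kept unchanged."""
    return [g * y for g, y in zip(gains, noisy_spectrum)]


if __name__ == "__main__":
    # Toy frame: bins 0-1 dominated by speech, bins 2-3 near the noise level.
    noisy = [10 + 0j, 8 + 0j, 1.1 + 0j, 0.9 + 0j]
    power = [abs(y) ** 2 for y in noisy]
    gains = wiener_gains(power, [1.0, 1.0, 1.0, 1.0])
    print([round(g, 3) for g in gains])
    print(apply_gains(noisy, gains))
```

High-SNR bins pass almost unchanged, while bins at or below the noise level are strongly attenuated. The perceptual methods discussed in this project aim to refine exactly this trade-off between noise reduction and speech distortion.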
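Speech is the most natural way for humans to communicate and holds an irreplaceable position. Because of imperfections in acquisition, transmission and storage, speech signals are often severely degraded. Speech enhancement algorithms are used to remove background noise and improve speech quality. Most existing speech enhancement algorithms aim to increase the signal-to-noise ratio (SNR), but whether the enhanced speech matches human auditory perception and needs remains unknown. This project therefore studies speech enhancement and objective quality assessment based on perceptual information.

Over the past three years, we have developed both a perceptually based speech enhancement algorithm and a perceptually based objective quality measure. For speech enhancement, the perceptual cue mainly used was the auditory masking effect. The core scientific question in incorporating this cue is how the masking model can be transformed and linked across different transform domains. We introduced a two-dimensional psychoacoustic model into this project and incorporated auditory masking into mel-spectral coefficients. This produced speech enhancement tailored to automatic speech recognition (ASR) systems, with high recognition accuracy, fulfilling the goal of perceptually based speech enhancement.

In parallel, we proposed a novel three-dimensional objective speech quality measure that can be applied in either the time domain or the frequency domain. Besides the usual dimension of overall subjective impression, it adds two further dimensions: background noise and speech distortion. The difference between the clean speech and the enhanced speech, i.e. the residual signal, is analysed in three steps:
- the correlation between the residual and the clean speech gives the judgement on speech distortion;
- the SNR between the clean speech and the residual noise gives the judgement on background noise;
- a linear fit of these two gives the judgement on overall quality.

The resulting objective scores are more reliable and closer to the results of subjective listening tests. This addresses the problem that existing objective measures produce a single score that gives no clear indication of what is wrong, and fulfils the goal of developing an objective quality measure based on perceptual information. The method was also tested on Cantonese and Mandarin material.

In summary, by resolving the key scientific questions, the three-year project achieved essentially all of its expected outcomes. Psychoacoustic models and human auditory perception guided the development of both the speech enhancement algorithms and the objective quality measure. We established a new perceptual-information framework based on subjective impressions of background noise and speech distortion. This framework enriches the body of auditory perceptual knowledge and promotes its integration with engineering applications, especially speech processing technology, providing new insights and a sound theoretical basis for further development. Nine papers have been published in leading domestic and international journals and conferences: seven journal papers and two international conference papers. Good progress was also made in technology development and in training researchers. We will continue this research and promote its integration with related industries. Because this is human-centred basic research, it has good application prospects in multimedia communication, medical rehabilitation and related fields, and merits continued investment.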
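The three-dimensional measure can be illustrated with a minimal time-domain sketch. The decomposition here is an assumption made for illustration, not the project's published formulation. The residual r = s - s_hat is split, by least-squares projection onto the clean speech s, into two parts:
- a speech-correlated part, treated as speech distortion;
- the remainder, treated as residual noise.

The distortion dimension is taken as the magnitude of the normalized correlation between r and s. The noise dimension is the SNR of s over the residual noise. The overall score is fitted by ordinary least squares to subjective scores. The function names and the weights `w0`, `w1`, `w2` are hypothetical.

```python
import math
from typing import List, Tuple


def residual_dimensions(clean: List[float], enhanced: List[float],
                        eps: float = 1e-12) -> Tuple[float, float]:
    """Return (distortion, noise_snr_db) for one utterance (hypothetical).

    residual r = clean - enhanced. Projecting r onto clean gives a
    speech-correlated component (alpha * clean), treated as speech
    distortion; the remainder is treated as residual background noise.
    """
    if len(clean) != len(enhanced):
        raise ValueError("signals must have equal length")
    residual = [s - e for s, e in zip(clean, enhanced)]
    e_s = sum(s * s for s in clean)
    e_r = sum(r * r for r in residual)
    cross = sum(r * s for r, s in zip(residual, clean))
    # Magnitude of the normalized residual/clean correlation, in [0, 1].
    distortion = abs(cross) / math.sqrt(e_s * e_r + eps)
    # Least-squares projection coefficient of the residual onto clean speech.
    alpha = cross / (e_s + eps)
    noise = [r - alpha * s for r, s in zip(residual, clean)]
    e_n = sum(n * n for n in noise)
    noise_snr_db = 10.0 * math.log10((e_s + eps) / (e_n + eps))
    return distortion, noise_snr_db


def fit_overall_weights(dims: List[Tuple[float, float]],
                        subjective: List[float]) -> Tuple[float, float, float]:
    """OLS fit of subjective ~ w0 + w1 * distortion + w2 * noise_snr_db."""
    rows = [(1.0, d, snr) for d, snr in dims]
    # Normal equations a w = b with a = X^T X and b = X^T y.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    b = [sum(r[i] * y for r, y in zip(rows, subjective)) for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        if abs(a[col][col]) < 1e-12:
            raise ValueError("singular system: need more varied data")
        for k in range(3):
            if k != col:
                f = a[k][col] / a[col][col]
                a[k] = [x - f * y for x, y in zip(a[k], a[col])]
                b[k] -= f * b[col]
    return (b[0] / a[0][0], b[1] / a[1][1], b[2] / a[2][2])


def overall_score(distortion: float, noise_snr_db: float,
                  weights: Tuple[float, float, float]) -> float:
    """Overall quality as the fitted linear combination of both dimensions."""
    w0, w1, w2 = weights
    return w0 + w1 * distortion + w2 * noise_snr_db
```

The sketch covers only the per-utterance, time-domain case. A frequency-domain variant would apply the same decomposition per band, and the fitted weights would need to be validated against held-out listening-test scores.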
Data updated: 2023-05-31