Research on computational models of visual saliency not only improves the applications that depend on them but also deepens our understanding of the human visual system. Given the rapid progress in inverse-problem research in recent years, this project re-examines visual saliency analysis within the inverse-problem framework, with the aim of improving the performance of computational models and gaining deeper insight into the mechanisms of the visual system. Within this framework, reconstruction error is closely related to visual saliency. Two effective tools for solving underdetermined inverse problems are dictionary learning, which originated in compressed sensing, and the stacked auto-encoder, developed by the deep learning community. In this project, we will develop new approaches to visual saliency estimation for binocular and motion images based on these two generative models. To that end, we will systematically investigate feature extraction methods, the structure of the generative models, and methods for learning their parameters. We focus on three aspects: 1) a general framework for saliency estimation from the inverse-problem perspective; 2) a computational model of visual saliency for binocular and motion images whose structure is consistent with the functional organization of the visual cortex; 3) a theoretical explanation of why the proposed framework is reasonable. The scientific problems addressed here were identified through a thorough survey of the literature on visual cognition and deep learning, and are to be studied under current conditions. The work is of theoretical significance, and the resulting methods have broad application value in areas such as visual object detection, image quality assessment, and video coding.
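This project starts from the perspective of solving inverse problems and uses empirical learning of neural-network parameters as the means of solving them, thereby avoiding the ill-posedness encountered when the inverse problem is solved directly. We studied and built deep encoder-decoder architectures that can represent the input visual stimulus itself, and established models based on reconstruction-based representation learning for visual saliency estimation. Specifically, we proposed a saliency estimation method for static images based on deep stacked auto-encoders; building on it, we introduced a sampling scheme based on a background model, which improved the auto-encoder's performance on binocular/stereoscopic saliency estimation; further, we introduced an iterative representation learning framework to predict the dynamic scanpath the human eye follows while viewing a scene. As deep learning advanced, we proposed more expressive methods based on modern convolutional encoder-decoder structures to handle problems such as competition among multiple instances in the visual input and loss of resolution. The project also studied a series of related problems, including depth image acquisition. With visual saliency estimation as its core problem, the project produced a series of deep network models built around reconstruction-based representation learning, completed the planned research tasks well, and achieved its expected goals.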
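The idea that reconstruction error can indicate saliency can be illustrated with a minimal sketch. It is not the project's method: a low-rank linear model fitted by SVD (equivalent to an optimal linear auto-encoder) stands in for the stacked auto-encoder, and patches along the image border are assumed to represent the background. The function names, patch size, and rank are all hypothetical choices made for illustration.

```python
import numpy as np

def extract_patches(image, size):
    """Split a 2-D image into non-overlapping size x size patches, one per row."""
    h, w = image.shape
    rows, cols = h // size, w // size
    cropped = image[:rows * size, :cols * size]
    patches = cropped.reshape(rows, size, cols, size).swapaxes(1, 2)
    return patches.reshape(rows * cols, size * size), (rows, cols)

def reconstruction_saliency(image, size=4, rank=2):
    """Per-patch squared reconstruction error under a background-only model."""
    patches, (rows, cols) = extract_patches(image, size)
    grid = patches.reshape(rows, cols, -1)
    # Assumption: patches touching the image border are background samples.
    border = np.concatenate(
        [grid[0], grid[-1], grid[1:-1, 0], grid[1:-1, -1]]
    )
    mean = border.mean(axis=0)
    # Top principal directions of the background patches (linear stand-in
    # for a learned encoder-decoder).
    _, _, vt = np.linalg.svd(border - mean, full_matrices=False)
    basis = vt[:rank]
    centered = patches - mean
    reconstructed = centered @ basis.T @ basis  # encode, then decode
    error = np.sum((centered - reconstructed) ** 2, axis=1)
    return error.reshape(rows, cols)

# Smooth horizontal gradient as background, textured square as the "object".
image = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))
rng = np.random.default_rng(0)
image[12:20, 12:20] += rng.random((8, 8))
saliency = reconstruction_saliency(image)
peak = np.unravel_index(np.argmax(saliency), saliency.shape)
print(saliency.shape, peak)
```

Patches that the background model reconstructs well receive an error close to zero, while the textured square is reconstructed poorly and therefore stands out in the map. A learned nonlinear encoder-decoder would replace the SVD step in a more faithful version.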
Data last updated: 2023-05-31