Monocular depth estimation and visual tracking are two fundamental and closely related topics in computer vision. Nevertheless, most existing depth estimation methods fail to capture semantic information from input images, leading to sub-optimal performance. Meanwhile, the application of depth estimation to visual tracking remains largely unexplored. In light of these issues, this project proposes to study monocular depth estimation from the perspective of semantic understanding, as well as its application to visual tracking. First, a cross-task, multi-level semantic encoding model will be presented to investigate the correlation between depth estimation and semantic understanding. Semantic encoding will be performed from both global and local views and then applied to depth estimation through a cross-task attention mechanism. Second, a cross-domain semantic embedding model will be designed, which aims to improve the generalization ability of depth estimation via domain adaptation between real and synthetic training data. The collaboration between semantic supervision and adversarial training, and its impact on network training stability, will also be investigated. To further improve the accuracy of depth estimation, the incorporation of spatial information and structural context modeling is also within the scope of this project. Finally, this project will propose visual tracking algorithms based on monocular depth estimation. To this end, the integration of depth and color features will be studied, and multi-modal target appearance models with an adaptive switching mechanism will be designed. To further improve tracking accuracy, the joint optimization of depth estimation and visual tracking will be investigated. The key findings and algorithm designs of this proposal are expected to provide a theoretical basis and practical reference for monocular depth estimation and its application to visual tracking.
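The cross-task attention mechanism mentioned in the abstract is not specified in detail. As a rough illustration only, the sketch below assumes a standard scaled dot-product attention in which depth features act as queries and semantic features act as keys and values, with a residual connection. The function name `cross_task_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and all shapes are hypothetical and are not taken from the project.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_task_attention(depth_feat, sem_feat, w_q, w_k, w_v):
    """Hypothetical cross-task attention (an assumption, not the project's model).

    depth_feat: (N, C) depth-branch features, used as queries.
    sem_feat:   (M, C) semantic-branch features, used as keys and values.
    Returns the fused depth features (N, C) and the attention map (N, M).
    """
    q = depth_feat @ w_q                      # (N, D)
    k = sem_feat @ w_k                        # (M, D)
    v = sem_feat @ w_v                        # (M, C)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # scaled dot-product scores
    attn = softmax(scores, axis=-1)           # each row sums to 1
    fused = depth_feat + attn @ v             # residual fusion of semantic cues
    return fused, attn


rng = np.random.default_rng(0)
N, M, C, D = 6, 4, 8, 8
depth_feat = rng.standard_normal((N, C))
sem_feat = rng.standard_normal((M, C))
w_q = rng.standard_normal((C, D)) * 0.1
w_k = rng.standard_normal((C, D)) * 0.1
w_v = rng.standard_normal((C, C)) * 0.1

fused, attn = cross_task_attention(depth_feat, sem_feat, w_q, w_k, w_v)
print(fused.shape, attn.shape)
```

In this sketch each depth feature receives a weighted mixture of semantic features, so global or local semantic encodings could be swapped in as `sem_feat` without changing the fusion step.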
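Monocular depth estimation and single-object visual tracking are important and closely related problems in computer vision. However, existing depth estimation algorithms lack the ability to capture semantics, which leads to poor robustness, and their application to tracking has long been overlooked. This project plans to study depth estimation centered on semantic modeling and to explore its application to tracking. First, a cross-task, multi-level semantic representation model will be built to investigate the relationship between depth and semantic information; global and local semantic encoding modules will be designed, and semantic information will be applied to the depth estimation model through a cross-task attention mechanism. Second, a cross-domain semantic embedding model will be proposed; the generalization performance of the algorithm will be improved by eliminating the domain gap between real and synthetic training data; the joint use of semantic supervision and adversarial training, as well as the role of 2D coordinate information and structured context modeling in depth estimation, will be explored. Finally, tracking algorithms based on depth estimation will be proposed; fusion mechanisms for depth and color information will be studied, and multi-modal target appearance models with an adaptive switching mechanism will be designed; the effect of jointly optimizing depth estimation and tracking on tracking performance will be investigated. This research is intended to provide theoretical and practical support for accurate monocular depth estimation and its effective application to tracking.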
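Monocular depth estimation aims to recover the lost depth data from a single 2D image. It supplies important scene geometry information to traditional 2D visual perception tasks and has important applications in robotics, autonomous driving, 3D reconstruction, and other fields. Grounded in the problem of intelligent visual perception, this project studied monocular depth estimation and its application to visual perception along several dimensions, including network architecture, loss function design, and model optimization. For monocular depth estimation, the project proposed a region-wise depth prediction mechanism based on semantic divide-and-conquer and a joint learning method for depth estimation and panoptic segmentation, built a monocular depth estimation model based on cross-task, multi-level semantic modeling, studied the joint self-supervised learning of monocular depth and camera pose, and proposed a scale-consistent self-supervised depth estimation framework together with an adaptive joint self-supervised learning theory. For visual object perception, the project proposed a salient object detection method based on mining multi-source uncertainty, built an automatic label generation framework for visual tracking, explored RGB-D multi-modal fusion mechanisms, and designed a multi-view RGB-D salient object detection architecture. These results provide an important theoretical basis and practical guidance for research on intelligent visual perception, and on monocular depth estimation in particular.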
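The scale-consistent self-supervised framework is only named here, not defined. As a hedged illustration, one commonly used way to encourage consistent scale across frames is to penalize the normalized difference between the depth of one frame warped into a neighboring view and the depth predicted for that view. The sketch below assumes this formulation; the function name `depth_inconsistency` and its arguments are hypothetical, and the project's actual loss may differ.

```python
import numpy as np


def depth_inconsistency(depth_a_in_b, depth_b, valid=None):
    """Mean normalized depth difference (an assumed formulation).

    depth_a_in_b: depth of frame A warped into the view of frame B.
    depth_b:      depth predicted for frame B at the same pixels.
    valid:        optional boolean mask of pixels to include.
    Both depth maps are assumed to be strictly positive.
    """
    diff = np.abs(depth_a_in_b - depth_b) / (depth_a_in_b + depth_b)
    if valid is None:
        valid = np.ones_like(diff, dtype=bool)
    return float(diff[valid].mean())


d_a = np.array([[1.0, 2.0], [4.0, 8.0]])
d_b = np.array([[3.0, 2.0], [4.0, 8.0]])

# Only the top-left pixel disagrees: |1 - 3| / (1 + 3) = 0.5, averaged over 4 pixels.
loss = depth_inconsistency(d_a, d_b)
print(loss)
```

Because the difference is divided by the sum of the two depths, every pixel contributes a value between 0 and 1 regardless of absolute distance, so near and far regions are weighted comparably.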
Data last updated: 2023-05-31
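Research on rice root system modeling based on fractal L-systems
High-resolution 3D imaging for wideband MIMO radar based on Kronecker compressed sensing
Theme park evaluation based on public sentiment orientation: a case study of Volga Manor in Harbin
A single-setup measurement method for geometric errors of five-axis machine tools
Application of collaborative-representation-based graph embedding discriminant analysis to face recognition
Research on visual object tracking with depth cameras based on a global-local collaborative model
Research on deep visual tracking methods based on verification modeling
Research on 3D object detection and semantic reconstruction via monocular multi-view depth map estimation
Deep learning appearance modeling methods for visual object tracking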