With the rapid development of information technology, data are growing ever larger in scale and ever more diverse in source, so that items with similar underlying content arrive in different forms, modalities, or views. However, owing to real-world uncertainty and human factors, multi-source heterogeneous big data inevitably contain missing data, noise, and redundancy. These problems not only seriously hinder big data analytics but also lead to incorrect analytical results and waste substantial computing time and valuable storage space. Nevertheless, multi-source data exhibit heterogeneous low-level features together with correlated high-level semantics, and traditional single-source learning methods cannot cope with this feature heterogeneity. Therefore, exploiting the semantic complementarity, information correlation, and distributional similarity among different sources, this project will use shared subspace learning to study: 1) missing-source completion, 2) corrupted-source denoising, and 3) redundant-source reduction for multi-source heterogeneous data. The results can be widely applied in multimedia analysis, information retrieval, medical diagnosis, and other fields, and the research is therefore of considerable theoretical significance and practical value. The project is also expected to have a significant impact on the development of multi-source heterogeneous big data learning in China and to enrich existing machine learning theory and methods.
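Shared subspace learning can be illustrated by one common multi-view matrix-factorization formulation. This is a generic sketch under assumed notation, not the formulation used in this project: each source $v = 1,\dots,V$ is a matrix $X^{(v)} \in \mathbb{R}^{d_v \times n}$ whose $n$ columns are samples, $U^{(v)} \in \mathbb{R}^{d_v \times k}$ is a source-specific basis, all sources share one latent representation $H \in \mathbb{R}^{k \times n}$, and a diagonal 0/1 mask $M^{(v)}$ marks which samples are observed in source $v$ (unobserved columns of $X^{(v)}$ can hold any placeholder value, since the mask zeroes them out).

```latex
\min_{\{U^{(v)}\},\,H}\;
\sum_{v=1}^{V} \bigl\| \bigl(X^{(v)} - U^{(v)} H\bigr) M^{(v)} \bigr\|_F^2
\;+\; \lambda \Bigl( \sum_{v=1}^{V} \bigl\| U^{(v)} \bigr\|_F^2 + \bigl\| H \bigr\|_F^2 \Bigr),
\qquad
M^{(v)} = \operatorname{diag}\bigl(m^{(v)}_1,\dots,m^{(v)}_n\bigr),\;
m^{(v)}_i \in \{0,1\}
```

Under this assumed model, a missing source for sample $i$ can be filled in as $U^{(v)} h_i$, where $h_i$ is the $i$-th column of $H$, because $h_i$ is still estimated from the sources that are observed for that sample. Replacing the squared Frobenius loss with a more robust loss such as an $\ell_1$ norm is one standard way to reduce the influence of corrupted sources.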
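With the rapid development of information technology, data keep growing in scale and their sources are increasingly diverse, so that the same object is often described in different forms, modalities, and views. In practice, however, human factors and other causes inevitably introduce missing data, noise, and redundancy into massive multi-source heterogeneous data. These problems not only seriously hinder big data analytics but also lead to incorrect analytical results and waste substantial computing time and valuable storage space. Multi-source data, however, exhibit heterogeneous low-level features and correlated high-level semantics, and traditional single-source learning methods can no longer handle these characteristics. Therefore, drawing on the semantic complementarity, information correlation, and distributional similarity among different sources, this project used shared subspace learning to carry out research on 1) missing-source completion, 2) corrupted-source denoising, and 3) redundant-source reduction for multi-source heterogeneous data. The results can be widely applied in multimedia analysis, information retrieval, medical diagnosis, and other fields, and have important theoretical significance and practical value. This research is also expected to have a significant impact on the development of multi-source heterogeneous big data learning in China and to further enrich existing machine learning theory.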
Data last updated: 2023-05-31
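Genome-wide association analysis of leaf orientation value in maize
On the impact of the big data environment on the development of information science
Regulatory asymmetry, choice of earnings management pattern, and the enforcement efficiency of the China Securities Regulatory Commission?
CFRP strengthening against longitudinal rib-to-deck fatigue cracking in orthotropic steel bridge decks
Hardware Trojans: research progress on key issues and new trends
Theory and methods of multi-view subspace unsupervised machine learning for big data
Cognition-oriented multi-source data learning theory and algorithms
Label-shared subspace multi-source transfer learning methods and their application to radar emitter identification
Research on grid-based access and integration methods for multi-source heterogeneous data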