Semi-supervised learning (SSL) uses both labeled and unlabeled data to learn the classification hyperplane, so as to improve learning performance. Owing to its good performance, SSL has attracted much attention and many algorithms have been developed. By exploiting the potential information in both labeled and unlabeled data, SSL has been shown to improve learning performance significantly.

Many SSL methods have been proposed recently, but most of them are based on centralized processing. In many practical networked systems, centralized processing is infeasible because of high communication cost, limited processing capacity, or concerns about privacy and economic interests. Distributed processing is an efficient way to handle in-network data processing: each node carries out part of the computation and exchanges information with its nearest neighbors in a cooperative manner. Compared with centralized processing, distributed processing can fully use the local resources of each node, reduce data transmission, and save bandwidth and power. Nevertheless, distributed semi-supervised learning over networked systems has so far received little attention.

In view of this, this project will carry out a systematic study of distributed semi-supervised learning over networked systems, develop distributed semi-supervised learning algorithms, and solve some practical problems. Moreover, because incomplete data arise in many practical applications, distributed semi-supervised learning with incomplete data is also considered. We expect that, by exploiting unlabeled, labeled, and incompletely labeled data distributed over multiple nodes, learning performance can be improved while the transmission of the original data is avoided, so that data privacy is protected. This study is meaningful not only for developing the theory of distributed machine learning, but also for practical applications.
Semi-supervised learning is an active research topic in machine learning; it studies how to use labeled and unlabeled samples together to improve learning performance. Most existing studies, however, rely on centralized information processing. In many practical networked systems, centralized processing is often infeasible because of high communication cost, limited processing capacity, or concerns about security and commercial interests. Distributed processing is an effective means of information processing in networked systems: each node carries out part of the computation and exchanges information with its neighboring nodes in a prescribed manner, so that data processing is accomplished cooperatively. Compared with centralized processing, distributed processing makes full use of the local resources of each node, reduces data transmission, and saves bandwidth and energy. To date, however, distributed semi-supervised learning has rarely been reported in the literature at home or abroad. In view of this state of research, this project plans to carry out a systematic study of distributed semi-supervised learning for networked systems, including distributed semi-supervised learning with missing data, to establish a set of theories and algorithms for distributed semi-supervised learning, and to solve some practical engineering problems. As applied fundamental research, this project has significant theoretical value and practical application value.
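To make the cooperation pattern described above concrete, the following is a minimal sketch of one common way such a scheme can be organized: each node runs a simple self-training logistic classifier on its own labeled and unlabeled samples, and then averages its model parameters with its immediate neighbors (a consensus/diffusion step), so raw data never leave a node. This is only an illustration under assumed choices (ring topology, logistic model, confidence-threshold pseudo-labeling, and all hyper-parameters); it is not the project's algorithm.

# Minimal illustrative sketch (NOT the project's algorithm): decentralized
# semi-supervised learning via local self-training plus neighbor-wise
# parameter averaging. Topology and hyper-parameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Node:
    """One networked node holding private labeled and unlabeled samples."""
    def __init__(self, X_lab, y_lab, X_unl, dim):
        self.X_lab, self.y_lab, self.X_unl = X_lab, y_lab, X_unl
        self.w = np.zeros(dim)          # local logistic-regression weights

    def local_update(self, lr=0.1, conf_thr=0.9):
        # 1) gradient step on the local labeled data
        p = sigmoid(self.X_lab @ self.w)
        self.w -= lr * self.X_lab.T @ (p - self.y_lab) / len(self.y_lab)
        # 2) self-training: pseudo-label confident unlabeled samples and reuse them
        if len(self.X_unl):
            pu = sigmoid(self.X_unl @ self.w)
            conf = (pu > conf_thr) | (pu < 1 - conf_thr)
            if conf.any():
                Xc, yc = self.X_unl[conf], (pu[conf] > 0.5).astype(float)
                pc = sigmoid(Xc @ self.w)
                self.w -= lr * Xc.T @ (pc - yc) / len(yc)

def consensus_step(nodes, adjacency):
    # Each node averages its weights with its immediate neighbors only,
    # so raw samples are never transmitted.
    new_w = []
    for i in range(len(nodes)):
        neigh = [j for j in range(len(nodes)) if adjacency[i, j]] + [i]
        new_w.append(np.mean([nodes[j].w for j in neigh], axis=0))
    for node, w in zip(nodes, new_w):
        node.w = w

# Tiny synthetic demo: 4 nodes on an assumed ring, 2-D Gaussian classes,
# only a few labeled samples per node.
dim, n_nodes = 2, 4
A = np.zeros((n_nodes, n_nodes), dtype=bool)
for i in range(n_nodes):
    A[i, (i + 1) % n_nodes] = A[(i + 1) % n_nodes, i] = True

nodes = []
for _ in range(n_nodes):
    X = np.vstack([rng.normal(-1.0, 1.0, (30, dim)),    # class 0
                   rng.normal(+1.0, 1.0, (30, dim))])   # class 1
    y = np.r_[np.zeros(30), np.ones(30)]
    lab = rng.permutation(60)[:6]
    unl = np.setdiff1d(np.arange(60), lab)
    nodes.append(Node(X[lab], y[lab], X[unl], dim))

for _ in range(100):          # alternate local learning and neighbor consensus
    for node in nodes:
        node.local_update()
    consensus_step(nodes, A)

print("final weights per node:", [np.round(n.w, 2) for n in nodes])

After enough alternations of local updates and consensus steps, the per-node weight vectors agree closely, which is the intended effect of the averaging step; the specific model and thresholds above are placeholders for whatever local learner a real system would use.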
Semi-supervised learning is one of the active research topics in machine learning; it improves learning performance by using labeled and unlabeled samples at the same time. Most existing studies, however, adopt a centralized semi-supervised learning scheme. In many practical networked systems, centralized processing is often infeasible because of high communication cost, limited processing capacity, or concerns about security and commercial interests. Building on the theoretical foundations of semi-supervised learning and distributed information processing, and targeting networked systems with only a small number of labeled samples and a large number of unlabeled samples, this project carried out a systematic study of distributed semi-supervised learning for networked systems, deeply exploiting the information in unlabeled samples to improve learning performance. The project investigated distributed semi-supervised learning with complete attributes, with missing attributes, on streaming data, with ambiguous labels, for positive and unlabeled (PU) learning, for multi-label learning, for broad learning, and under privacy protection; it overcame the difficulties of distributed implementation without directly transmitting the raw data, and established a set of theories and algorithms for distributed semi-supervised learning. In the distributed network, each node makes full use of its local labeled and unlabeled data to train a local learning model, carries out part of the computation, and exchanges information with its neighboring nodes in a prescribed manner, so that the data processing task is accomplished cooperatively. The effectiveness of the proposed algorithms was verified through theoretical analysis and extensive simulation experiments. In addition, considering data privacy in practical scenarios, the project also studied privacy-preserving distributed semi-supervised learning, which achieves effective joint training across multiple nodes while protecting data privacy. The results of this project help solve practical engineering problems and provide technical support for applications such as the Internet of Things and cloud computing. The project was carried out according to the research plan, all research tasks have been completed, and good research results have been obtained.
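The privacy-preserving joint training mentioned above can be realized in several ways; one textbook option is secure aggregation with pairwise additive masks, sketched below under the assumption that every pair of nodes can agree on a shared random mask. The masks cancel in the aggregate, so only masked model updates are ever exchanged. This is an illustrative assumption, not the protocol developed in the project.

# Minimal sketch of pairwise-mask secure aggregation (an assumption for
# illustration): each node reveals only a masked update, yet the average of
# the masked updates equals the average of the true private updates.
import numpy as np

rng = np.random.default_rng(1)
dim, n_nodes = 3, 4
true_updates = [rng.normal(size=dim) for _ in range(n_nodes)]   # private local updates

# Each pair (i, j) with i < j shares a random mask r_ij; node i adds it,
# node j subtracts it, so all masks vanish in the sum.
masks = {(i, j): rng.normal(size=dim)
         for i in range(n_nodes) for j in range(i + 1, n_nodes)}

masked = []
for i in range(n_nodes):
    m = true_updates[i].copy()
    for (a, b), r in masks.items():
        if a == i:
            m += r
        elif b == i:
            m -= r
    masked.append(m)

aggregate = np.mean(masked, axis=0)            # equals the mean of the true updates
print(np.allclose(aggregate, np.mean(true_updates, axis=0)))   # -> True

A practical protocol would also need dropout handling and authenticated key agreement between node pairs; the sketch only shows why the aggregate remains exact while individual updates stay hidden.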
Similar funded projects:
Research on Transfer Learning and Semi-supervised Learning Methods for Text Classification
Research on Semi-supervised Transfer Learning in Semi-supervised Classification of Data Streams
Research on Semi-supervised Distance Metric Learning Methods for Image Recognition
Research on Key Problems of Semi-supervised Learning for High-dimensional Big Data