Complex data streams are massive, high-dimensional, dynamic, and uncertain. These characteristics make stream classification both a hotspot and a difficulty in the field, because the classification process must achieve higher time and space efficiency as well as better classification performance. Most existing algorithms do not handle two or more of these characteristics well, and they do not take practical application requirements into account. First, the project condenses its research problems into concept drift, data reduction, and fast retrieval mechanisms. Second, it analyzes the essence of concept drift and data reduction and the mutual learning relationship between them, defines the boundary reference characteristic, and builds a multi-objective optimization model for concept drift identification that trades off mutual learning against the boundary reference characteristic, together with corresponding optimization mechanisms. Third, it designs an incremental reduction algorithm around the boundary reference characteristic and combines it with fast partitioning strategies to guide rapid reduction of the data stream. Fourth, it designs single classification algorithms for different data processing mechanisms, so that classification is fast and uncertainties such as anomalies and concept drift are resolved at the same time. Finally, building on simplified, improved versions of mainstream parallel models, it designs parallel ensemble classification algorithms that exploit techniques such as multi-layer index structures and guidance from global information, in order to meet the fast classification needs of high-speed data streams.
In addition, taking online education as the application background, the project optimizes and improves these algorithms so that they perform better in complex environments such as load imbalance. Together, this research provides new methods for fast data stream classification that address multiple complex characteristics simultaneously.
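The incremental reduction step described above can be illustrated with a small sketch. It does not implement the project's boundary reference characteristic, which the abstract does not define in detail. Instead it swaps in a well-known condensed nearest-neighbor style rule: a streamed sample is stored as a prototype only when the current prototypes misclassify it, so the retained samples tend to lie near class boundaries. The function names `nearest_label` and `incremental_reduce` are hypothetical.

```python
import math


def nearest_label(prototypes, x):
    """Return the label of the prototype closest to x, or None if there are none."""
    best_label, best_dist = None, math.inf
    for px, py in prototypes:
        d = math.dist(px, x)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_label, best_dist = py, d
    return best_label


def incremental_reduce(stream):
    """Scan (features, label) pairs once and keep only misclassified samples.

    Assumption: this is a condensed nearest-neighbor style rule used for
    illustration, not the boundary reference characteristic of the project.
    """
    prototypes = []
    mistakes = 0
    for x, y in stream:
        predicted = nearest_label(prototypes, x)
        if predicted != y:
            if predicted is not None:
                mistakes += 1  # a real misclassification, not an empty store
            prototypes.append((x, y))  # keep the sample that the store got wrong
    return prototypes, mistakes


if __name__ == "__main__":
    stream = [
        ((0.0, 0.0), "a"),
        ((0.1, 0.0), "a"),
        ((5.0, 5.0), "b"),
        ((5.1, 5.0), "b"),
        ((0.2, 0.1), "a"),
    ]
    kept, mistakes = incremental_reduce(stream)
    print(len(kept), mistakes)
```

A rule of this kind is order-dependent and keeps noisy samples, which are the sort of weaknesses the project's reduction and partitioning strategies are meant to address.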
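Complex data streams have become the mainstream form in which data is presented. They combine traditional characteristics such as massive volume and dynamic change with uncertainty, and most existing algorithms do not handle stream classification well when several of these characteristics occur together. This project explored data reduction, incremental learning, and partitioning strategies to study how to solve data stream classification quickly. To that end, it first improved multi-objective optimization algorithms: it designed multiple search strategies that help the crossover operator of an evolutionary algorithm produce better offspring, improving convergence and search efficiency; it applied the idea of brain storm optimization to decomposition-based evolutionary algorithms, improving their performance; and it designed a multi-objective evolutionary algorithm based on two archives to balance diversity and convergence. This work improved the performance of the optimization algorithms, supported the construction of a data reduction optimization model, and provided a reference for the fusion-based evaluation of reference subsets and for measuring concept drift. Second, it designed an incremental prototype selection algorithm based on nearest-neighbor classification boundaries, which overcomes weaknesses of traditional stream classification algorithms related to reading order and abnormal nodes. Third, it designed an incremental nearest-neighbor classification algorithm based on a tree-structured partitioning strategy and reduction techniques, which improves classification efficiency and reduces the influence of noise and imbalanced samples. Finally, it designed a simplified Hypervolume evolutionary algorithm with an improved external population selection operator and update strategy, which lowers computational complexity and offers ideas for exploring a parallel, hybrid, two-level guided stream classification mechanism. The project provides new ideas and methods for research on and application of fast classification algorithms for complex data streams. Five SCI papers were published.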
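For readers unfamiliar with the hypervolume indicator that the last algorithm builds on, the following minimal sketch computes it for two minimization objectives. It shows only the standard indicator, not the project's simplified evolutionary algorithm or its selection operator. The function name `hypervolume_2d` and the sample points are assumptions made for illustration.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by `ref`, for two minimized objectives.

    Assumption: a plain sweep over sorted points, shown for illustration only.
    """
    # Only points that are strictly better than the reference point contribute.
    candidates = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_f2 = ref[1]  # lowest second objective seen so far in the sweep
    for f1, f2 in candidates:
        if f2 < best_f2:
            # Add the new horizontal strip that this point dominates.
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


if __name__ == "__main__":
    front = [(1, 3), (2, 2), (3, 1)]
    print(hypervolume_2d(front, ref=(4, 4)))
```

A larger value means the set covers more of the objective space, which is why the indicator can reward convergence and diversity together. Its cost grows quickly with the number of objectives, which is consistent with the abstract's interest in a simplified variant.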
Data updated: 2023-05-31
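Evaluating a theme park based on public sentiment orientation: the case of Volga Manor in Harbin
Linear complexity of a class of quaternary generalized cyclotomic sequences of period 2p^2 over F_q
An efficient parallel radiation diffusion algorithm based on multi-block structured grids for inertial confinement fusion implosion
Application of collaborative-representation-based graph embedding discriminant analysis to face recognition
An improved multi-objective sine cosine optimization algorithm
Neural circuit mechanisms of implicit and explicit memory in amnestic mild cognitive impairment and senile dementia: a functional magnetic resonance imaging study combining task and resting state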
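Research on classification of data streams with complex concept drift based on multi-task learning
Research on learning-based load balancing algorithms for complex parallel rendering systems
Research on privacy-preserving incremental and parallel learning algorithms for support vector machines in large-scale data mining
Research on fast and parallel algorithms for matrix computation problems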