High-dimensional, highly redundant data are becoming ubiquitous with the rapid development of information technology. A major approach to the "curse of dimensionality" is to replace the original data with a set of discriminant features, obtained by feature selection or feature extraction; both are active topics in the pattern recognition and machine learning communities. Shannon mutual information provides an information-theoretic measure of the relevance among variables and, because it captures higher-order statistics, is a natural candidate for relevance measurement. However, it is seldom adopted in high-dimensional applications because the joint probability density function is difficult to estimate in high dimensions. This project proposes a novel information measure based on the cumulative residual distribution function (CRDF) instead of the probability density function (PDF). Based on this measure, a feature selection method and a feature extraction method will then be studied. The CRDF-based information measure captures higher-order statistics without depending on an estimate of the PDF. It is therefore suitable for selecting and extracting features in high-dimensional spaces, and the project will address its application model, algorithms, and performance. Given the growing interest in high-dimensional and redundant data across many applications, the results of this project are expected to advance fields including machine learning, information-theoretic analysis, and high-dimensional signal processing.
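The processing of high-dimensional, highly redundant data arises widely in real-world applications. Selecting a discriminative subset of features through feature selection or feature extraction is an important way to handle such problems, and is an active research topic in pattern recognition, machine learning, and related fields. In information theory, Shannon mutual information captures higher-order descriptions of the relationships among variables and is often used as a measure of relevance. In high-dimensional settings, however, the joint probability density is very difficult, or even impossible, to estimate, so Shannon mutual information is hard to apply to high-dimensional data. This project will study an information measure based on the cumulative residual distribution function, together with feature selection and extraction methods based on this cumulative residual information measure. The approach provides a richer, higher-order description than second-order statistics such as the cross-correlation function, while avoiding the estimation difficulty that Shannon mutual information faces in high-dimensional signal processing. The project will carry out a preliminary study of the models and implementation algorithms for cumulative-residual-information-based feature selection and extraction for high-dimensional, highly redundant signals. Its results are of broad theoretical significance and practical value for research on information theory, high-dimensional and highly redundant signal processing methods, and machine learning methods.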
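To make the idea of replacing the PDF with a cumulative residual (survival) function concrete, the following is a minimal sketch, not the project's own measure, which this abstract does not specify. It assumes the cumulative residual entropy as commonly defined in the literature, CRE(X) = -∫ P(|X| > λ) log P(|X| > λ) dλ, and estimates it for one-dimensional samples by plugging in the empirical survival function. The function name `empirical_cre` is hypothetical.

```python
import numpy as np

def empirical_cre(samples):
    """Plug-in estimate of the cumulative residual entropy of |X|.

    Assumption: CRE(X) = -integral of P(|X| > t) * log P(|X| > t) dt,
    with P replaced by the empirical survival function of the samples.
    This is an illustrative stand-in, not the project's measure.
    """
    x = np.sort(np.abs(np.asarray(samples, dtype=float)))
    n = x.size
    if n < 2:
        return 0.0
    # On the interval [x[i-1], x[i]) (0-based, i = 1..n-1) the empirical
    # survival function is constant and equals (n - i) / n.
    surv = 1.0 - np.arange(1, n) / n
    gaps = np.diff(x)  # widths of the intervals between sorted samples
    # Regions where the survival function is 1 or 0 contribute nothing.
    return float(-np.sum(gaps * surv * np.log(surv)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # For a uniform variable on [0, 1] the exact value is 1/4.
    print(empirical_cre(rng.uniform(0.0, 1.0, size=20000)))
```

Only sorting and counting are involved, so no density estimate or kernel bandwidth is needed; this is the property the abstract highlights for CRDF-based measures.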
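This project (Research on Feature Selection and Extraction Methods for High-Dimensional, Highly Redundant Signals Based on Cumulative Residual Information, 61673059) was a one-year research project (the original application was for a four-year project). It studied cumulative residual mutual information measures, used as objective functions for feature selection and feature extraction, and made a preliminary exploration of feature selection methods and algorithms based on cumulative residual mutual information. The research objective was to derive, from the cumulative residual distribution function, a mutual information measure between the descriptive labels and multiple variables that does not depend on probability density estimation and can be applied to feature selection, and to give preliminary feature selection and extraction algorithms that use this measure. The project proceeded as planned and has completed the main content of its research objectives. Specifically, we studied the application of the maximum correntropy and minimum error entropy functions in machine learning and signal processing, studied the application of the stability of sample statistics in pattern recognition, and obtained a number of results focused on applications of information measures in machine learning and signal processing. During the one-year research period, 3 SCI-indexed papers and 4 international conference papers were published, and 1 paper received a best paper award at an international conference.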
Data last updated: 2023-05-31
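Research on Information Granulation and Feature Selection for High-Dimensional Mixed Data
Feature Selection Methods for High-Dimensional Mixed-Attribute Data Based on Feature Clustering
Manifold-Learning-Based Spatial-Spectral Multi-Feature Extraction and Selection for Hyperspectral Remote Sensing Images
Research on Online Feature Selection and Classification Methods for Multi-Source High-Dimensional Data Streams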