With the development of high-performance computing and big data storage technology, ultra-high dimensional data analysis has attracted tremendous interest from both researchers and practitioners, because such data arise in many real applications, including the social and economic sciences. Variable selection aims to correctly identify the truly informative variables in ultra-high dimensional data, overcome the difficulties encountered by classical statistical methods, and significantly improve estimation and prediction accuracy. The objective of this research proposal is therefore to provide a novel variable selection method that makes full use of the properties of a reproducing kernel Hilbert space (RKHS), such as the derivative reproducing property and the representer theorem, together with kernel ridge regression in RKHS. Its key advantages are that it requires no explicit model assumption, admits general predictor effects, allows for scalable computation, and attains desirable asymptotic theoretical results. Tighter theoretical results are provided for the squared loss function using additional operator tools from functional analysis, and the linear case is studied as a special case of the method to give a better understanding of it, both methodologically and theoretically. Furthermore, we extend the proposed method to interaction selection, which has attracted tremendous interest in recent years. The applicant's solid theoretical foundation, rich research experience and preliminary exploratory research will steer the project toward successful completion, which will make substantial contributions to research on variable selection and provide a novel idea for efficiently handling ultra-high dimensional data.
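With the development of high-performance computing and massive data storage technology, ultra-high dimensional data increasingly appear in many areas of social life and scientific research, and have attracted wide attention from researchers. Variable selection picks out the variables in ultra-high dimensional data that are truly useful for statistical analysis, overcomes the difficulties encountered by classical statistical methods, and can significantly improve the accuracy of statistical estimation and prediction, laying the foundation for deeper statistical analysis. This project intends to exploit properties specific to functions in a reproducing kernel Hilbert space (RKHS), such as the derivative reproducing property, and, building on ridge regression in RKHS, to propose a class of variable selection methods that adapt well to data and models, compute quickly and efficiently, and come with theoretical guarantees; taking the squared loss function as an example, to derive more precise theoretical results with operator tools from functional analysis, and to study the properties of the method in depth with the linear model and others as special cases; and to extend this class of methods to the currently popular problem of selecting variable interaction effects. The applicant's solid theoretical foundation, rich research experience and substantial preliminary exploratory work lay a firm basis for the successful completion of the project, whose final results will further enrich variable selection methodology and provide a novel approach to effectively handling ultra-high dimensional data.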
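With the development of high-performance computing and massive data storage technology, ultra-high dimensional data increasingly appear in many areas of social life and scientific research, and have attracted wide attention from researchers. This project intends to exploit properties specific to functions in a reproducing kernel Hilbert space (RKHS), such as the derivative reproducing property, and, building on ridge regression in RKHS, to propose a class of variable selection methods that adapt well to data and models, compute quickly and efficiently, and come with theoretical guarantees; taking the squared loss function as an example, to derive more precise theoretical results with operator tools from functional analysis, and to study the properties of the method in depth with the linear model and others as special cases; and to extend this class of methods to the currently popular problems of selecting variable interaction effects, analyzing network data, and recovering directed acyclic graphs. The applicant's solid theoretical foundation, rich research experience and substantial preliminary exploratory work lay a firm basis for the successful completion of the project, whose final results will further enrich research in the related fields.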
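The abstracts describe the approach only at a high level, so the following is a minimal sketch of one way the named ingredients could fit together, not the project's actual procedure. It assumes a Gaussian kernel, fits kernel ridge regression, uses the closed-form derivative of the fitted function (available because the fit is a finite kernel expansion, per the representer theorem), and ranks variables by the empirical norm of each partial derivative. The function names, the bandwidth `sigma`, the penalty `lam`, and the hard `threshold` are all illustrative assumptions.

```python
import numpy as np


def gaussian_kernel(X, Z, sigma):
    """Gaussian kernel matrix with entries exp(-||x - z||^2 / (2 sigma^2))."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Z**2, axis=1)[None, :]
          - 2.0 * X @ Z.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def gradient_norms(X, y, sigma=1.0, lam=1e-2):
    """Empirical L2 norm of each partial derivative of a kernel ridge fit.

    The fit is f(x) = sum_i alpha_i K(x_i, x) with
    alpha = (K + n * lam * I)^{-1} y (squared loss, ridge penalty).
    """
    n, p = X.shape
    K = gaussian_kernel(X, X, sigma)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    norms = np.empty(p)
    for l in range(p):
        # diff[j, i] = X[i, l] - X[j, l]
        diff = X[None, :, l] - X[:, None, l]
        # Partial derivative of f with respect to coordinate l at each x_j.
        grad_l = (K * diff) @ alpha / sigma**2
        norms[l] = np.sqrt(np.mean(grad_l**2))
    return norms


def select_variables(norms, threshold):
    """Hypothetical hard-thresholding rule: keep variables with a large norm."""
    return np.flatnonzero(norms > threshold)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(200, 5))
    # Only the first two variables carry signal in this synthetic example.
    y = 2.0 * X[:, 0] + 2.0 * X[:, 1] ** 2 + 0.1 * rng.standard_normal(200)
    norms = gradient_norms(X, y, sigma=2.0, lam=1e-3)
    print(np.round(norms, 3))
```

The idea behind the sketch is that a variable with no effect on the response should have a partial derivative close to zero everywhere, so no parametric model form is needed to decide which variables matter. In practice the threshold, bandwidth and penalty would need to be tuned, which this sketch does not attempt.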
Data last updated: 2023-05-31
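Genome-wide association analysis of leaf orientation value in maize
Study of the betavoltaic effect based on one-dimensional TiO2 nanotube array films
Regulatory asymmetry, choice of earnings management mode, and the enforcement efficiency of the China Securities Regulatory Commission?
A review of research on the operating performance of Tesla turbines
Forecasting urban domestic water demand based on the LASSO-SVMR model
Singular integral operators with variable kernels and related problems
Adaptive Fourier decomposition theory in analytic reproducing kernel Hilbert spaces and related applications
Research on sparse image representation algorithms in reproducing kernel Hilbert spaces
New adaptive filtering methods in reproducing kernel Hilbert spaces and their applications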